Pitfalls when recording audio on Android and converting speech to text with Baidu speech recognition

Author: 大胡子的机器人 | Published 2019-09-14 20:30


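Baidu speech recognition only accepts audio files in a fixed format:

Baidu speech recognition rules: the recording parameters of the raw PCM must be a 16 kHz sample rate, 16-bit depth, and a single channel. Supported formats are pcm (uncompressed), wav (uncompressed, PCM-encoded), and amr (compressed).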
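
To make the rule above concrete, here is a minimal sketch that checks whether a WAV header describes 16 kHz, 16-bit, mono PCM. The class name WavFormatCheck and both method names are hypothetical, and the sketch assumes the canonical 44-byte header layout (the "fmt " chunk immediately after "WAVE", no extra chunks); files written by other tools may lay the header out differently.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of any Baidu or Android API. */
final class WavFormatCheck {

    /** Returns true if a canonical 44-byte WAV header describes 16 kHz, 16-bit, mono PCM. */
    static boolean isBaiduCompatible(byte[] header) {
        if (header == null || header.length < 44) {
            return false;
        }
        String riff = new String(header, 0, 4, StandardCharsets.US_ASCII);
        String wave = new String(header, 8, 4, StandardCharsets.US_ASCII);
        String fmt = new String(header, 12, 4, StandardCharsets.US_ASCII);
        if (!riff.equals("RIFF") || !wave.equals("WAVE") || !fmt.equals("fmt ")) {
            return false;
        }
        // WAV header fields are little-endian.
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        int audioFormat = buf.getShort(20);   // 1 means uncompressed PCM
        int channels = buf.getShort(22);
        int sampleRate = buf.getInt(24);
        int bitsPerSample = buf.getShort(34);
        return audioFormat == 1 && channels == 1
                && sampleRate == 16000 && bitsPerSample == 16;
    }

    /** Builds a canonical 44-byte PCM WAV header, e.g. to prepend to raw PCM data. */
    static byte[] pcmHeader(int sampleRate, int channels, int bitsPerSample, int dataLength) {
        int blockAlign = channels * bitsPerSample / 8;
        ByteBuffer buf = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        buf.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(36 + dataLength);              // size of everything after this field
        buf.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        buf.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(16);                           // size of the PCM fmt chunk
        buf.putShort((short) 1);                  // audio format: PCM
        buf.putShort((short) channels);
        buf.putInt(sampleRate);
        buf.putInt(sampleRate * blockAlign);      // byte rate
        buf.putShort((short) blockAlign);
        buf.putShort((short) bitsPerSample);
        buf.put("data".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(dataLength);
        return buf.array();
    }
}
```

Reading the first 44 bytes of a converted file and passing them to isBaiduCompatible is a quick way to rule out a wrong sample rate or channel count before uploading anything.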

So I tried three approaches (the one that finally worked is the last one):

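1. Converting the amr recording to wav with FFmpeg

This works fine on 32-bit Android, but our customized Android system is 64-bit, so the prebuilt 32-bit libraries were not enough for us.
The 32-bit FFmpeg .so libraries (prebuilt by someone else) can be downloaded here:
https://github.com/huangjingqiang/ffmpeg-library

The conversion is then done through the following class: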

package cn.dxjia.ffmpeg.library;

import android.util.Log;

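/**
 * Warning: do not modify this file, including its package name and class name
 * (the prebuilt JNI library presumably resolves its native methods by these names).
 * Reference: https://github.com/huangjingqiang/ffmpeg-library
 */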
public class FFmpegNativeHelper {
public FFmpegNativeHelper() {
}

static {
    System.loadLibrary("avutil-54");
    System.loadLibrary("swresample-1");
    System.loadLibrary("avcodec-56");
    System.loadLibrary("avformat-56");
    System.loadLibrary("swscale-3");
    System.loadLibrary("avfilter-5");
    System.loadLibrary("avdevice-56");
    System.loadLibrary("ffmpegjni");

}

public static String runCommand(String command) {
    if (command == null || command.length() == 0) {
        return "Command can't be empty.";
    }
    // The command is split on single spaces, so paths containing spaces are not supported.
    String[] args = command.split(" ");
    for (String arg : args) {
        Log.d("ffmpeg-jni", arg);
    }
    return ffmpeg_run(args);
}

private static native int ffmpeg_init();
private static native int ffmpeg_uninit();
private static native String ffmpeg_run(String[] args);
}

// Usage:
//final String command = "ffmpeg -i " + source + " -vn -acodec pcm_s16le -ab 256k -ac 1 -ar 16000 -f wav -y " + target;
//FFmpegNativeHelper.runCommand(command)
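
Because runCommand splits the command on spaces, a source or target path containing a space would be broken into several arguments. The sketch below builds the same command string as the usage comment above and rejects such paths up front. FfmpegCommand and amrToWav are hypothetical names introduced only for illustration.

```java
/** Hypothetical helper that builds the amr-to-wav command used with FFmpegNativeHelper. */
final class FfmpegCommand {

    /**
     * Returns the ffmpeg command line that converts source to 16 kHz, mono,
     * 16-bit PCM wav at target. Throws if a path is empty or contains a space,
     * because the command is later split on spaces.
     */
    static String amrToWav(String source, String target) {
        if (source == null || source.isEmpty() || target == null || target.isEmpty()) {
            throw new IllegalArgumentException("source and target must not be empty");
        }
        if (source.contains(" ") || target.contains(" ")) {
            throw new IllegalArgumentException("paths must not contain spaces");
        }
        return "ffmpeg -i " + source
                + " -vn -acodec pcm_s16le -ab 256k -ac 1 -ar 16000 -f wav -y " + target;
    }
}
```

The result would be passed on as FFmpegNativeHelper.runCommand(FfmpegCommand.amrToWav(source, target)).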

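I therefore considered building 64-bit FFmpeg .so libraries myself, following this guide:
https://blog.csdn.net/bobcat_kay/article/details/88843778
The procedure is too complicated, and the guide itself warns of many pitfalls, so for the sake of efficiency I dropped this approach.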

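2. Reading the Baidu speech documentation closely and trying the REST/SDK integration

Baidu speech documentation: https://ai.baidu.com/docs#/ASR-Online-Java-SDK/top

I uploaded the amr file in a JSON request, but Baidu returned a format error. That error misled me into thinking the format parameter itself was wrong, so I kept hunting for parameter problems, changing the header parameters and the format parameter, all to no effect. Eventually I thought of a third approach: tackle the problem at its source, the recording file itself.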
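
For reference, a JSON upload of this kind can be sketched as below. The field names (format, rate, channel, cuid, token, speech, len) follow my reading of Baidu's REST documentation and should be treated as assumptions to verify against the current docs; BaiduAsrRequest and buildJsonBody are hypothetical names. The sketch only builds the body, with speech as the Base64 of the audio and len as the byte length before encoding, and assumes the string arguments contain no characters that need JSON escaping.

```java
import java.util.Base64;

/** Hypothetical helper that builds a JSON body for a speech recognition request. */
final class BaiduAsrRequest {

    /**
     * Builds the JSON body. Field names are assumptions based on Baidu's REST docs.
     * format is e.g. "amr", "wav" or "pcm"; rate is the sample rate in Hz.
     */
    static String buildJsonBody(byte[] audio, String format, int rate,
                                String cuid, String token) {
        // speech carries the Base64-encoded audio; len is the size of the raw bytes.
        String speech = Base64.getEncoder().encodeToString(audio);
        return "{"
                + "\"format\":\"" + format + "\","
                + "\"rate\":" + rate + ","
                + "\"channel\":1,"
                + "\"cuid\":\"" + cuid + "\","
                + "\"token\":\"" + token + "\","
                + "\"speech\":\"" + speech + "\","
                + "\"len\":" + audio.length
                + "}";
    }
}
```

Note that the declared format and rate only describe the file. If the recording itself was not made at 16 kHz mono, changing these request parameters cannot make it valid, which matches what I ran into.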


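3. Solving the problem by changing the recording format

The recording start-up code below produces a 16 kHz, mono amr file that Baidu accepts. Note that the output is AMR-compressed audio rather than raw PCM; the 16 kHz rate comes from the AMR_WB output format.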

this.muteAudioFocus(this.mAudioManager, true);
this.mAudioManager.setMode(AudioManager.MODE_NORMAL);
this.mMediaRecorder = new MediaRecorder();

try {
    int bps = 7950;
    this.mMediaRecorder.setAudioSamplingRate(16000);   // audio sampling rate of the recording
    this.mMediaRecorder.setAudioEncodingBitRate(bps);  // audio encoding bit rate
} catch (Resources.NotFoundException var3) {
    var3.printStackTrace();
}

// Baidu requires mono. setAudioChannels() takes a channel count (1 or 2), not an AudioFormat mask.
// AudioFormat.CHANNEL_IN_MONO has the value 16, which is why passing it made recording fail at
// start-up with a parameter error; AudioFormat.CHANNEL_IN_DEFAULT happens to equal 1, i.e. mono.
this.mMediaRecorder.setAudioChannels(1);
this.mMediaRecorder.setAudioSource(MediaRecorder.AudioSource.MIC);
this.mMediaRecorder.setOutputFormat(MediaRecorder.OutputFormat.AMR_WB); // note: AMR_NB uses an 8000 Hz sample rate
// I originally passed AudioFormat.ENCODING_PCM_16BIT here; its value (2) is the same as AudioEncoder.AMR_WB.
this.mMediaRecorder.setAudioEncoder(MediaRecorder.AudioEncoder.AMR_WB);
this.mAudioPath = Uri.fromFile(new File(SAVE_PATH, System.currentTimeMillis() + ".voice"));
this.mMediaRecorder.setOutputFile(this.mAudioPath.getPath());
this.mMediaRecorder.prepare();
this.mMediaRecorder.start();
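
One way to confirm that the recorder really produced wideband (16 kHz) audio is to look at the first bytes of the saved file: AMR narrowband files start with "#!AMR\n" and wideband files with "#!AMR-WB\n". The sketch below is a plain-Java check under that assumption; AmrHeader and detect are hypothetical names, and the caller is expected to read the first bytes of the .voice file itself.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that identifies an AMR file from its magic header. */
final class AmrHeader {
    private static final byte[] AMR_NB = "#!AMR\n".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] AMR_WB = "#!AMR-WB\n".getBytes(StandardCharsets.US_ASCII);

    /** Returns "AMR-WB" (16 kHz), "AMR-NB" (8 kHz) or "unknown". */
    static String detect(byte[] fileStart) {
        if (startsWith(fileStart, AMR_WB)) {
            return "AMR-WB";
        }
        if (startsWith(fileStart, AMR_NB)) {
            return "AMR-NB";
        }
        return "unknown";
    }

    /** True if data begins with every byte of prefix, in order. */
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

If detect returns "AMR-NB" for a recording, the file is 8 kHz audio and would not satisfy the 16 kHz requirement, whatever rate is declared in the request.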