iOS 10 Speech Recognition API

Author: Mad_Mark | Published 2016-10-26 15:59 | 617 views

Introduction to Speech Recognition

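Speech Recognition is a new public API in iOS 10. It recognizes the user's speech, and the app can then act on the recognition result.
I could not find much material on it online, so I consulted some overseas sites and wrote a demo myself. Here is a brief walkthrough: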

Authorization

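iOS 10 requires user authorization before an app uses privacy-sensitive system features, so, just as for the camera, we have to add usage descriptions to the Info.plist file. Speech recognition relies on two such features:
NSSpeechRecognitionUsageDescription: usage description for speech recognition
NSMicrophoneUsageDescription: usage description for the microphone
So we add: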

<key>NSSpeechRecognitionUsageDescription</key> 
<string>Speech Recognition</string> 
<key>NSMicrophoneUsageDescription</key> 
<string>Microphone</string>

The string values are the descriptions shown to the user in the permission prompt.

Basic setup

The feature is simple: tap a button to start dictation, and show the recognized text in a label.

#import <AVFoundation/AVFoundation.h>
#import <Speech/Speech.h>

@property (nonatomic, strong) AVAudioEngine *audioEngine;                           // Audio engine
@property (nonatomic, strong) SFSpeechRecognizer *speechRecognizer;                 // Speech recognizer
@property (nonatomic, strong) SFSpeechAudioBufferRecognitionRequest *speechRequest; // Recognition request
@property (nonatomic, strong) SFSpeechRecognitionTask *currentSpeechTask;           // Current recognition task
@property (nonatomic, weak) IBOutlet UILabel *showLb;       // Label that shows the result
@property (nonatomic, weak) IBOutlet UIButton *startBtn;    // Start button

Initialize everything in viewDidLoad and check whether the user has granted authorization:

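// Initialization: we need an AVAudioEngine and an SFSpeechRecognizer up front.
// The SFSpeechAudioBufferRecognitionRequest is created later, in startDictating.
self.audioEngine = [AVAudioEngine new];
self.speechRecognizer = [SFSpeechRecognizer new];
self.startBtn.enabled = NO;

__weak typeof(self) weakSelf = self;
[SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status)
{
    // The handler may run off the main thread, so hop to the main queue before touching UI
    dispatch_async(dispatch_get_main_queue(), ^{
        if (status != SFSpeechRecognizerAuthorizationStatusAuthorized)
        {
            // Not authorized: leave the start button disabled
            return;
        }

        // Install a tap on the input node to receive microphone audio
        AVAudioInputNode *inputNode = weakSelf.audioEngine.inputNode;
        [inputNode installTapOnBus:0 bufferSize:1024
                            format:[inputNode outputFormatForBus:0]
                             block:^(AVAudioPCMBuffer * _Nonnull buffer,
                                     AVAudioTime * _Nonnull when)
        {
            // Feed each captured buffer to the current recognition request
            [weakSelf.speechRequest appendAudioPCMBuffer:buffer];
        }];
        // Prepare the engine (preallocates resources it needs in order to start)
        [weakSelf.audioEngine prepare];

        weakSelf.startBtn.enabled = YES;
    });
}];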

Note: if NSMicrophoneUsageDescription is missing from the Info.plist file, accessing audioEngine.inputNode will crash your app, and you will not be able to catch any useful information.
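
The authorization callback above returns silently whenever the status is not "authorized", so the user gets no feedback. As a minimal sketch, assuming the app wants to explain each outcome, the four status values could be mapped to a message. The helper name MMMessageForSpeechAuthorizationStatus and the message strings are made up for illustration; only the SFSpeechRecognizerAuthorizationStatus constants come from the Speech framework.

```objc
#import <Foundation/Foundation.h>
#import <Speech/Speech.h>

// Hypothetical helper (not part of the Speech framework): maps each
// authorization status to a message the app could show the user.
static NSString *MMMessageForSpeechAuthorizationStatus(SFSpeechRecognizerAuthorizationStatus status)
{
    switch (status)
    {
        case SFSpeechRecognizerAuthorizationStatusAuthorized:
            return @"Speech recognition is ready.";
        case SFSpeechRecognizerAuthorizationStatusDenied:
            return @"Speech recognition was denied. It can be enabled in Settings.";
        case SFSpeechRecognizerAuthorizationStatusRestricted:
            return @"Speech recognition is restricted on this device.";
        case SFSpeechRecognizerAuthorizationStatusNotDetermined:
            return @"Speech recognition has not been authorized yet.";
    }
    // Fallback for any status value this sketch does not know about
    return @"Unknown authorization status.";
}
```

Inside the callback, after switching to the main queue, the result could be assigned to the label, for example self.showLb.text = MMMessageForSpeechAuthorizationStatus(status);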

Implementation

Handling the button tap

- (IBAction)onStartBtnClicked:(id)sender
{
    if (self.currentSpeechTask.state == SFSpeechRecognitionTaskStateRunning)
    {
        // A recognition task is running: stop dictation
        [self.startBtn setTitle:@"Start Dictating" forState:UIControlStateNormal];
        [self stopDictating];
    }
    else
    {
        // No task is running: start dictation
        [self.startBtn setTitle:@"Stop Dictating" forState:UIControlStateNormal];
        self.showLb.text = @"I'm waiting";
        [self startDictating];
    }
}

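- (void)startDictating
{
    // Create the request first so the tap has somewhere to append audio
    self.speechRequest = [SFSpeechAudioBufferRecognitionRequest new];

    // Start the audio engine and report a failure instead of ignoring it
    NSError *engineError = nil;
    if (![self.audioEngine startAndReturnError:&engineError])
    {
        self.showLb.text = engineError.localizedDescription;
        return;
    }

    // Start recognizing the audio fed into speechRequest
    self.currentSpeechTask =
    [self.speechRecognizer recognitionTaskWithRequest:self.speechRequest
                                        resultHandler:^(SFSpeechRecognitionResult * _Nullable result,
                                                        NSError * _Nullable error)
    {
        // Called with partial and final results; ignore callbacks that carry no result
        if (result == nil) return;
        self.showLb.text = result.bestTranscription.formattedString;
    }];
}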

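In this method we create a new recognition request and a new recognition task. Whenever the recognizer delivers an updated result, we update the label's text, whether or not dictation is still in progress.
Finally, we only need to implement stopDictating: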

- (void)stopDictating
{
    // Stop the audio engine and tell the recognition request that the audio has ended
    [self.audioEngine stop];
    [self.speechRequest endAudio];
}
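
The result handler in startDictating ignores its error argument and never checks whether a result is final, so the button title and the audio engine are only reset when the user taps stop. A minimal sketch of one way to detect the end of a session follows. MMRecognitionDidFinish is a hypothetical helper name; the isFinal property and the handler's two arguments are the ones provided by the Speech framework, and the cleanup shown in the comment is an assumption about what the app would want to do.

```objc
#import <Foundation/Foundation.h>
#import <Speech/Speech.h>

// Hypothetical helper: returns YES when a recognition callback marks the end
// of a session, either because an error occurred or because the result is final.
static BOOL MMRecognitionDidFinish(SFSpeechRecognitionResult * _Nullable result,
                                   NSError * _Nullable error)
{
    // Messaging a nil result yields NO, so a callback with neither a result
    // nor an error is treated as "still in progress".
    return error != nil || result.isFinal;
}

// Possible use inside the resultHandler block of startDictating
// (relies on the article's properties, shown here only as a comment):
//
//     if (result != nil) {
//         self.showLb.text = result.bestTranscription.formattedString;
//     }
//     if (MMRecognitionDidFinish(result, error)) {
//         [self.audioEngine stop];
//         self.speechRequest = nil;
//         self.currentSpeechTask = nil;
//         [self.startBtn setTitle:@"Start Dictating" forState:UIControlStateNormal];
//     }
```

Resetting the button title in the handler keeps the UI consistent when recognition ends on its own rather than through a tap.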

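That's it. There is not much code, and the comments explain most of it; dictation now works. If we go one step further and branch on the recognized text, performing a different action for each result, it should make for a pleasant user experience.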
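
As a rough sketch of that last idea, assuming simple substring matching on the transcript is good enough, the recognized text could be mapped to a command. The MMVoiceCommand values and the trigger phrases are invented for illustration and are not part of the demo.

```objc
#import <Foundation/Foundation.h>

// Hypothetical command identifiers, for illustration only
typedef NS_ENUM(NSInteger, MMVoiceCommand) {
    MMVoiceCommandNone,
    MMVoiceCommandLightOn,
    MMVoiceCommandLightOff,
};

// Maps a transcript to a command by case-insensitive substring matching.
// Returns MMVoiceCommandNone when no known phrase is found (or transcript is nil).
static MMVoiceCommand MMVoiceCommandForTranscript(NSString * _Nullable transcript)
{
    NSString *text = transcript.lowercaseString;
    if ([text containsString:@"turn on the light"])  return MMVoiceCommandLightOn;
    if ([text containsString:@"turn off the light"]) return MMVoiceCommandLightOff;
    return MMVoiceCommandNone;
}
```

In the result handler this could be called as MMVoiceCommandForTranscript(result.bestTranscription.formattedString). Because the handler also fires for partial results, it is probably safer to act on a command only once result.isFinal is YES, so the same action is not triggered several times.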

Reference: http://gregshackles.com/using-speech-recognition-in-ios-10/?utm_source=tuicool&utm_medium=referral

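Comments

Cherry_06: Can it recognize Simplified Chinese?
摘心: Is there any way to get the input volume?
浮浅丶Superficial: Is there a demo?
Mad_Mark: @浮浅丶Superficial The demo I wrote is built into a toolkit of mine, so I have not uploaded it. This really is all the code; you just need to put together your own UI.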

Article link: https://www.haomeiwen.com/subject/huoduttx.html
