一. Introduction
In 2016, alongside its flagship iOS 10 release, Apple shipped the Speech framework (Speech.framework, often referred to as Speech Kit) for speech recognition, which exposes the same recognition technology that powers the famous Siri. With it, converting speech to text becomes very simple. Below is a short introduction to how to use it.
二. Implementation
1. Requesting User Authorization
First, import the Speech framework:
#import <Speech/Speech.h>
Requesting authorization is straightforward: add the following code before recognition starts (here, in viewDidAppear:):
- (void)viewDidAppear:(BOOL)animated {
    [super viewDidAppear:animated];
    __weak typeof(self) weakSelf = self;
    [SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status) {
        // The callback may arrive on a background queue; hop to the main queue for UI updates.
        dispatch_async(dispatch_get_main_queue(), ^{
            switch (status) {
                case SFSpeechRecognizerAuthorizationStatusNotDetermined:
                    weakSelf.recordButton.enabled = NO;
                    [weakSelf.recordButton setTitle:@"Speech recognition not yet authorized" forState:UIControlStateNormal];
                    break;
                case SFSpeechRecognizerAuthorizationStatusDenied:
                    weakSelf.recordButton.enabled = NO;
                    [weakSelf.recordButton setTitle:@"User denied speech recognition" forState:UIControlStateNormal];
                    break;
                case SFSpeechRecognizerAuthorizationStatusRestricted:
                    weakSelf.recordButton.enabled = NO;
                    [weakSelf.recordButton setTitle:@"Speech recognition is restricted on this device" forState:UIControlStateNormal];
                    break;
                case SFSpeechRecognizerAuthorizationStatusAuthorized:
                    weakSelf.recordButton.enabled = YES;
                    [weakSelf.recordButton setTitle:@"Start Recording" forState:UIControlStateNormal];
                    break;
                default:
                    break;
            }
        });
    }];
}
If the app crashes at this point, it is because since iOS 10 you must declare usage descriptions for the microphone and for speech recognition in Info.plist:
Privacy - Speech Recognition Usage Description  Please allow speech recognition
Privacy - Microphone Usage Description  Please allow microphone access
Run the project and you will be prompted to grant speech-recognition and microphone access. Authorization is now complete.
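If you edit Info.plist as source code rather than through Xcode's property-list editor, the two entries above correspond to the raw keys NSSpeechRecognitionUsageDescription and NSMicrophoneUsageDescription (the description strings are just examples):

```xml
<key>NSSpeechRecognitionUsageDescription</key>
<string>Please allow speech recognition</string>
<key>NSMicrophoneUsageDescription</key>
<string>Please allow microphone access</string>
```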
2. Initializing the Speech Recognition Engine
#pragma mark - property
- (AVAudioEngine *)audioEngine {
    if (!_audioEngine) {
        _audioEngine = [[AVAudioEngine alloc] init];
    }
    return _audioEngine;
}

- (SFSpeechRecognizer *)speechRecognizer {
    if (!_speechRecognizer) {
        // Set the recognizer's language; here it is Mandarin Chinese.
        NSLocale *locale = [[NSLocale alloc] initWithLocaleIdentifier:@"zh_CN"];
        _speechRecognizer = [[SFSpeechRecognizer alloc] initWithLocale:locale];
        _speechRecognizer.delegate = self;
    }
    return _speechRecognizer;
}
#pragma mark - SFSpeechRecognizerDelegate
// Called when the recognizer's availability changes.
- (void)speechRecognizer:(SFSpeechRecognizer *)speechRecognizer availabilityDidChange:(BOOL)available {
    if (available) {
        self.recordButton.enabled = YES;
        [self.recordButton setTitle:@"Start Recording" forState:UIControlStateNormal];
    } else {
        self.recordButton.enabled = NO;
        [self.recordButton setTitle:@"Speech recognition unavailable" forState:UIControlStateNormal];
    }
}
1. Initializing SFSpeechRecognizer requires an NSLocale object identifying the language the user will speak, e.g. "zh_CN" for Mandarin Chinese or "en_US" for American English.
2. AVAudioEngine is the audio engine used to capture audio input.
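If you are unsure which locale identifiers are valid, SFSpeechRecognizer can enumerate the locales it supports. A minimal sketch:

```objc
// Log every locale the speech recognizer supports.
for (NSLocale *locale in [SFSpeechRecognizer supportedLocales]) {
    NSLog(@"%@", locale.localeIdentifier);
}
```

Note that a locale being listed does not guarantee availability at runtime (e.g. without a network connection), which is exactly what the availabilityDidChange: delegate callback above reports.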
3. Starting the Speech Recognition Engine
Add the following code:
- (IBAction)recordButtonClicked {
    if ([self.audioEngine isRunning]) {
        [self endRecording];
        [self.recordButton setTitle:@"Stopping…" forState:UIControlStateDisabled];
    } else {
        [self startRecording];
        [self.recordButton setTitle:@"Stop Recording" forState:UIControlStateNormal];
    }
}

- (IBAction)startRecording {
    // Cancel any recognition task that is still running.
    if (_recognitionTask) {
        [_recognitionTask cancel];
        _recognitionTask = nil;
    }

    // Configure the audio session for recording.
    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    NSError *error = nil;
    [audioSession setCategory:AVAudioSessionCategoryRecord error:&error];
    NSParameterAssert(!error);
    [audioSession setMode:AVAudioSessionModeMeasurement error:&error];
    NSParameterAssert(!error);
    [audioSession setActive:YES withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:&error];
    NSParameterAssert(!error);

    _recognitionRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
    AVAudioInputNode *inputNode = [self.audioEngine inputNode];
    NSAssert(inputNode, @"Audio input device is not ready");
    NSAssert(_recognitionRequest, @"Failed to create the recognition request");
    // Report partial results while the user is still speaking.
    _recognitionRequest.shouldReportPartialResults = YES;

    __weak typeof(self) weakSelf = self;
    _recognitionTask = [self.speechRecognizer recognitionTaskWithRequest:_recognitionRequest resultHandler:^(SFSpeechRecognitionResult * _Nullable result, NSError * _Nullable error) {
        __strong typeof(weakSelf) strongSelf = weakSelf;
        BOOL isFinal = NO;
        if (result) {
            NSLog(@"%@", result.bestTranscription.formattedString);
            strongSelf.resultStringLabel.text = result.bestTranscription.formattedString;
            isFinal = result.isFinal;
        }
        if (error || isFinal) {
            [strongSelf.audioEngine stop];
            [inputNode removeTapOnBus:0];
            strongSelf.recognitionTask = nil;
            strongSelf.recognitionRequest = nil;
            strongSelf.recordButton.enabled = YES;
            [strongSelf.recordButton setTitle:@"Start Recording" forState:UIControlStateNormal];
        }
    }];

    AVAudioFormat *recordingFormat = [inputNode outputFormatForBus:0];
    // Remove any previous tap before installing a new one, otherwise installTapOnBus: may crash.
    [inputNode removeTapOnBus:0];
    [inputNode installTapOnBus:0 bufferSize:1024 format:recordingFormat block:^(AVAudioPCMBuffer * _Nonnull buffer, AVAudioTime * _Nonnull when) {
        __strong typeof(weakSelf) strongSelf = weakSelf;
        if (strongSelf.recognitionRequest) {
            // Feed each captured audio buffer into the recognition request.
            [strongSelf.recognitionRequest appendAudioPCMBuffer:buffer];
        }
    }];

    [self.audioEngine prepare];
    [self.audioEngine startAndReturnError:&error];
    NSParameterAssert(!error);
    self.resultStringLabel.text = LoadingText;
}
1. Use the shared AVAudioSession object to configure the app for audio recording.
2. Before producing a final result, recognition may produce several intermediate ones; setting the SFSpeechAudioBufferRecognitionRequest's shouldReportPartialResults property to YES makes each partial result be delivered as soon as it is available.
3. Set the recording format and the audio-buffer callback, which appends each buffer to self.recognitionRequest.
4. Wire up the tap action for self.recordButton.
5. Start capturing audio.
6. Update the button title.
4. Resetting the Speech Recognition Engine
Add the following code:
- (void)endRecording {
    [self.audioEngine stop];
    if (_recognitionRequest) {
        // Signal that no more audio will be appended.
        [_recognitionRequest endAudio];
    }
    if (_recognitionTask) {
        [_recognitionTask cancel];
        _recognitionTask = nil;
    }
    self.recordButton.enabled = NO;
    if ([self.resultStringLabel.text isEqualToString:LoadingText]) {
        self.resultStringLabel.text = @"";
    }
}
1. Disable self.recordButton while stopping.
2. Stop the audio engine.
3. Stop the recognizer: end the audio stream and cancel the task.
4. Clear the result label if it still shows the loading placeholder.
5. Receiving Recognition Results
Below is the API description of SFSpeechRecognizer's recognition methods:
// Recognize speech utterance with a request
// If request.shouldReportPartialResults is true, result handler will be called
// repeatedly with partial results, then finally with a final result or an error.
- (SFSpeechRecognitionTask *)recognitionTaskWithRequest:(SFSpeechRecognitionRequest *)request
                                          resultHandler:(void (^)(SFSpeechRecognitionResult * __nullable result, NSError * __nullable error))resultHandler;

// Advanced API: Recognize a custom request with a delegate
// The delegate will be weakly referenced by the returned task
- (SFSpeechRecognitionTask *)recognitionTaskWithRequest:(SFSpeechRecognitionRequest *)request
                                               delegate:(id <SFSpeechRecognitionTaskDelegate>)delegate;
As the declarations show, there are two ways to receive recognition results: a delegate and a block. For simplicity, this article uses the block-based callback.
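For reference, a minimal sketch of the delegate-based variant might look like the following; it assumes the same recognitionTask, recognitionRequest, speechRecognizer, and resultStringLabel properties used throughout this article, and implements two of the optional SFSpeechRecognitionTaskDelegate methods:

```objc
// Sketch: delegate-based recognition. The returned task holds the delegate weakly.
@interface ViewController () <SFSpeechRecognitionTaskDelegate>
@end

@implementation ViewController (DelegateBasedRecognition)

- (void)startDelegateBasedRecognition {
    self.recognitionTask = [self.speechRecognizer recognitionTaskWithRequest:self.recognitionRequest
                                                                    delegate:self];
}

// Called repeatedly with partial transcriptions while shouldReportPartialResults is YES.
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task
  didHypothesizeTranscription:(SFTranscription *)transcription {
    self.resultStringLabel.text = transcription.formattedString;
}

// Called once when recognition produces its final result.
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task
         didFinishRecognition:(SFSpeechRecognitionResult *)recognitionResult {
    self.resultStringLabel.text = recognitionResult.bestTranscription.formattedString;
}

@end
```

The delegate form is the "advanced" API and is useful when you need finer-grained events (speech detected, audio finished, task cancelled) than the single block can express.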
6. Recognizing an Audio File
Add the following code:
/**
 Recognize a local audio file.
 */
- (IBAction)recognizeLocalAudioFile {
    NSLocale *locale = [[NSLocale alloc] initWithLocaleIdentifier:@"zh_CN"];
    SFSpeechRecognizer *localRecognizer = [[SFSpeechRecognizer alloc] initWithLocale:locale];
    NSURL *url = [[NSBundle mainBundle] URLForResource:@"录音.m4a" withExtension:nil];
    if (!url) return;
    SFSpeechURLRecognitionRequest *request = [[SFSpeechURLRecognitionRequest alloc] initWithURL:url];
    __weak typeof(self) weakSelf = self;
    [localRecognizer recognitionTaskWithRequest:request resultHandler:^(SFSpeechRecognitionResult * _Nullable result, NSError * _Nullable error) {
        if (error) {
            NSString *errMsg = [NSString stringWithFormat:@"Speech recognition failed: %@", error];
            [BaseViewController hudWithTitle:errMsg];
            NSLog(@"%@", errMsg);
        } else {
            weakSelf.resultStringLabel.text = result.bestTranscription.formattedString;
        }
    }];
}
1. Initialize the speech recognizer, SFSpeechRecognizer.
2. Get the URL of the audio file.
3. Initialize the recognition request, SFSpeechURLRecognitionRequest.
4. Set up the result handler.
三. Summary
This article showed how to use the system Speech framework to convert audio into text. The framework is quite powerful; this article covered only the basics of live-microphone recognition and audio-file recognition. If you are interested, dig deeper, and feel free to discuss any questions.
Demo地址:https://github.com/jayZhangh/PhotosFrameworkBasicUsage.git
四. References
https://swift.gg/2016/09/30/siri-speech-framework/
https://developer.apple.com/videos/play/wwdc2016/509/
https://www.raywenderlich.com/2422-building-an-ios-app-like-siri