一. Introduction
In 2016, alongside its flagship iOS 10 release, Apple shipped the Speech framework (Speech.framework, often referred to as Speech Kit) for speech recognition, which exposes the same recognition technology that powers the famous Siri. With it, converting speech to text becomes very simple. Below is a short introduction to how to use it.
二. Implementation
1. Requesting User Authorization
First, import the Speech framework:
#import <Speech/Speech.h>
Requesting authorization is straightforward: add the following code before recognition starts (here, in viewDidAppear:):
- (void)viewDidAppear:(BOOL)animated {
    [super viewDidAppear:animated];
    __weak typeof(self) weakSelf = self;
    [SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status) {
        // The callback may arrive on a background queue; hop to the main queue for UI updates.
        dispatch_async(dispatch_get_main_queue(), ^{
            switch (status) {
                case SFSpeechRecognizerAuthorizationStatusNotDetermined:
                    weakSelf.recordButton.enabled = NO;
                    [weakSelf.recordButton setTitle:@"Speech recognition not yet authorized" forState:UIControlStateNormal];
                    break;
                case SFSpeechRecognizerAuthorizationStatusDenied:
                    weakSelf.recordButton.enabled = NO;
                    [weakSelf.recordButton setTitle:@"User denied speech recognition" forState:UIControlStateNormal];
                    break;
                case SFSpeechRecognizerAuthorizationStatusRestricted:
                    weakSelf.recordButton.enabled = NO;
                    [weakSelf.recordButton setTitle:@"Speech recognition is restricted on this device" forState:UIControlStateNormal];
                    break;
                case SFSpeechRecognizerAuthorizationStatusAuthorized:
                    weakSelf.recordButton.enabled = YES;
                    [weakSelf.recordButton setTitle:@"Start Recording" forState:UIControlStateNormal];
                    break;
                default:
                    break;
            }
        });
    }];
}
If the app crashes at this point, it is because since iOS 10 you must declare usage descriptions for the microphone and for speech recognition in Info.plist:
Privacy - Speech Recognition Usage Description  Please allow speech recognition
Privacy - Microphone Usage Description  Please allow microphone access
Run the project and you will be prompted to grant speech-recognition and microphone access. Authorization is now complete.
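If you edit Info.plist as source code rather than through Xcode's property-list editor, the two entries above correspond to the raw keys NSSpeechRecognitionUsageDescription and NSMicrophoneUsageDescription (the description strings are just examples):

```xml
<key>NSSpeechRecognitionUsageDescription</key>
<string>Please allow speech recognition</string>
<key>NSMicrophoneUsageDescription</key>
<string>Please allow microphone access</string>
```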
2. Initializing the Speech Recognition Engine
#pragma mark - property
- (AVAudioEngine *)audioEngine {
    if (!_audioEngine) {
        _audioEngine = [[AVAudioEngine alloc] init];
    }
    return _audioEngine;
}

- (SFSpeechRecognizer *)speechRecognizer {
    if (!_speechRecognizer) {
        // Set the recognizer's language; here it is Mandarin Chinese.
        NSLocale *locale = [[NSLocale alloc] initWithLocaleIdentifier:@"zh_CN"];
        _speechRecognizer = [[SFSpeechRecognizer alloc] initWithLocale:locale];
        _speechRecognizer.delegate = self;
    }
    return _speechRecognizer;
}
#pragma mark - SFSpeechRecognizerDelegate
// Called when the recognizer's availability changes.
- (void)speechRecognizer:(SFSpeechRecognizer *)speechRecognizer availabilityDidChange:(BOOL)available {
    if (available) {
        self.recordButton.enabled = YES;
        [self.recordButton setTitle:@"Start Recording" forState:UIControlStateNormal];
    } else {
        self.recordButton.enabled = NO;
        [self.recordButton setTitle:@"Speech recognition unavailable" forState:UIControlStateNormal];
    }
}
1. Initializing SFSpeechRecognizer requires an NSLocale object identifying the language the user will speak, e.g. "zh_CN" for Mandarin Chinese or "en_US" for American English.
2. AVAudioEngine is the audio engine used to capture audio input.
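If you are unsure which locale identifiers are valid, SFSpeechRecognizer can enumerate the locales it supports. A minimal sketch:

```objc
// Log every locale the speech recognizer supports.
for (NSLocale *locale in [SFSpeechRecognizer supportedLocales]) {
    NSLog(@"%@", locale.localeIdentifier);
}
```

Note that a locale being listed does not guarantee availability at runtime (e.g. without a network connection), which is exactly what the availabilityDidChange: delegate callback above reports.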
3. Starting the Speech Recognition Engine
Add the following code:
- (IBAction)recordButtonClicked {
    if ([self.audioEngine isRunning]) {
        [self endRecording];
        [self.recordButton setTitle:@"Stopping…" forState:UIControlStateDisabled];
    } else {
        [self startRecording];
        [self.recordButton setTitle:@"Stop Recording" forState:UIControlStateNormal];
    }
}

- (IBAction)startRecording {
    // Cancel any recognition task that is still running.
    if (_recognitionTask) {
        [_recognitionTask cancel];
        _recognitionTask = nil;
    }

    // Configure the audio session for recording.
    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    NSError *error = nil;
    [audioSession setCategory:AVAudioSessionCategoryRecord error:&error];
    NSParameterAssert(!error);
    [audioSession setMode:AVAudioSessionModeMeasurement error:&error];
    NSParameterAssert(!error);
    [audioSession setActive:YES withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:&error];
    NSParameterAssert(!error);

    _recognitionRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
    AVAudioInputNode *inputNode = [self.audioEngine inputNode];
    NSAssert(inputNode, @"Audio input device is not ready");
    NSAssert(_recognitionRequest, @"Failed to create the recognition request");
    // Report partial results while the user is still speaking.
    _recognitionRequest.shouldReportPartialResults = YES;

    __weak typeof(self) weakSelf = self;
    _recognitionTask = [self.speechRecognizer recognitionTaskWithRequest:_recognitionRequest resultHandler:^(SFSpeechRecognitionResult * _Nullable result, NSError * _Nullable error) {
        __strong typeof(weakSelf) strongSelf = weakSelf;
        BOOL isFinal = NO;
        if (result) {
            NSLog(@"%@", result.bestTranscription.formattedString);
            strongSelf.resultStringLabel.text = result.bestTranscription.formattedString;
            isFinal = result.isFinal;
        }
        if (error || isFinal) {
            [strongSelf.audioEngine stop];
            [inputNode removeTapOnBus:0];
            strongSelf.recognitionTask = nil;
            strongSelf.recognitionRequest = nil;
            strongSelf.recordButton.enabled = YES;
            [strongSelf.recordButton setTitle:@"Start Recording" forState:UIControlStateNormal];
        }
    }];

    AVAudioFormat *recordingFormat = [inputNode outputFormatForBus:0];
    // Remove any previous tap before installing a new one, otherwise installTapOnBus: may crash.
    [inputNode removeTapOnBus:0];
    [inputNode installTapOnBus:0 bufferSize:1024 format:recordingFormat block:^(AVAudioPCMBuffer * _Nonnull buffer, AVAudioTime * _Nonnull when) {
        __strong typeof(weakSelf) strongSelf = weakSelf;
        if (strongSelf.recognitionRequest) {
            // Feed each captured audio buffer into the recognition request.
            [strongSelf.recognitionRequest appendAudioPCMBuffer:buffer];
        }
    }];

    [self.audioEngine prepare];
    [self.audioEngine startAndReturnError:&error];
    NSParameterAssert(!error);
    self.resultStringLabel.text = LoadingText;
}
1. Use the shared AVAudioSession object to configure the app for audio recording.
2. Before producing a final result, recognition may produce several intermediate ones; setting the SFSpeechAudioBufferRecognitionRequest's shouldReportPartialResults property to YES makes each partial result be delivered as soon as it is available.
3. Set the recording format and the audio-buffer callback, which appends each buffer to self.recognitionRequest.
4. Wire up the tap action for self.recordButton.
5. Start capturing audio.
6. Update the button title.
4. Resetting the Speech Recognition Engine
Add the following code:
- (void)endRecording {
    [self.audioEngine stop];
    if (_recognitionRequest) {
        // Signal that no more audio will be appended.
        [_recognitionRequest endAudio];
    }
    if (_recognitionTask) {
        [_recognitionTask cancel];
        _recognitionTask = nil;
    }
    self.recordButton.enabled = NO;
    if ([self.resultStringLabel.text isEqualToString:LoadingText]) {
        self.resultStringLabel.text = @"";
    }
}
1. Disable self.recordButton while stopping.
2. Stop the audio engine.
3. Stop the recognizer: end the audio stream and cancel the task.
4. Clear the result label if it still shows the loading placeholder.
5. Receiving Recognition Results
Below is the API description of SFSpeechRecognizer's recognition methods:
// Recognize speech utterance with a request
// If request.shouldReportPartialResults is true, result handler will be called
// repeatedly with partial results, then finally with a final result or an error.
- (SFSpeechRecognitionTask *)recognitionTaskWithRequest:(SFSpeechRecognitionRequest *)request
                                          resultHandler:(void (^)(SFSpeechRecognitionResult * __nullable result, NSError * __nullable error))resultHandler;

// Advanced API: Recognize a custom request with a delegate
// The delegate will be weakly referenced by the returned task
- (SFSpeechRecognitionTask *)recognitionTaskWithRequest:(SFSpeechRecognitionRequest *)request
                                               delegate:(id <SFSpeechRecognitionTaskDelegate>)delegate;
As the declarations show, there are two ways to receive recognition results: a delegate and a block. For simplicity, this article uses the block-based callback.
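For reference, a minimal sketch of the delegate-based variant might look like the following; it assumes the same recognitionTask, recognitionRequest, speechRecognizer, and resultStringLabel properties used throughout this article, and implements two of the optional SFSpeechRecognitionTaskDelegate methods:

```objc
// Sketch: delegate-based recognition. The returned task holds the delegate weakly.
@interface ViewController () <SFSpeechRecognitionTaskDelegate>
@end

@implementation ViewController (DelegateBasedRecognition)

- (void)startDelegateBasedRecognition {
    self.recognitionTask = [self.speechRecognizer recognitionTaskWithRequest:self.recognitionRequest
                                                                    delegate:self];
}

// Called repeatedly with partial transcriptions while shouldReportPartialResults is YES.
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task
  didHypothesizeTranscription:(SFTranscription *)transcription {
    self.resultStringLabel.text = transcription.formattedString;
}

// Called once when recognition produces its final result.
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task
         didFinishRecognition:(SFSpeechRecognitionResult *)recognitionResult {
    self.resultStringLabel.text = recognitionResult.bestTranscription.formattedString;
}

@end
```

The delegate form is the "advanced" API and is useful when you need finer-grained events (speech detected, audio finished, task cancelled) than the single block can express.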
6. Recognizing an Audio File
Add the following code:
/**
 Recognize a local audio file.
 */
- (IBAction)recognizeLocalAudioFile {
    NSLocale *locale = [[NSLocale alloc] initWithLocaleIdentifier:@"zh_CN"];
    SFSpeechRecognizer *localRecognizer = [[SFSpeechRecognizer alloc] initWithLocale:locale];
    NSURL *url = [[NSBundle mainBundle] URLForResource:@"录音.m4a" withExtension:nil];
    if (!url) return;
    SFSpeechURLRecognitionRequest *request = [[SFSpeechURLRecognitionRequest alloc] initWithURL:url];
    __weak typeof(self) weakSelf = self;
    [localRecognizer recognitionTaskWithRequest:request resultHandler:^(SFSpeechRecognitionResult * _Nullable result, NSError * _Nullable error) {
        if (error) {
            NSString *errMsg = [NSString stringWithFormat:@"Speech recognition failed: %@", error];
            [BaseViewController hudWithTitle:errMsg];
            NSLog(@"%@", errMsg);
        } else {
            weakSelf.resultStringLabel.text = result.bestTranscription.formattedString;
        }
    }];
}
1. Initialize the speech recognizer, SFSpeechRecognizer.
2. Get the URL of the audio file.
3. Initialize the recognition request, SFSpeechURLRecognitionRequest.
4. Set up the result handler.
三. Summary
This article showed how to use the system Speech framework to convert audio into text. The framework is quite powerful; this article covered only the basics of live-microphone recognition and audio-file recognition. If you are interested, dig deeper, and feel free to discuss any questions.
Demo地址:https://github.com/jayZhangh/PhotosFrameworkBasicUsage.git
四. References
https://swift.gg/2016/09/30/siri-speech-framework/
https://developer.apple.com/videos/play/wwdc2016/509/
https://www.raywenderlich.com/2422-building-an-ios-app-like-siri