Implementing Speech Recognition on iOS with the Native Speech Framework (Speech Kit)

Author: 智狸 | Published 2019-06-10 11:59

I. Introduction

In 2016, alongside the release of iOS 10, Apple shipped the Speech framework (Speech Kit) for speech recognition; the famous Siri's speech recognition is built on the same technology. With this framework, converting speech to text becomes very simple. Below is a brief introduction to how to use it.

II. Implementation

1. Requesting user authorization

First, import the Speech framework:

    #import <Speech/Speech.h>

Requesting permission is very simple. Add the following code before recognition starts (for example in viewDidAppear:) to request speech recognition authorization:

    - (void)viewDidAppear:(BOOL)animated {
        [super viewDidAppear:animated];
        __weak typeof(self) weakSelf = self;
        [SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status) {
            // The callback may arrive on a background queue; hop to the main queue before touching UI.
            dispatch_async(dispatch_get_main_queue(), ^{
                switch (status) {
                    case SFSpeechRecognizerAuthorizationStatusNotDetermined:
                        weakSelf.recordButton.enabled = NO;
                        [weakSelf.recordButton setTitle:@"Speech recognition not yet authorized" forState:UIControlStateNormal];
                        break;
                    case SFSpeechRecognizerAuthorizationStatusDenied:
                        weakSelf.recordButton.enabled = NO;
                        [weakSelf.recordButton setTitle:@"Speech recognition permission denied" forState:UIControlStateNormal];
                        break;
                    case SFSpeechRecognizerAuthorizationStatusRestricted:
                        weakSelf.recordButton.enabled = NO;
                        [weakSelf.recordButton setTitle:@"Speech recognition is restricted on this device" forState:UIControlStateNormal];
                        break;
                    case SFSpeechRecognizerAuthorizationStatusAuthorized:
                        weakSelf.recordButton.enabled = YES;
                        [weakSelf.recordButton setTitle:@"Start Recording" forState:UIControlStateNormal];
                        break;
                    default:
                        break;
                }
            });
        }];
    }

If the app crashes when you run it, the reason is that since iOS 10 you must add microphone and speech recognition usage descriptions to Info.plist:

    Privacy - Speech Recognition Usage Description    Please allow speech recognition
    Privacy - Microphone Usage Description            Please allow microphone access
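
If you prefer editing the plist source directly, these two entries correspond to the raw keys NSSpeechRecognitionUsageDescription and NSMicrophoneUsageDescription; the usage strings below are just example prompts:

    <key>NSSpeechRecognitionUsageDescription</key>
    <string>Please allow speech recognition</string>
    <key>NSMicrophoneUsageDescription</key>
    <string>Please allow microphone access</string>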

Run the project and you will be prompted to allow speech recognition and microphone access. With that, the permission request is complete.
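
Besides the asynchronous request above, SFSpeechRecognizer also exposes the current status synchronously via the authorizationStatus class method, which can serve as a guard before starting a recording. A minimal sketch:

    // A quick synchronous guard before starting a recording.
    if ([SFSpeechRecognizer authorizationStatus] != SFSpeechRecognizerAuthorizationStatusAuthorized) {
        NSLog(@"Speech recognition is not authorized");
        return;
    }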

2. Initializing the speech recognition engine

    #pragma mark - property

    - (AVAudioEngine *)audioEngine {
        if (!_audioEngine) {
            _audioEngine = [[AVAudioEngine alloc] init];
        }
        return _audioEngine;
    }

    - (SFSpeechRecognizer *)speechRecognizer {
        if (!_speechRecognizer) {
            // Set the language for the recognizer; here it is Mandarin Chinese.
            NSLocale *locale = [[NSLocale alloc] initWithLocaleIdentifier:@"zh_CN"];
            _speechRecognizer = [[SFSpeechRecognizer alloc] initWithLocale:locale];
            _speechRecognizer.delegate = self;
        }
        return _speechRecognizer;
    }

    #pragma mark - SFSpeechRecognizerDelegate

    // Called when the availability of the recognizer changes.
    - (void)speechRecognizer:(SFSpeechRecognizer *)speechRecognizer availabilityDidChange:(BOOL)available {
        if (available) {
            self.recordButton.enabled = YES;
            [self.recordButton setTitle:@"Start Recording" forState:UIControlStateNormal];
        } else {
            self.recordButton.enabled = NO;
            [self.recordButton setTitle:@"Speech recognition unavailable" forState:UIControlStateNormal];
        }
    }

1. Initializing SFSpeechRecognizer requires an NSLocale object identifying the language the user will speak, e.g. "zh_CN" for Mandarin Chinese or "en_US" for American English (see the check after this list).

2. AVAudioEngine is the audio engine used to capture audio input.
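
Not every locale is supported, and even a supported locale can be temporarily unavailable (for example without a network connection). A quick sketch for checking both, using SFSpeechRecognizer's supportedLocales class method and isAvailable property:

    // Print every locale the recognizer supports.
    for (NSLocale *locale in [SFSpeechRecognizer supportedLocales]) {
        NSLog(@"supported: %@", locale.localeIdentifier);
    }
    // Check whether a given recognizer can be used right now.
    SFSpeechRecognizer *recognizer = [[SFSpeechRecognizer alloc] initWithLocale:[[NSLocale alloc] initWithLocaleIdentifier:@"zh_CN"]];
    NSLog(@"zh_CN usable right now: %@", recognizer.isAvailable ? @"YES" : @"NO");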

3. Starting the speech recognition engine

Add the following code:

    - (IBAction)recordButtonClicked {
        if ([self.audioEngine isRunning]) {
            [self endRecording];
            // endRecording disables the button, so the disabled-state title is shown.
            [self.recordButton setTitle:@"Stopping…" forState:UIControlStateDisabled];
        } else {
            [self startRecording];
            [self.recordButton setTitle:@"Stop Recording" forState:UIControlStateNormal];
        }
    }

    - (IBAction)startRecording {
        // Cancel any recognition task that is still in flight.
        if (_recognitionTask) {
            [_recognitionTask cancel];
            _recognitionTask = nil;
        }

        // Configure the audio session for recording.
        AVAudioSession *audioSession = [AVAudioSession sharedInstance];
        NSError *error = nil;
        [audioSession setCategory:AVAudioSessionCategoryRecord error:&error];
        NSParameterAssert(!error);
        [audioSession setMode:AVAudioSessionModeMeasurement error:&error];
        NSParameterAssert(!error);
        [audioSession setActive:YES withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:&error];
        NSParameterAssert(!error);

        _recognitionRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
        AVAudioInputNode *inputNode = [self.audioEngine inputNode];
        NSAssert(inputNode, @"The audio input device is not ready");
        NSAssert(_recognitionRequest, @"Failed to create the recognition request");

        // Report partial results as they are produced.
        _recognitionRequest.shouldReportPartialResults = YES;

        __weak typeof(self) weakSelf = self;
        _recognitionTask = [self.speechRecognizer recognitionTaskWithRequest:_recognitionRequest resultHandler:^(SFSpeechRecognitionResult * _Nullable result, NSError * _Nullable error) {
            __strong typeof(weakSelf) strongSelf = weakSelf;
            BOOL isFinal = NO;
            if (result) {
                NSLog(@"%@", result.bestTranscription.formattedString);
                strongSelf.resultStringLabel.text = result.bestTranscription.formattedString;
                isFinal = result.isFinal;
            }
            if (error || isFinal) {
                [strongSelf.audioEngine stop];
                [inputNode removeTapOnBus:0];
                strongSelf.recognitionTask = nil;
                strongSelf.recognitionRequest = nil;
                strongSelf.recordButton.enabled = YES;
                [strongSelf.recordButton setTitle:@"Start Recording" forState:UIControlStateNormal];
            }
        }];

        AVAudioFormat *recordingFormat = [inputNode outputFormatForBus:0];
        // Remove any previous tap before installing a new one, otherwise installTapOnBus: may throw.
        [inputNode removeTapOnBus:0];
        [inputNode installTapOnBus:0 bufferSize:1024 format:recordingFormat block:^(AVAudioPCMBuffer * _Nonnull buffer, AVAudioTime * _Nonnull when) {
            // Feed each captured audio buffer into the recognition request.
            __strong typeof(weakSelf) strongSelf = weakSelf;
            if (strongSelf.recognitionRequest) {
                [strongSelf.recognitionRequest appendAudioPCMBuffer:buffer];
            }
        }];

        [self.audioEngine prepare];
        [self.audioEngine startAndReturnError:&error];
        NSParameterAssert(!error);
        // LoadingText is a string constant defined elsewhere in the project.
        self.resultStringLabel.text = LoadingText;
    }

1. AVAudioSession configures the audio session for recording.

2. Before recognition produces a final result it may produce several intermediate ones; setting shouldReportPartialResults to YES on the SFSpeechAudioBufferRecognitionRequest means each intermediate result is returned as soon as it is produced (see the sketch after this list).

3. Set the recording format and install a tap to handle the audio stream (appending each buffer to self.recognitionRequest).

4. recordButtonClicked is the tap handler for self.recordButton.

5. Start capturing audio.

6. Update the button title.
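
Beyond bestTranscription.formattedString, a final result also carries per-segment timing and confidence through SFTranscriptionSegment. A small sketch of inspecting them inside the resultHandler above (confidence is only meaningful once isFinal is YES):

    if (result.isFinal) {
        // Each segment corresponds to a recognized word or phrase.
        for (SFTranscriptionSegment *segment in result.bestTranscription.segments) {
            NSLog(@"'%@' at %.2fs (duration %.2f, confidence %.2f)",
                  segment.substring, segment.timestamp, segment.duration, segment.confidence);
        }
    }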

4. Resetting the speech recognition engine

Add the following code:

    - (void)endRecording {
        // Stop the audio engine first so no more buffers arrive.
        [self.audioEngine stop];
        if (_recognitionRequest) {
            // Tell the request that no more audio will be appended.
            [_recognitionRequest endAudio];
        }
        if (_recognitionTask) {
            // Cancel the task; any result not yet delivered is discarded.
            [_recognitionTask cancel];
            _recognitionTask = nil;
        }
        self.recordButton.enabled = NO;
        if ([self.resultStringLabel.text isEqualToString:LoadingText]) {
            self.resultStringLabel.text = @"";
        }
    }

1. Stop the audio recording engine.

2. End the audio input for the request and cancel the recognition task (see the note after this list).

3. Disable self.recordButton while stopping; it is re-enabled in the result callback.

4. Clear the placeholder text if nothing was recognized.
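
Note that cancel discards any result that has not been delivered yet. If you want the final transcription after the user taps stop, an alternative is to stop the audio but let the task run to completion; the resultHandler then fires one last time with isFinal == YES (or an error). A sketch, using a hypothetical finishRecording variant:

    // Alternative stop: end the audio but let the recognition task finish,
    // so the resultHandler still delivers the final result.
    - (void)finishRecording {
        [self.audioEngine stop];
        [self.audioEngine.inputNode removeTapOnBus:0];
        // No more audio will be appended; the task will wrap up on its own.
        [_recognitionRequest endAudio];
    }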

5. The recognition result callback

Here is the API declaration from the SFSpeechRecognizer header:

    // Recognize speech utterance with a request
    // If request.shouldReportPartialResults is true, result handler will be called
    // repeatedly with partial results, then finally with a final result or an error.
    - (SFSpeechRecognitionTask *)recognitionTaskWithRequest:(SFSpeechRecognitionRequest *)request
                                              resultHandler:(void (^)(SFSpeechRecognitionResult * __nullable result, NSError * __nullable error))resultHandler;

    // Advanced API: Recognize a custom request with a delegate
    // The delegate will be weakly referenced by the returned task
    - (SFSpeechRecognitionTask *)recognitionTaskWithRequest:(SFSpeechRecognitionRequest *)request
                                                   delegate:(id <SFSpeechRecognitionTaskDelegate>)delegate;

There are two ways to receive recognition results: via a delegate or via a block. For simplicity, this article uses the block-based callback.
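
For completeness, here is a minimal sketch of what the delegate route could look like; the method names come from SFSpeechRecognitionTaskDelegate, and the UI updates assume the same resultStringLabel and recordButton properties as above:

    // Start the task with the delegate-based API instead of a block:
    // [self.speechRecognizer recognitionTaskWithRequest:request delegate:self];

    // Called repeatedly with partial transcriptions.
    - (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didHypothesizeTranscription:(SFTranscription *)transcription {
        self.resultStringLabel.text = transcription.formattedString;
    }

    // Called once with the final recognition result.
    - (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishRecognition:(SFSpeechRecognitionResult *)recognitionResult {
        self.resultStringLabel.text = recognitionResult.bestTranscription.formattedString;
    }

    // Called when the task finishes, successfully or not; clean up here.
    - (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishSuccessfully:(BOOL)successfully {
        self.recordButton.enabled = YES;
    }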

6. Recognizing an audio file

Add the following code:

    /**
     Recognize a local audio file.
     */
    - (IBAction)recognizeLocalAudioFile {
        NSLocale *locale = [[NSLocale alloc] initWithLocaleIdentifier:@"zh_CN"];
        SFSpeechRecognizer *localRecognizer = [[SFSpeechRecognizer alloc] initWithLocale:locale];
        // "录音.m4a" is an audio file bundled with the app.
        NSURL *url = [[NSBundle mainBundle] URLForResource:@"录音.m4a" withExtension:nil];
        if (!url) return;
        SFSpeechURLRecognitionRequest *request = [[SFSpeechURLRecognitionRequest alloc] initWithURL:url];
        __weak typeof(self) weakSelf = self;
        [localRecognizer recognitionTaskWithRequest:request resultHandler:^(SFSpeechRecognitionResult * _Nullable result, NSError * _Nullable error) {
            if (error) {
                NSString *errMsg = [NSString stringWithFormat:@"Speech recognition failed: %@", error];
                // hudWithTitle: is a helper defined on BaseViewController in this project.
                [BaseViewController hudWithTitle:errMsg];
                NSLog(@"%@", errMsg);
            } else {
                weakSelf.resultStringLabel.text = result.bestTranscription.formattedString;
            }
        }];
    }

1. Initialize the SFSpeechRecognizer.

2. Get the URL of the audio file.

3. Create an SFSpeechURLRecognitionRequest from the URL.

4. Start the recognition task and handle the result in the callback.

III. Summary

This article showed how to use the Speech framework that ships with iOS to convert audio into text. The framework is quite powerful; this article gave only a very simple introduction to live recording recognition and audio file recognition. If you are interested, it is worth digging deeper, and questions are welcome.

Demo: https://github.com/jayZhangh/PhotosFrameworkBasicUsage.git

