Speech Recognition

Author: lzhenboy | Published: 2020-08-06 15:32

The code for this example is available on GitHub:
https://github.com/lzhenboy/Speech2Text.git

1. Extracting audio from video

Goal: extract the audio track from a video file
Tool: ffmpeg
Installation (macOS, via Homebrew)

brew install ffmpeg

Example

ffmpeg -i video-demo.mp4 -f wav -ar 16000 audio-demo.wav

Parameters

-i video-demo.mp4 # input file path
-f wav # output audio format: wav
-ar 16000 # output sample rate: 16000 Hz
audio-demo.wav # output file path
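
The same extraction can also be driven from Python, which is convenient when several videos need converting. The following is a minimal sketch and not part of the original repository: the helper names `build_ffmpeg_command` and `extract_audio` are hypothetical, and it assumes `ffmpeg` is on the `PATH`. It also adds ffmpeg's `-y` flag, which is not in the command above, so that an existing output file is overwritten instead of ffmpeg waiting for confirmation.

```python
import subprocess


def build_ffmpeg_command(video_path, audio_path, sample_rate=16000):
    """Return the ffmpeg argument list that extracts a WAV track from a video."""
    return [
        'ffmpeg',
        '-y',                     # overwrite the output file (addition, not in the original command)
        '-i', video_path,         # input file path
        '-f', 'wav',              # output audio format
        '-ar', str(sample_rate),  # output sample rate in Hz
        audio_path,               # output file path
    ]


def extract_audio(video_path, audio_path, sample_rate=16000):
    """Run ffmpeg; raises subprocess.CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_ffmpeg_command(video_path, audio_path, sample_rate), check=True)
```

Calling `extract_audio('video-demo.mp4', 'audio-demo.wav')` should then be equivalent to the shell command shown in the example, apart from the added `-y`.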

2. Speech recognition

Goal: recognize text from speech
Open-source implementation: SpeechRecognition
Installation

pip install SpeechRecognition pydub

Example

# -*- coding: utf-8 -*-
# @Time    : 2020/7/30 3:43 PM
# @Author  : lzhenboy

import os
import time
import json
import wave
import datetime

import speech_recognition as sr
from pydub import AudioSegment


class Speech2Text(object):
    def __init__(self):
        # Length of each audio chunk in seconds (default: 30 s)
        self.time_interval = 30
        self.recognizer = sr.Recognizer()

    def split_speech(self, speech_file, split_speech_dir):
        # File name without directory or extension, used to name the chunks
        speech_base_name = os.path.splitext(os.path.basename(speech_file))[0]

        # Get the audio duration in seconds (number of frames / frame rate)
        with wave.open(speech_file, 'rb') as fin:
            time_length = int(fin.getnframes() / fin.getframerate())
            print('File `{}.wav` speech duration: {}'.format(speech_base_name, time_length))

        # Split the audio and export the chunks
        os.makedirs(split_speech_dir, exist_ok=True)
        audio = AudioSegment.from_wav(speech_file)
        kn = int(time_length / self.time_interval) + 1
        for i in range(kn):
            # pydub slices in milliseconds; each chunk extends 2 s into the next one
            audio[i * self.time_interval * 1000:((i + 1) * self.time_interval + 2) * 1000].export(
                os.path.join(split_speech_dir, '{}-{}.wav'.format(speech_base_name, i)), format='wav')

    def convert_speech_to_text(self, split_speech_dir, speech_text_dir):
        os.makedirs(speech_text_dir, exist_ok=True)

        # List the chunk files in chunk order. os.listdir() returns names in arbitrary
        # order, so sort by the numeric suffix that split_speech() appends (`<name>-<i>.wav`)
        names = [n for n in os.listdir(split_speech_dir) if n.endswith('.wav')]
        names.sort(key=lambda n: int(n[:-len('.wav')].rsplit('-', 1)[-1]))
        start_time = datetime.datetime.now()
        speech_text = {}
        for idx, name in enumerate(names):
            print('%d %s: conversion started' % (idx, name))
            # Recognize the audio one chunk at a time
            try:
                with sr.AudioFile(os.path.join(split_speech_dir, name)) as source:
                    audio = self.recognizer.record(source)
                    # The Google API gave the best results
                    text = self.recognizer.recognize_google(audio, language='zh-CN')
                    # Pause between requests
                    time.sleep(5)
                    temp_time = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
                    print('%s %d %s: done' % (temp_time, idx, name))
                    speech_text[idx] = text
            except Exception as e:
                # Record an empty string for a failed chunk and move on to the next one
                print(e)
                temp_time = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
                print('%s %d %s: failed' % (temp_time, idx, name))
                speech_text[idx] = ''
                continue
        end_time = datetime.datetime.now()
        print('Total time: %s' % (end_time - start_time))
        with open(os.path.join(speech_text_dir, 'speech_text.json'), 'w', encoding='utf-8') as fout:
            json.dump(speech_text, fout, indent=4, ensure_ascii=False)


if __name__ == '__main__':
    speech_demo_file = './data/audio-demo.wav'
    split_speech_demo_dir = './data/split_speech_file'
    speech_text_demo_dir = './data'

    st = Speech2Text()

    # Step 1: split the audio into chunks
    st.split_speech(speech_demo_file, split_speech_demo_dir)

    # Step 2: transcribe each chunk and save the results as JSON
    st.convert_speech_to_text(split_speech_demo_dir, speech_text_demo_dir)
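
To make the splitting arithmetic in `split_speech` easier to follow, here is a small standalone sketch that computes chunk boundaries without touching any audio. The function name `chunk_bounds_ms` and its parameters are hypothetical; the 30 s interval and 2 s overlap mirror the values used in the class. The overlap is presumably there so that speech cut off at one boundary still appears whole in the neighbouring chunk, but that is an assumption about the author's intent. Unlike the loop in `split_speech`, this sketch clamps the last chunk to the audio length and stops once the start position reaches the end, so it never yields an empty trailing chunk.

```python
def chunk_bounds_ms(duration_s, interval_s=30, overlap_s=2):
    """Return (start_ms, end_ms) pairs covering duration_s seconds of audio.

    Chunks start every interval_s seconds and extend overlap_s seconds
    into the following chunk; the last chunk ends at the end of the audio.
    """
    duration_ms = int(duration_s * 1000)
    bounds = []
    start_ms = 0
    while start_ms < duration_ms:
        end_ms = min(start_ms + (interval_s + overlap_s) * 1000, duration_ms)
        bounds.append((start_ms, end_ms))
        start_ms += interval_s * 1000
    return bounds


# A 70 s recording is covered by three overlapping chunks
for start_ms, end_ms in chunk_bounds_ms(70):
    print(start_ms, end_ms)
# 0 32000
# 30000 62000
# 60000 70000
```

Each pair can be used directly as a pydub slice, since `AudioSegment` slicing works in milliseconds.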

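Evaluation
For audio with continuous speech, the Google speech recognition API gives good results. For audio with long pauses in the middle, results are mediocre, and the latter part of the audio is often dropped.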
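
Since long pauses are where the transcript tends to lose content, it can be useful to locate them before transcribing, for example to check whether a file is likely to be affected. The sketch below is not part of the original code and uses only the standard library. The names `find_long_pauses` and `read_mono_16bit_wav` are hypothetical, the amplitude threshold and minimum pause length are illustrative values rather than tuned ones, and it assumes 16-bit mono PCM input (adding `-ac 1` to the ffmpeg command would downmix the audio to mono).

```python
import array
import sys
import wave


def find_long_pauses(samples, frame_rate, threshold=500, min_pause_s=2.0, window_s=0.1):
    """Return (start_s, end_s) spans where the signal stays below threshold.

    samples: sequence of signed 16-bit mono sample values.
    threshold and min_pause_s are illustrative defaults, not tuned values.
    """
    window = max(1, int(frame_rate * window_s))
    pauses = []
    quiet_start = None  # index of the first sample of the current quiet run
    for pos in range(0, len(samples), window):
        peak = max(abs(s) for s in samples[pos:pos + window])
        if peak < threshold:
            if quiet_start is None:
                quiet_start = pos
        else:
            if quiet_start is not None and (pos - quiet_start) / frame_rate >= min_pause_s:
                pauses.append((quiet_start / frame_rate, pos / frame_rate))
            quiet_start = None
    # A quiet run may extend to the end of the audio
    if quiet_start is not None and (len(samples) - quiet_start) / frame_rate >= min_pause_s:
        pauses.append((quiet_start / frame_rate, len(samples) / frame_rate))
    return pauses


def read_mono_16bit_wav(path):
    """Load a 16-bit mono PCM WAV file and return (samples, frame_rate)."""
    with wave.open(path, 'rb') as fin:
        if fin.getsampwidth() != 2 or fin.getnchannels() != 1:
            raise ValueError('this sketch only handles 16-bit mono WAV files')
        samples = array.array('h')
        samples.frombytes(fin.readframes(fin.getnframes()))
        if sys.byteorder == 'big':
            samples.byteswap()  # WAV data is little-endian
        return samples, fin.getframerate()
```

A possible use is `find_long_pauses(*read_mono_16bit_wav('./data/audio-demo.wav'))`; the returned spans could then guide where to cut the audio instead of relying only on fixed 30 s intervals.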
