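Audio format conversion
- Convertio: a very capable file converter; the free tier is limited and the paid plans are fairly expensive
- ALL TO ALL: completely free, but the sample rate cannot be set
- aconvert.com: free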
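Changing the audio sample rate
Use the sox command to resample audio.wav to 16000 Hz. Write the result to a new file: sox reads its input while it writes its output, so it cannot safely overwrite audio.wav in place.
$ sox audio.wav -r 16000 audio_16k.wav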
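When a whole folder needs resampling, the same command can be wrapped in a loop. The following is only a minimal sketch: resample_dir is a hypothetical helper name, and the source and destination directories are assumptions; the only sox options used are the ones from the command above.

```shell
# resample_dir SRC_DIR DST_DIR
# Hypothetical helper: resamples every .wav in SRC_DIR to 16 kHz and
# writes the results under the same file names in DST_DIR.
resample_dir() (
  src_dir=$1
  dst_dir=$2
  mkdir -p "$dst_dir" || exit 1
  for f in "$src_dir"/*.wav; do
    [ -e "$f" ] || continue   # skip when the glob matches nothing
    # Same sox options as above; input and output are different files.
    sox "$f" -r 16000 "$dst_dir/$(basename "$f")" || exit 1
  done
)

# Usage (directory names are placeholders):
#   resample_dir recordings recordings_16k
```

Keeping the originals in a separate directory means a failed conversion never destroys the source recording.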
Reference article: "SoX: the Swiss Army knife of audio processing tools"
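Recording audio from the command line with SoX
Create a text file arctic20.txt containing the sentences you want to record, one per line, then run the commands below to start recording. The terminal shows the sentences in order; after reading each one aloud, press Ctrl+C to end that recording and move on to the next sentence: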
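for i in $(seq 1 20); do
    fn=$(printf 'arctic_%04d' "$i")
    read -r sent    # next sentence from arctic20.txt
    echo "$sent"
    # 16 kHz, 16-bit signed integer, mono; Ctrl+C ends this take
    rec -r 16000 -e signed-integer -b 16 -c 1 "$fn.wav" 2>/dev/null
done < arctic20.txt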
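Verifying the audio file format
Recognition suffers when the sample rate, number of channels, or bandwidth of the input audio does not match what the acoustic model was trained on. The input must be a 16 kHz (or 8 kHz, depending on the training data), 16-bit, mono (single-channel), little-endian file. Fix the sample rate of the source by resampling, but only when its rate is higher than that of the training data. Do not upsample a file and then decode it with an acoustic model trained on audio with a higher sample rate.
The following command shows an audio file's format (sample rate, number of channels):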
$ sox --i /path/to/audio/file
For example:
$ sox --i /Users/.../xxx.wav
Input File : '/Users/.../xxx.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:04.58 = 73226 samples ~ 343.247 CDDA sectors
File Size : 146k
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM
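The same check can be scripted instead of read by eye. The sketch below is an assumption-laden example rather than part of the original workflow: check_wav is a hypothetical helper, the default target of 16000 Hz is an assumption, and it relies on sox --i accepting the soxi options -r (sample rate), -c (channels) and -b (bits per sample).

```shell
# check_wav FILE [RATE]
# Hypothetical helper: prints OK when FILE is RATE Hz, mono, 16-bit,
# otherwise prints what was found and returns a non-zero status.
check_wav() (
  file=$1
  want_rate=${2:-16000}   # assumed default; pass 8000 for 8 kHz models
  rate=$(sox --i -r "$file") || exit 2
  channels=$(sox --i -c "$file") || exit 2
  bits=$(sox --i -b "$file") || exit 2
  if [ "$rate" = "$want_rate" ] && [ "$channels" = 1 ] && [ "$bits" = 16 ]; then
    echo "OK $file"
  else
    echo "MISMATCH $file: ${rate} Hz, ${channels} channel(s), ${bits}-bit"
    exit 1
  fi
)

# Usage:
#   check_wav /path/to/audio/file
#   check_wav /path/to/audio/file 8000
```

The check does not cover byte order or encoding; those still need the full sox --i output.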
Computing the word error rate (WER)
Create a fileids file, test.fileids:
test1
test2
Create a transcription file, test.transcription:
some text (test1)
some text (test2)
Put the audio files in a folder named wav. Make sure the files have the correct format and sample rate.
└─ wav
├─ test1.wav
└─ test2.wav
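With more than a handful of recordings, test.fileids can be generated from the wav folder rather than typed by hand. This is a minimal sketch under the layout shown above; make_fileids is a hypothetical helper name, and it assumes every file to decode ends in .wav and sits directly in the folder.

```shell
# make_fileids [WAV_DIR]
# Hypothetical helper: prints one file ID per line, i.e. each .wav file
# name in WAV_DIR (default: wav) without the directory or extension.
make_fileids() (
  wav_dir=${1:-wav}
  for f in "$wav_dir"/*.wav; do
    [ -e "$f" ] || continue   # skip when the glob matches nothing
    basename "$f" .wav
  done
)

# Usage:
#   make_fileids wav > test.fileids
```

The transcription file still has to be written by hand, with each line ending in the matching ID in parentheses.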
Run the decoder:
# Replace <your.lm>, <your.dic> and <your_hmm> with your own model files,
# for example en-us.lm.bin, cmudict-en-us.dict and en-us from pocketsphinx.
pocketsphinx_batch \
-adcin yes \
-cepdir wav \
-cepext .wav \
-ctl test.fileids \
-lm <your.lm> \
-dict <your.dic> \
-hmm <your_hmm> \
-hyp test.hyp
This generates the file test.hyp.
Run the sphinxtrain script word_align.pl (on my machine it is installed at /usr/local/lib/sphinxtrain/scripts/decode/word_align.pl):
/usr/local/lib/sphinxtrain/scripts/decode/word_align.pl test.transcription test.hyp
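One gripe: the official tutorial says to run word_align.pl test.transcription test.hyp, but that does not work at all for me, presumably because the script's directory is not on PATH. It only runs when the full path to word_align.pl is given.
The command then prints the following: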
some text (test1)
some text (test1)
Words: 1 Correct: 1 Errors: 0 Percent correct = 100.00% Error = 0.00% Accuracy = 100.00%
Insertions: 0 Deletions: 0 Substitutions: 0
some text (test2)
some text (test2)
Words: 1 Correct: 1 Errors: 0 Percent correct = 100.00% Error = 0.00% Accuracy = 100.00%
Insertions: 0 Deletions: 0 Substitutions: 0
TOTAL Words: 2 Correct: 2 Errors: 0
TOTAL Percent correct = 100.00% Error = 0.00% Accuracy = 100.00%
TOTAL Insertions: 0 Deletions: 0 Substitutions: 0
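When the test is rerun after each tuning step, it can help to pull just the overall error rate out of this report. The sketch below assumes the summary line keeps exactly the layout shown above ("TOTAL Percent correct = X% Error = Y% Accuracy = Z%"); total_wer is a hypothetical helper name, not part of sphinxtrain.

```shell
# total_wer
# Hypothetical helper: reads word_align.pl output on stdin and prints the
# value after "Error =" on the "TOTAL Percent correct" summary line.
total_wer() {
  awk '$1 == "TOTAL" && $2 == "Percent" { print $8 }'
}

# Usage:
#   /usr/local/lib/sphinxtrain/scripts/decode/word_align.pl \
#       test.transcription test.hyp | total_wer
```

If the report format differs between sphinxtrain versions, the field number in the awk program would need adjusting.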
Reference article: "Tuning speech recognition accuracy" (official documentation)