Bugs encountered in Kaldi

Author: 氢离子游离 | Published 2018-11-27 11:22

Kaldi GPU configuration

CUDA will not be used! If you have already installed cuda drivers and cuda toolkit, try using --cudatk-dir=... option. Note: this is only relevant for neural net experiments

Explanation: if CUDA is already installed, follow the hint and pass the toolkit directory to configure:
./configure --cudatk-dir=<CUDA toolkit directory> --shared
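
If the toolkit location is not obvious, it can often be derived from where nvcc lives. The sketch below shows one way to do that. The helper name cuda_dir_from_nvcc is made up for this example, and it assumes the conventional <toolkit>/bin/nvcc layout.

```shell
# Hypothetical helper: derive the CUDA toolkit root from an nvcc path,
# assuming the conventional layout <toolkit>/bin/nvcc.
cuda_dir_from_nvcc() {
  dirname "$(dirname "$1")"
}

# Possible usage from the Kaldi src directory (needs nvcc on PATH):
#   ./configure --cudatk-dir="$(cuda_dir_from_nvcc "$(command -v nvcc)")" --shared
```

If nvcc on PATH is a symlink (for example under /usr/bin), resolve it first, otherwise the derived directory will be wrong.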

cmd.sh settings

queue.pl: Error submitting jobs to queue (return status was 256)
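Explanation: Kaldi recipes are set up to run on a cluster by default.
1. Running locally: change every queue.pl in cmd.sh to run.pl.
2. Running on a cluster: the machine (host) names must be set correctly.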
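
For the local case, the substitution can be done in one pass rather than by hand. This is a minimal sketch; the function name use_run_pl is hypothetical, and it assumes cmd.sh refers to the scheduler wrapper literally as queue.pl.

```shell
# Sketch: rewrite every queue.pl in a cmd.sh to run.pl so jobs run locally.
# A backup of the original file is kept next to it with a .bak suffix.
use_run_pl() {
  sed -i.bak 's/queue\.pl/run.pl/g' "$1"
}

# Possible usage inside a recipe directory:
#   use_run_pl cmd.sh
```

Keeping the .bak copy makes it easy to switch back when the recipe is later moved to a cluster.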

Environment configuration problems

utils/prepare_lang.sh: line 502: fstaddselfloops: command not found
ERROR: FstHeader::Read: Bad FST header: standard input

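Explanation: if Kaldi compiled without problems, the FST tools are simply not on PATH. fstaddselfloops is built under Kaldi's src/fstbin, and the OpenFst binaries live under tools/openfst/bin; make sure egs/s5/path.sh adds both.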
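
A quick way to confirm which binaries are still missing after editing path.sh is to probe PATH directly. The helper below is a sketch with a made-up name (missing_tools); the tool names in the usage comment are only examples.

```shell
# Sketch: print each named command that cannot be found on the current PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Possible usage after sourcing the recipe's path.sh:
#   . ./path.sh
#   missing_tools fstaddselfloops fstcompile copy-feats
```

An empty result means every listed tool was found. Anything printed still needs its directory added to PATH.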

IRSTLM

INSTALLATION of IRSLTM finished successfully.
please source the tools/extras/env.sh in your path.sh to enable it.

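Explanation: IRSTLM is used to build language models. Because IRSTLM is downloaded and installed manually, its environment is not set up automatically: copy the contents of tools/extras/env.sh into egs/s5/path.sh, or source that file from path.sh as the message says.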
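
Sourcing the file instead of copying it keeps path.sh in step with later reinstalls. A minimal sketch follows; source_if_present is a hypothetical helper, and the usage line assumes KALDI_ROOT is already defined in path.sh.

```shell
# Sketch: source a file only if it exists, and warn otherwise.
source_if_present() {
  if [ -f "$1" ]; then
    . "$1"
  else
    echo "warning: $1 not found" >&2
    return 1
  fi
}

# Possible line for path.sh, assuming KALDI_ROOT is already set there:
#   source_if_present "$KALDI_ROOT/tools/extras/env.sh"
```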

chain-tdnn errors

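Explanation: if the unmodified run.sh also fails, try smaller values for num-jobs-initial and num-jobs-final; neither may exceed the number of GPUs in the cluster.
Reference: tdnn-chain training errors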
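
The GPU limit can be enforced mechanically instead of by memory. In this sketch the helper name clamp_jobs and the variable names in the usage comment are illustrative; how the values reach the training script depends on the recipe.

```shell
# Sketch: cap a requested job count at the number of available GPUs.
clamp_jobs() {
  requested=$1
  gpus=$2
  if [ "$requested" -gt "$gpus" ]; then
    echo "$gpus"
  else
    echo "$requested"
  fi
}

# Possible usage on a machine with NVIDIA drivers installed:
#   num_gpus=$(nvidia-smi -L | wc -l)
#   num_jobs_initial=$(clamp_jobs 2 "$num_gpus")
#   num_jobs_final=$(clamp_jobs 12 "$num_gpus")
```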

CUDA problems

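Explanation: an error occurred while training the neural network, and the log named in the message showed that the GPU was the cause. To find the failure in a log, search for ERROR.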
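
Searching a whole log directory at once saves opening files one by one. A minimal sketch, with a hypothetical function name (find_errors):

```shell
# Sketch: list every line mentioning "error" (any case) under a log directory,
# prefixed with file name and line number.
find_errors() {
  grep -rn -i 'error' "$1"
}

# Possible usage:
#   find_errors exp/chain/tdnn/log
```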

Errors during training iterations

Explanation: most likely a problem with the machine. Set:
use-gpu=wait

train-stage can then be set to the iteration at which the error occurred, so training resumes from there.
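
The iteration to resume from can usually be read off the name of the failing log. The sketch below assumes logs are named train.<iteration>.<job>.log; that naming and the helper name iter_from_log are assumptions, so check your own log directory first.

```shell
# Sketch: pull the iteration number out of a log name such as train.57.2.log.
# The train.<iteration>.<job>.log naming is an assumption, not a guarantee.
iter_from_log() {
  basename "$1" | cut -d. -f2
}

# Possible usage:
#   train_stage=$(iter_from_log exp/chain/tdnn/log/train.57.2.log)
```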

copy-feats

-bash: copy-feats: command not found
Explanation: Kaldi's environment is not configured correctly. Add the relevant paths to s5/path.sh and source it.

Data preparation problems

steps/make_mfcc_pitch.sh --pitch-config conf/pitch.conf --cmd queue.pl --mem 2G --nj 20 data/train exp/make_mfcc/train mfcc
utils/validate_text.pl: The line for utterance IC0007W0001 contains CR (0x0D) character
utils/validate_text.pl: ERROR: text file 'data/train/text' contains disallowed UTF-8 whitespace character(s)
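Explanation: the text file contains disallowed whitespace: carriage returns (CR, 0x0D), as the first message shows, and possibly full-width spaces mixed with ordinary half-width ones.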
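
Both kinds of stray whitespace can be removed with standard tools. This is a sketch under the assumption that the file is UTF-8 and that the only offenders are carriage returns and full-width spaces (U+3000); clean_text is a made-up name.

```shell
# Sketch: strip carriage returns and turn full-width spaces (U+3000) into
# ordinary spaces. Reads stdin, writes stdout.
clean_text() {
  fullwidth_space=$(printf '\343\200\200')   # UTF-8 bytes of U+3000
  tr -d '\r' | LC_ALL=C sed "s/$fullwidth_space/ /g"
}

# Possible usage:
#   clean_text < data/train/text > data/train/text.clean
```

Writing to a new file and comparing it with the original before replacing it avoids losing data if the file contains something unexpected.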

Alignment errors

queue.pl: 1 / 20 failed, log is in exp_new/mono/log/align.1.*.log

Run wc -l ./* in the log directory to see the length of each log,
then open the shortest one and search it for ERROR/error:
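
Picking out the shortest log can also be scripted. A minimal sketch, assuming the logs end in .log and their paths contain no newlines; shortest_log is a hypothetical name.

```shell
# Sketch: print the name of the log file with the fewest lines in a directory.
shortest_log() {
  for f in "$1"/*.log; do
    printf '%s %s\n' "$(wc -l < "$f" | tr -d ' ')" "$f"
  done | sort -n | head -n 1 | cut -d' ' -f2-
}

# Possible usage:
#   grep -n -i 'error' "$(shortest_log exp_new/mono/log)"
```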
You provided the "cs" option but are not calling with keys in sorted order
Explanation: the keys are not in sorted order. Open the corresponding .scp file in an editor such as vim and run :sort.
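
The same fix works from the command line, where the sort order can be pinned to the C locale so that it does not depend on the shell's language settings. A sketch, with a hypothetical function name (sort_c_locale):

```shell
# Sketch: sort a Kaldi table file (e.g. feats.scp) in place using the C locale,
# so that ordering is by raw byte value regardless of the user's locale.
sort_c_locale() {
  LC_ALL=C sort -o "$1" "$1"
}

# Possible usage:
#   sort_c_locale data/train/feats.scp
```

Kaldi also ships utils/fix_data_dir.sh, which re-sorts the files of a data directory together and may be the safer choice when several files are out of order.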
The mono align step reported no error but stalled.
Explanation: try changing stage and resuming the run from there.

Out of memory

(nnet3-chain-train[5.5.123~2-d5bd]:AllocateNewRegion():cu-allocator.cc:519) Failed to allocate a memory region of 1992294400 bytes. Possibly this is due to sharing the GPU. Try switching the GPUs to exclusive mode (nvidia-smi -c 3) and using the option --use-gpu=wait to scripts like steps/nnet3/chain/train.py. Memory info: free:3798M, used:239M, total:4037M, free/total:0.940803
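
The log carries its own advice: switch the GPUs to exclusive mode with nvidia-smi -c 3 and pass --use-gpu=wait. To put the numbers in context, the small helper below (the name bytes_to_mib is made up) converts the failed request into MiB.

```shell
# Sketch: convert a byte count to whole MiB using integer arithmetic.
bytes_to_mib() {
  echo $(( $1 / 1048576 ))
}

# Possible usage with the size from the log above:
#   bytes_to_mib 1992294400
```

The failed request works out to 1900 MiB, while the same log reports 3798M free. That suggests raw capacity is probably not the limit, which fits the log's hint that the GPU may be shared with another process.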