1. Installation
git clone https://github.com/kaldi-asr/kaldi.git
cd kaldi/tools/; make; cd ../src; ./configure; make
A whole pile of dependencies turned out to be missing. Ouch.
Install automake and autoconf
brew install wget
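Before rerunning make, it can save time to check which build tools are still missing. The snippet below is a minimal sketch, assuming the tool list is just the three named in these notes (automake, autoconf, wget); the function name missing_tools is made up for illustration. Kaldi also ships its own checker, tools/extras/check_dependencies.sh, which is the more complete option.

```python
import shutil

def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

# Assumed list: only the tools mentioned in these notes, not Kaldi's full set.
needed = ["automake", "autoconf", "wget"]
print("missing:", missing_tools(needed))
```

Anything it prints as missing can then be installed with brew install, as above.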
2. Main directory layout
egs – example scripts covering the full pipeline,
misc – auxiliary tools,
src – Kaldi source code,
tools – useful modules and tools,
windows – tools for Windows.
3. Pipeline
3.1 Data preparation
lexicon: maps words to pronunciations (phone sequences), e.g. AAW -> ey ey d ah b y uw
G.txt (bigram decoding graph, in OpenFst text format)
utterance-to-speaker mappings: utt2spk/spk2utt
symbol-tables for words and phones (OpenFst format)
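utt2spk and spk2utt hold the same information in two directions: utt2spk has one "utterance speaker" pair per line, and spk2utt has one speaker per line followed by all of that speaker's utterances. A minimal sketch of the inversion is below. The utterance ids are the ones that show up later in these notes; the speaker ids (adg04, xyz01) and the third utterance are made up for illustration. Kaldi has its own script for this (utils/utt2spk_to_spk2utt.pl), so this is only meant to show the file shapes.

```python
from collections import defaultdict

def utt2spk_to_spk2utt(lines):
    """Invert 'utt spk' lines into 'spk utt1 utt2 ...' lines, sorted by speaker."""
    spk2utt = defaultdict(list)
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        utt, spk = parts
        spk2utt[spk].append(utt)
    return [f"{spk} {' '.join(utts)}" for spk, utts in sorted(spk2utt.items())]

# Hypothetical input: speaker ids are assumptions, not taken from a real corpus.
utt2spk = [
    "trn_adg04_sr009 adg04",
    "trn_adg04_sr049 adg04",
    "trn_xyz01_sr001 xyz01",
]
for line in utt2spk_to_spk2utt(utt2spk):
    print(line)
```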
Create binary-format FSTs, with integers replacing the symbol strings
data/G.fst (grammar), data/L.fst (lexicon), data/L_disambig.fst (lexicon with disambiguation symbols)
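The "integers instead of strings" step is what the symbol tables are for: each word or phone gets an integer id, and the binary FSTs store only the ids. Below is a minimal sketch of building such a table and mapping the phone sequence from the lexicon example above. The helper names are made up for illustration; the one convention taken from OpenFst is that id 0 is reserved for epsilon (<eps>) and that the text form is one "symbol id" pair per line.

```python
def build_symbol_table(symbols):
    """Assign integer ids in first-seen order; id 0 is reserved for <eps>."""
    table = {"<eps>": 0}
    for sym in symbols:
        if sym not in table:
            table[sym] = len(table)
    return table

def to_ids(table, sequence):
    """Replace each symbol in `sequence` by its integer id."""
    return [table[s] for s in sequence]

def format_table(table):
    """Render the table as 'symbol id' lines, ordered by id."""
    return [f"{sym} {idx}" for sym, idx in sorted(table.items(), key=lambda kv: kv[1])]

phones = "ey ey d ah b y uw".split()   # pronunciation of AAW from the lexicon example
table = build_symbol_table(phones)
print("\n".join(format_table(table)))
print(to_ids(table, phones))           # [1, 1, 2, 3, 4, 5, 6]
```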
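3.2 Feature extraction
“ark” == archive (holds the data itself); “scp” == script file (maps each key to a file, optionally with a byte offset into an archive)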
# head /foo/raw_mfcc.scp
trn_adg04_sr009 /foo/raw_mfcc.ark:16
trn_adg04_sr049 /foo/raw_mfcc.ark:23395
...
# head -c 20 foo/raw_mfcc.ark
trn_adg04_sr009 ^@BFM [binary data...]
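Each scp line above is "key path:offset", where the offset is a byte position inside the ark file. Note that the first offset, 16, is exactly the length of the key trn_adg04_sr009 (15 characters) plus one space, which matches the ark dump: the data starts right after "key + space". A minimal sketch of parsing such lines follows; the function name is made up for illustration and it covers only the plain "path" and "path:offset" forms shown here.

```python
def parse_scp_line(line):
    """Split one scp line into (key, path, byte_offset), offset None if absent."""
    key, location = line.strip().split(None, 1)
    path, sep, offset = location.rpartition(":")
    if sep and offset.isdigit():
        return key, path, int(offset)
    return key, location, None

entries = [parse_scp_line(l) for l in [
    "trn_adg04_sr009 /foo/raw_mfcc.ark:16",
    "trn_adg04_sr049 /foo/raw_mfcc.ark:23395",
]]
for key, path, offset in entries:
    print(key, path, offset)
```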
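Three kinds of Table access
TableWriter: writes entries in order
SequentialTableReader: iterates over all entries
RandomAccessTableReader: looks up the value for a given key
Why is there no random-access writer? There is no use case for it: everything is computed in batch, so writing sequentially is enough.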
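The three access patterns can be sketched with a toy in-memory archive. This is not Kaldi's C++ API: the class and function names below (ToyTableWriter and so on) are made up, and the "key value\n" record layout is a simplification. The point is only that a sequential writer can record a byte offset per key as it goes, which is what later makes random-access reads possible.

```python
import io

class ToyTableWriter:
    """Appends 'key value' records in order and remembers each value's byte offset."""
    def __init__(self, stream):
        self.stream = stream
        self.index = {}  # key -> offset of the value, playing the role of an scp file

    def write(self, key, value):
        self.stream.write(key.encode() + b" ")
        self.index[key] = self.stream.tell()
        self.stream.write(value.encode() + b"\n")

def sequential_read(stream):
    """Yield (key, value) pairs from the start of the archive to the end."""
    stream.seek(0)
    for line in stream:
        key, value = line.decode().rstrip("\n").split(" ", 1)
        yield key, value

class ToyRandomAccessReader:
    """Looks up one value by key, seeking straight to its recorded offset."""
    def __init__(self, stream, index):
        self.stream = stream
        self.index = index

    def value(self, key):
        self.stream.seek(self.index[key])
        return self.stream.readline().decode().rstrip("\n")

archive = io.BytesIO()
writer = ToyTableWriter(archive)
writer.write("utt1", "1.0 2.0")
writer.write("utt2", "3.0")
print(list(sequential_read(archive)))
print(ToyRandomAccessReader(archive, writer.index).value("utt2"))
```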
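3.3 Monophone training
- Set up variables
- Create the topo file
- Initialize the model (flat start of the model; the decision tree has no splits)
- Create the decoding graph
- Initialize the alignments
- Monophone training
- Context-dependent (triphone) training
- Triphone training: build the tree
- Decoding: build the graph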
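For the alignment initialization step, the idea behind a flat start is that no model exists yet, so frames are simply spread evenly over the states of the utterance's transcript. The sketch below illustrates that idea only; it is a toy under stated assumptions (a plain list of state labels, one label per frame), not how Kaldi implements it, and the function name is made up.

```python
def equal_alignment(num_frames, states):
    """Assign each frame to a state so that states get near-equal contiguous spans."""
    n = len(states)
    if n == 0 or num_frames < n:
        raise ValueError("need at least one frame per state")
    # Frame i goes to state floor(i * n / num_frames), which is non-decreasing in i.
    return [states[i * n // num_frames] for i in range(num_frames)]

# Hypothetical example: 7 feature frames spread over 3 states.
print(equal_alignment(7, ["a", "b", "c"]))  # ['a', 'a', 'a', 'b', 'b', 'c', 'c']
```

Training then alternates between re-estimating the model from the current alignment and realigning with the updated model.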
The training part is a bit involved; I'll read through some other material and fill this in later.