monophone
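Purpose:
Build an HMM for every phone so that it can be looked up during decoding. Each phone has 3 states; the goal is to build this large structure and store the information it carries. During decoding, each frame of speech yields a feature vector, and the phone it belongs to can only be determined by following the chain pdf → state → phone.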
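The pdf → state → phone lookup described above can be pictured as a plain reverse index. The sketch below is only an illustration, not Kaldi's implementation: the names `STATES`, `build_pdf_index` and `lookup` are made up for this example, and the three entries are copied from the `sil` lines of the show-transitions output later in this post.

```python
from collections import defaultdict

# (phone, hmm_state, pdf_id) triples, copied from the sil entries in the
# show-transitions output shown in this post. Everything else is hypothetical.
STATES = [("sil", 0, 0), ("sil", 1, 56), ("sil", 2, 48)]


def build_pdf_index(states):
    """Group (phone, hmm_state) pairs by the pdf-id they emit from."""
    index = defaultdict(list)
    for phone, hmm_state, pdf_id in states:
        index[pdf_id].append((phone, hmm_state))
    return dict(index)


PDF_INDEX = build_pdf_index(STATES)


def lookup(pdf_id):
    """Return every (phone, hmm_state) pair that uses this pdf (may be empty)."""
    return PDF_INDEX.get(pdf_id, [])


print(lookup(56))  # [('sil', 1)]
```

The index maps a pdf-id to a list rather than a single state because one pdf can be shared by several states.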
Method:
transition-model.h
transition-model.cc
In Kaldi, the HMM model is actually a TransitionModel object.
TransitionModel is declared in transition-model.h and implemented in transition-model.cc.
The HMM topology of each phone in Kaldi:
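Concepts
transition-state: a (virtual) state that moves to itself or to another state along an arc. Numbered from 1. (global)
transition-index: the index of a transition out of an HMM state, i.e. an index into HmmTopology::HmmState::transitions. Numbered from 0.
transition-id: the number assigned to each arc across all HMMs. Numbered from 1, as the example below shows. (global)
phone: a phone. Numbered from 1. (global)
HMM-state: a state inside one phone's HMM (3 states per phone here). Numbered from 0. (local)
pdf-id: the index of the probability density function (a Gaussian mixture model) attached to a state. Numbered from 0. Since a pdf can be shared across phones, this index is global rather than local to one phone.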
Here is an example:
show-transitions phones.txt final.mdl
Transition-state 1: phone = sil hmm-state = 0 pdf = 0
Transition-id = 1 p = 0.728349 [self-loop]
Transition-id = 2 p = 0.27165 [0 -> 1]
Transition-state 2: phone = sil hmm-state = 1 pdf = 56
Transition-id = 3 p = 0.809842 [self-loop]
Transition-id = 4 p = 0.190158 [1 -> 2]
Transition-state 3: phone = sil hmm-state = 2 pdf = 48
Transition-id = 5 p = 0.475911 [self-loop]
Transition-id = 6 p = 0.524089 [2 -> 3]
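As a sanity check on output like this, the lines can be parsed into records and the probabilities leaving each transition-state summed; they should add up to 1 up to print rounding. This is a minimal sketch that assumes exactly the line format shown above. The regular expressions and the `parse` helper are written for this post and are not part of Kaldi.

```python
import re

# The sample text is the show-transitions output quoted above.
SAMPLE = """\
Transition-state 1: phone = sil hmm-state = 0 pdf = 0
Transition-id = 1 p = 0.728349 [self-loop]
Transition-id = 2 p = 0.27165 [0 -> 1]
Transition-state 2: phone = sil hmm-state = 1 pdf = 56
Transition-id = 3 p = 0.809842 [self-loop]
Transition-id = 4 p = 0.190158 [1 -> 2]
Transition-state 3: phone = sil hmm-state = 2 pdf = 48
Transition-id = 5 p = 0.475911 [self-loop]
Transition-id = 6 p = 0.524089 [2 -> 3]
"""

STATE_RE = re.compile(
    r"Transition-state (\d+): phone = (\S+) hmm-state = (\d+) pdf = (\d+)")
ARC_RE = re.compile(r"Transition-id = (\d+) p = ([\d.eE+-]+) \[(.+)\]")


def parse(text):
    """Return {transition_state: {'phone', 'hmm_state', 'pdf', 'arcs'}}."""
    states = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = STATE_RE.match(line)
        if m:
            current = int(m.group(1))
            states[current] = {
                "phone": m.group(2),
                "hmm_state": int(m.group(3)),
                "pdf": int(m.group(4)),
                "arcs": [],  # list of (transition_id, probability, label)
            }
            continue
        m = ARC_RE.match(line)
        if m and current is not None:
            states[current]["arcs"].append(
                (int(m.group(1)), float(m.group(2)), m.group(3)))
    return states


parsed = parse(SAMPLE)
for tstate, info in parsed.items():
    total = sum(p for _, p, _ in info["arcs"])
    # Tolerance covers the six-digit rounding of the printed values.
    assert abs(total - 1.0) < 1e-5, (tstate, total)
```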
From this output we can read off the mappings:
(phone, HMM-state, forward-pdf-id, self-loop-pdf-id) -> transition-state
(transition-state, transition-index) -> transition-id
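To make the two mappings concrete, here is a simplified sketch that rebuilds the numbering seen in the example. It assumes a toy left-to-right topology in which every state has a self-loop and one forward arc, and it assumes the forward and self-loop pdfs are the same, so a single pdf-id stands in for both. `TOPOLOGY`, `PDFS` and `build_ids` are hypothetical names; the real logic lives in transition-model.cc and may differ in detail.

```python
# Hypothetical toy topology: HMM-state -> destination states, in
# transition-index order. State 3 is the non-emitting final state.
TOPOLOGY = {0: [0, 1], 1: [1, 2], 2: [2, 3]}

# (phone, hmm_state) -> pdf-id, copied from the example output.
PDFS = {("sil", 0): 0, ("sil", 1): 56, ("sil", 2): 48}


def build_ids(topology, pdfs):
    """Number transition-states and transition-ids, both starting at 1."""
    tuples = sorted((phone, s, pdf) for (phone, s), pdf in pdfs.items())
    state_of = {}  # (phone, hmm_state, pdf) -> transition-state
    id_of = {}     # (transition-state, transition-index) -> transition-id
    next_id = 1
    for tstate, (phone, s, pdf) in enumerate(tuples, start=1):
        state_of[(phone, s, pdf)] = tstate
        for tindex, _dest in enumerate(topology[s]):
            id_of[(tstate, tindex)] = next_id
            next_id += 1
    return state_of, id_of


state_of, id_of = build_ids(TOPOLOGY, PDFS)
print(state_of[("sil", 1, 56)])  # 2
print(id_of[(3, 1)])             # 6
```

With these inputs, transition-state 3 with transition-index 1 (the arc 2 -> 3) gets transition-id 6, matching the output above.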
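Tips
1. Phone states and pdfs are not in one-to-one correspondence: one pdf may be shared by several phones.
If two states have the same pdf_class variable, then they will always share the same probability distribution function (p.d.f.)
2. The topology for a language is designed by hand, while the parameters (the HMM transition probabilities and the means and variances of the pdfs) are obtained by iterating the EM algorithm.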
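For the transition probabilities, the update in each iteration amounts to normalizing the occupancy counts of the arcs that leave the same transition-state. The sketch below shows only that normalization step, under the assumption that the counts have already been accumulated. The counts are made-up numbers for illustration, and `reestimate` is a hypothetical helper, not a Kaldi function.

```python
def reestimate(counts):
    """Turn per-arc counts into probabilities.

    counts: {transition_state: {transition_id: count}}
    returns: {transition_id: probability}
    """
    probs = {}
    for tstate, arcs in counts.items():
        total = sum(arcs.values())
        if total <= 0:
            continue  # no statistics for this state; leave it out
        for tid, c in arcs.items():
            probs[tid] = c / total
    return probs


# Made-up counts for two transition-states (not from any real model).
COUNTS = {
    1: {1: 75.0, 2: 25.0},
    2: {3: 80.0, 4: 20.0},
}

new_probs = reestimate(COUNTS)
print(new_probs[1], new_probs[2])  # 0.75 0.25
```

The means and variances of the pdfs are updated from their own statistics in the same iteration; that part is omitted here.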
Reference blog posts:
https://blog.csdn.net/u013677156/article/details/79136418
https://blog.csdn.net/u012361418/article/details/73506448
(This post is my combined understanding of the two blog posts above.)