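Dong Yu (俞栋)
CSDN interview
On acoustic modeling, he said his attention is on deep CNNs and LF-MMI (i.e. Povey's chain model).
He noted that LF-MMI borrows CTC's advantage (no forced alignment needed) while still being an improvement built on the conventional HMM-DNN hybrid system. Its accuracy is no worse than CTC's, and above all its training is stable. CTC needs extensive tuning; so far only Google and Baidu claim to have applied it successfully, and even where it works, a method that requires heavy tuning for every task is not a mature one.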
Povey:
Forum topic link
Firstly, CTC was never in the master branch of Kaldi. It's dropped permanently, because the 'chain' models were always better than CTC. And I removed the branch because I don't want to answer questions about it (and because it's a waste of their time too). BTW, a presentation by Google here at Interspeech is saying something similar, that a conventional model, discriminatively trained, with 1/3 the normal frame rate, beats CTC.
Povey refers to a point Google made at Interspeech, so the Interspeech proceedings should contain a Google paper on this.
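To make "1/3 the normal frame rate" concrete, here is a minimal sketch of frame subsampling. It assumes a 10 ms frame shift (100 frames per second) and simply keeps every third frame; the helper name `subsample_frames` and the keep-every-third scheme are illustrative assumptions, not the exact procedure used by Google or by Kaldi's chain recipes.

```python
def subsample_frames(frames, factor=3, offset=0):
    """Keep every `factor`-th frame, starting at index `offset`.

    Hypothetical helper for illustration only: real systems may stack or
    splice neighbouring frames instead of dropping them outright.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    if not 0 <= offset < factor:
        raise ValueError("offset must satisfy 0 <= offset < factor")
    return frames[offset::factor]


# One second of audio at an assumed 10 ms frame shift gives 100 frames.
frames = list(range(100))
reduced = subsample_frames(frames, factor=3)

# The model is evaluated on roughly one third as many frames.
print(len(frames), len(reduced))
```

With fewer output frames, both training and decoding have fewer time steps to process, which is the efficiency that low-frame-rate models share with CTC.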
Baidu
Working on deep CNNs (reportedly 6 layers) and deep LSTM networks
A paper on end-to-end recognition with CNNs (wav2letter, which is from Facebook AI Research)
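Mobvoi (出门问问)
Reportedly very keen on applying CTC on embedded devices (watches, VR). I think this may be where CTC has its edge (model size, decoding complexity).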
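The decoding-complexity point can be illustrated with CTC greedy (best-path) decoding: pick the top label in each frame, merge consecutive repeats, then drop blanks. The sketch below is pure Python with hypothetical names (`ctc_greedy_decode`, `frame_scores`) and assumes label 0 is the blank; it is not taken from any particular toolkit, and production systems usually add beam search and a language model on top.

```python
def ctc_greedy_decode(frame_scores, blank=0):
    """Best-path CTC decoding.

    frame_scores: list of per-frame score lists, one score per label.
    blank: index of the CTC blank label (assumed to be 0 here).
    """
    # Step 1: take the highest-scoring label in every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]

    # Step 2: merge consecutive repeats, then remove blanks.
    decoded = []
    previous = None
    for label in best_path:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


def one_hot(index, size=3):
    """Build a toy score vector with all mass on one label."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Toy best path: blank, 1, 1, blank, 1, 2, 2, blank
scores = [one_hot(i) for i in [0, 1, 1, 0, 1, 2, 2, 0]]
print(ctc_greedy_decode(scores))
```

This runs in a single pass over the frames and needs no HMM state graph or lattice, which is one reason CTC looks attractive on devices with little memory and compute.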
Interspeech 2016
Conference proceedings link: http://pan.baidu.com/s/1pLB3w2v password: fww7