hang: nnetbin/sat-nnet-train-frm

Author: 雨月梵雨时鸢 | Published: 2020-08-24 16:11


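Main training procedure

Loop condition

Done(): returns true when data_end_ - data_begin_ < conf_.minibatch_size, i.e. the remaining data cannot fill one minibatch.
Next(): data_begin_ += conf_.minibatch_size
Value():
a. Checks that one full minibatch of data is available (data_end_ - data_begin_ >= conf_.minibatch_size). This is the complement of the Done() condition, so the check is redundant.
b. Reads one minibatch of data; note that it is returned by reference.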
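The Done()/Next()/Value() trio above forms an ordinary iteration protocol. The following is a minimal sketch of that bookkeeping only; `ToyRandomizer` and `CountMinibatches` are hypothetical names, and a `std::vector<float>` stands in for the real matrix type.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the randomizer: it mimics only the index
// bookkeeping described above, not the real Kaldi class.
class ToyRandomizer {
 public:
  ToyRandomizer(std::vector<float> data, std::size_t minibatch_size)
      : data_(std::move(data)),
        minibatch_size_(minibatch_size),
        data_begin_(0),
        data_end_(data_.size()) {
    assert(minibatch_size_ > 0);
  }

  // True when fewer than one full minibatch remains.
  bool Done() const { return data_end_ - data_begin_ < minibatch_size_; }

  // Advance the read position by one minibatch.
  void Next() { data_begin_ += minibatch_size_; }

  // Return the current minibatch by reference (valid until the next call).
  const std::vector<float> &Value() {
    assert(data_end_ - data_begin_ >= minibatch_size_);
    minibatch_.assign(data_.begin() + data_begin_,
                      data_.begin() + data_begin_ + minibatch_size_);
    return minibatch_;
  }

 private:
  std::vector<float> data_;
  std::size_t minibatch_size_;
  std::size_t data_begin_;
  std::size_t data_end_;
  std::vector<float> minibatch_;
};

// Counts how many full minibatches the training loop would visit.
std::size_t CountMinibatches(ToyRandomizer randomizer) {
  std::size_t count = 0;
  for (; !randomizer.Done(); randomizer.Next()) {
    const std::vector<float> &minibatch = randomizer.Value();
    assert(!minibatch.empty());
    ++count;
  }
  return count;
}
```

With 10 frames and a minibatch size of 4, the loop visits two minibatches and the last 2 frames are left over, which is exactly the case Done() guards against.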

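Forward propagation (the Propagate_SpkCode function)

Forward propagation is handled differently depending on the component.
std::vector<Component*> components_;
components_[0] is the SpeakerCode layer; components_[1] to components_[size-1] are the regular layers.
GetSpkInfo(in): in holds the speaker id of each frame. current_code is a temporary CuMatrix of size in.size * code_length_; speaker_code_ stores the codes of all speakers, and CopyRows copies the rows selected by the indices in in.
current_code_id_ = in. Note that current_code_id_ is a member variable of type CuArray, kept for the later backpropagation.
connect_nnet_ is a network with a single component. Its forward pass writes to code_out_, a CuMatrix member variable that is added to the individual layers to shift their biases.
Nnet::Propagate(): propagate_buf_[0] holds the input; Propagate is called on each component, and the output of component i is stored in propagate_buf_[i+1]. The final output is out = propagate_buf_[components_.size()].
connect_out_diff_ is cleared and reset to 0. It stores gradients and has size in.size * connect_nnet_.OutputDim().
Because components_[0] is the SpeakerCode layer, the regular input is stored in propagate_buf_[1].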
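The row copy that builds current_code from speaker_code_ is a plain gather by speaker id. A minimal sketch of the idea, assuming a toy row-major `Matrix` type and a hypothetical helper `GatherSpeakerCodes` (neither is part of Kaldi):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // toy row-major matrix

// Row t of the result is a copy of speaker_code[spk_ids[t]], so the result
// has spk_ids.size() rows and code_length columns.
Matrix GatherSpeakerCodes(const Matrix &speaker_code,
                          const std::vector<int> &spk_ids) {
  Matrix current_code;
  current_code.reserve(spk_ids.size());
  for (int id : spk_ids) {
    assert(id >= 0 && static_cast<std::size_t>(id) < speaker_code.size());
    current_code.push_back(speaker_code[id]);
  }
  return current_code;
}
```

Frames that belong to the same speaker receive identical rows, which is why the per-frame gradients have to be summed back per speaker later on.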

Running the forward pass

for(int32 i=1; i < (int32)components_.size(); i++) {
  components_[i]->Propagate(propagate_buf_[i], &propagate_buf_[i+1]);
  sat_layer->PropagateFnc(&propagate_buf_[i+1], i);
}

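The loop runs from i=1 to size-1, because component 0 is the SpeakerCode layer. For example:

0=speakercode; 1=affine; 2=sigmoid; 3=affine; 4=softmax. Here components_.size()=5.
The loop visits components 1, 2, 3, 4, and the final result is stored in propagate_buf_[5].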

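sat_layer->PropagateFnc(CuMatrix<BaseFloat> *out, int layer_id) works as follows.
Definition: std::map<int32, std::pair<int32, int32 > > adapt_layers_;

layer_id is specified in the proto file. The function looks it up in adapt_layers_ with adapt_layers_.find(layer_id) != adapt_layers_.end().
In the example above the keys of adapt_layers_ are [1,3]. When i is one of the adapt layers, the bias is added:
out->AddMat(1.0, code_out_.ColRange(adapt_layers_[layer_id].first, adapt_layers_[layer_id].second));
The first value is the start column and the second is the width. (The start column is simply the sum of the widths of the preceding adapt layers.)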
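The (start, width) bookkeeping can be sketched as follows. This is an illustration under stated assumptions, not the original implementation: `BuildAdaptLayers` and `AddLayerBias` are hypothetical helpers, and `Matrix` is a toy row-major type.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // toy row-major matrix
using AdaptLayers = std::map<int, std::pair<int, int>>;  // id -> (start, width)

// Assigns each adapted layer a column block; the start column is the sum of
// the widths of the adapted layers that come before it.
AdaptLayers BuildAdaptLayers(
    const std::vector<std::pair<int, int>> &layer_widths) {
  AdaptLayers layers;
  int offset = 0;
  for (const auto &lw : layer_widths) {
    layers[lw.first] = {offset, lw.second};
    offset += lw.second;
  }
  return layers;
}

// Adds the column block of code_out that belongs to layer_id onto out.
// Layers that are not adapted are left untouched.
void AddLayerBias(const AdaptLayers &layers, const Matrix &code_out,
                  int layer_id, Matrix *out) {
  auto it = layers.find(layer_id);
  if (it == layers.end()) return;
  const int start = it->second.first;
  const int width = it->second.second;
  for (std::size_t t = 0; t < out->size(); ++t) {
    assert((*out)[t].size() == static_cast<std::size_t>(width));
    for (int c = 0; c < width; ++c) {
      (*out)[t][c] += code_out[t][start + c];
    }
  }
}
```

For adapted layers 1 and 3 with widths 2 and 3, layer 1 reads columns 0..1 of code_out and layer 3 reads columns 2..4, so the single connect network has to produce 5 output columns.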

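Converting post to a CuMatrix

typedef std::vector<std::vector<std::pair<int32,BaseFloat>>> Posterior; Each element of the outer vector is one frame. The inner vector usually has length 1 and holds one pair: the int32 is the transition-id (PosteriorToMatrix uses it directly as the column index of the target matrix) and the BaseFloat is the probability.
The type is vector<vector<pair>> rather than vector<pair> so that a frame can carry posteriors for several states. This is rare in practice; the inner vector usually has length 1.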
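To make the nesting concrete, here is a small sketch that builds a hard-alignment posterior, one (index, 1.0) pair per frame. The typedef mirrors the one quoted above with standard types substituted for int32 and BaseFloat; `MakeHardPosterior` is a hypothetical helper, not a Kaldi function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Same shape as the Posterior typedef above, with std::int32_t and float
// standing in for Kaldi's int32 and BaseFloat.
typedef std::vector<std::vector<std::pair<std::int32_t, float>>> Posterior;

// Builds a posterior in which every frame has exactly one entry with
// probability 1.0, i.e. the common "inner vector of length 1" case.
Posterior MakeHardPosterior(const std::vector<std::int32_t> &alignment) {
  Posterior post(alignment.size());
  for (std::size_t t = 0; t < alignment.size(); ++t) {
    post[t].push_back(std::make_pair(alignment[t], 1.0f));
  }
  return post;
}
```

A soft target would simply push more than one pair into post[t], for example (3, 0.7) and (8, 0.3).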

The PosteriorToMatrix function

template <typename Real>
void PosteriorToMatrix(const Posterior &post, int32 num_cols, CuMatrix<Real> *mat) {
  // Make a host-matrix,
  int32 num_rows = post.size();
  Matrix<Real> m(num_rows, num_cols, kSetZero); // zero-filled
  // Fill from Posterior,
  for (int32 t = 0; t < post.size(); t++) {
    for (int32 i = 0; i < post[t].size(); i++) {
      int32 col = post[t][i].first;
      if (col >= num_cols) {
        KALDI_ERR << "Out-of-bound Posterior element with index " << col
                  << ", higher than number of columns " << num_cols;
      }
      m(t, col) = post[t][i].second;
    }
  }
  // Copy to output GPU matrix,
  (*mat) = m;
}

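post.size() is the number of frames, and the new matrix has size (number of frames) * (number of target states). The core statement is m(t,col) = post[t][i].second; the result is finally returned in CuMatrix form. nnet_tgt therefore holds the target matrix, one row per frame (each row is usually one-hot).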

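Computing the error

xent.Eval(frm_weights, nnet_out, nnet_tgt, &obj_diff); the gradient is scaled by frm_weights.
First part of the function: compute diff, the partial derivative of the objective with respect to the output, = (net_out - target). With a softmax output layer this is the derivative with respect to the pre-softmax activations.
Second part of the function: accumulate loss += cross_entropy = sum(-t log y) and entropy = sum(-t log t); (loss - entropy)/frames is reported once for every 1h of data.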
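The two parts of the evaluation can be written out on plain vectors. This is a sketch under assumptions, not the Xent class itself: `EvalXent` and `XentResult` are hypothetical names, the rows of net_out are assumed to be strictly positive softmax outputs, and applying the frame weights to the loss terms as well as the gradient is a choice made for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // toy row-major matrix

struct XentResult {
  Matrix diff;     // frame-weighted (net_out - target)
  double loss;     // sum of -t * log(y)
  double entropy;  // sum of -t * log(t)
};

// Assumes net_out and target have the same shape and that every net_out
// entry is > 0. Terms with t == 0 contribute nothing to either sum.
XentResult EvalXent(const std::vector<float> &frm_weights,
                    const Matrix &net_out, const Matrix &target) {
  assert(net_out.size() == target.size());
  assert(net_out.size() == frm_weights.size());
  XentResult r;
  r.diff = net_out;
  r.loss = 0.0;
  r.entropy = 0.0;
  for (std::size_t t = 0; t < net_out.size(); ++t) {
    const double w = frm_weights[t];
    for (std::size_t k = 0; k < net_out[t].size(); ++k) {
      const double y = net_out[t][k];
      const double tgt = target[t][k];
      r.diff[t][k] = static_cast<float>(w * (y - tgt));
      if (tgt > 0.0) {
        r.loss += -w * tgt * std::log(y);
        r.entropy += -w * tgt * std::log(tgt);
      }
    }
  }
  return r;
}
```

For one-hot targets the entropy term is zero, so the reported (loss - entropy)/frames reduces to the average cross-entropy per frame.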

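Backpropagation and update

This step is performed only when not cross-validating.
The argument passed in is obj_diff, i.e. (net_out - target).
Depending on the update_codeonly parameter, there are two kinds of update.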

Main update function: Backpropagate_SpkCode(obj_diff, NULL)

dynamic_cast<SpeakerCode *>(components_[0]);
The gradient out_diff is assigned to backpropagate_buf_[NumComponents()], i.e. the last buffer.
The loop starts at NumComponents()-1 (the last component) and ends at components_[1], because components_[0] is the SAT component.

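Inside the loop

sat_layer->BackpropagateFnc(backpropagate_buf_[i+1], i);
If i is one of the adapt_layers_ (these are usually linear layers), the gradient in backpropagate_buf_[i+1] is stored into the corresponding columns of connect_out_diff_, ready to be propagated back to obtain the speaker-code gradient.
  components_[i]->Backpropagate(propagate_buf_[i],propagate_buf_[i+1],
                            backpropagate_buf_[i+1],&backpropagate_buf_[i]);
For each layer, the gradient before backpropagation is in backpropagate_buf_[i+1] and the gradient after it is stored in backpropagate_buf_[i]; the forward buffers are passed in because some components need them.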

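Updating the SpeakerCode layer: sat_layer->Update(propagate_buf_[0], backpropagate_buf_[1]); the arguments are not actually used. The gradient that drives the update is stored in connect_out_diff_, and the gradient used to update the speaker codes is stored in speaker_code_diff_cur_. connect_nnet_.Backpropagate obtains the speaker-code gradient and, at the same time, updates the connect_nnet_ network (which in practice has a single component).
current_code_id_ holds the speaker id of every frame in the minibatch. It is assigned to spks, which is then sorted and passed through unique; note that unique only removes adjacent duplicates, which is why the sort comes first. spks is then resized to std::distance(spks.begin(), it), i.e. the duplicate-free length.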

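Handling momentum

for (int32 i = 0; i < spks.size(); i++) {
  speaker_code_corr_.Row(current_code_id_[i]).Scale(mmt);
}
This momentum step is incorrect and needs to be changed.
Suppose the current_code_id_ of this minibatch is [1,2,3,2,4,3], which gives spks = [1,2,3,4].
i then runs over 0~3, and current_code_id_[0~3] is [1,2,3,2], so the previously accumulated gradient of speaker 4 is never scaled.
If this kind of ordering of current_code_id_ occurs often, the gradients of some speakers keep accumulating, and the gradients of many earlier minibatches are added in again and again.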
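The indexing problem can be reproduced with plain integers. The sketch below lists which rows of speaker_code_corr_ would be scaled by the loop as written, and by one possible correction that indexes spks instead of current_code_id_. The correction is a suggestion for illustration, not the original author's fix, and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Sorted, duplicate-free speaker ids: sort, unique, then resize.
std::vector<int> UniqueSpeakers(std::vector<int> ids) {
  std::sort(ids.begin(), ids.end());
  auto it = std::unique(ids.begin(), ids.end());
  ids.resize(std::distance(ids.begin(), it));
  return ids;
}

// Rows scaled by the loop as written: it iterates spks.size() times but
// indexes current_code_id, so it only sees the first few frames.
std::vector<int> RowsScaledAsWritten(const std::vector<int> &current_code_id) {
  const std::vector<int> spks = UniqueSpeakers(current_code_id);
  std::vector<int> rows;
  for (std::size_t i = 0; i < spks.size(); ++i) {
    rows.push_back(current_code_id[i]);
  }
  return rows;
}

// Rows scaled when the loop indexes spks: each speaker exactly once.
std::vector<int> RowsScaledFixed(const std::vector<int> &current_code_id) {
  return UniqueSpeakers(current_code_id);
}
```

For [1,2,3,2,4,3] the loop as written scales rows 1, 2, 3, 2: speaker 2 is scaled twice and speaker 4 not at all, whereas the corrected indexing scales rows 1, 2, 3, 4 once each.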

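Accumulating the gradient for each speaker

for (int32 i = 0; i < current_code_id_.size(); i++) {
  speaker_code_corr_.Row(current_code_id_[i]).AddVec(1.0,
  speaker_code_diff_cur_.Row(i));
}
speaker_code_diff_cur_.Row(i) is the gradient of frame i.
current_code_id_[i] is the speaker id of frame i.
speaker_code_corr_ has size (number of speakers) * (code length) and stores the accumulated gradients.
speaker_code_corr_.Row(current_code_id_[i]) is the accumulated gradient of the speaker that frame i belongs to.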

Adjusting speaker_code_

for (int32 i = 0; i < spks.size(); i++) {
  speaker_code_.Row(spks[i]).AddVec(-lr, speaker_code_corr_.Row(spks[i]));
}
spks[i] is the id of the i-th speaker, and speaker_code_.Row(spks[i]) is that speaker's code.
AddVec(-lr, speaker_code_corr_.Row(spks[i])) adds that speaker's accumulated gradient, scaled by -lr, to the code.
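Taken together, the three loops amount to a per-speaker SGD step with momentum. The sketch below strings them together on toy matrices. It is an illustration only: `UpdateSpeakerCodes` is a hypothetical helper, and the momentum loop here indexes the duplicate-free speaker list, which is an assumption made for this sketch rather than what the quoted code does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // toy row-major matrix

// frame_spk[i] is the speaker id of frame i; diff_cur[i] is that frame's
// gradient with respect to the speaker code. corr and speaker_code have one
// row per speaker.
void UpdateSpeakerCodes(const std::vector<int> &frame_spk,
                        const Matrix &diff_cur, float lr, float mmt,
                        Matrix *corr, Matrix *speaker_code) {
  assert(frame_spk.size() == diff_cur.size());
  // Duplicate-free list of the speakers present in this minibatch.
  std::vector<int> spks = frame_spk;
  std::sort(spks.begin(), spks.end());
  spks.resize(std::distance(spks.begin(),
                            std::unique(spks.begin(), spks.end())));

  // 1. Momentum: decay each present speaker's accumulated gradient once.
  for (int s : spks) {
    for (float &v : (*corr)[s]) v *= mmt;
  }
  // 2. Accumulate the per-frame gradients into their speaker's row.
  for (std::size_t i = 0; i < frame_spk.size(); ++i) {
    std::vector<float> &row = (*corr)[frame_spk[i]];
    for (std::size_t c = 0; c < row.size(); ++c) row[c] += diff_cur[i][c];
  }
  // 3. Step each present speaker's code against its accumulated gradient.
  for (int s : spks) {
    for (std::size_t c = 0; c < (*speaker_code)[s].size(); ++c) {
      (*speaker_code)[s][c] -= lr * (*corr)[s][c];
    }
  }
}
```

Speakers that do not appear in the minibatch keep both their code and their accumulated gradient unchanged.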

Printing log information and writing the new model
