A Reading of BERT's Model-Creation Code

Author: 陶_306c | Published: 2021-03-25 10:42


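Taken from part of https://blog.csdn.net/weixin_42001089/article/details/97657149. The blogger's explanation is comprehensive and detailed. I had read through the code once before reading this walkthrough, which made it easy to follow.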

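BERT was released by Google at the end of October 2018, mainly to address the shortcomings of models such as word2vec. Put plainly, BERT combines the pre-trained model with the downstream task model: the core idea is to move more and more of the work of a specific downstream NLP task into the pre-training stage that produces the word representations.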

create_model

$\color{red}{\text{Top priority}}$

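This is arguably the $\color{red}{\text{most critical}}$ part of using BERT. In most cases, using BERT simply means defining our own downstream task and fine-tuning on it, and this function is where that task is defined.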

def create_model(bert_config, is_training, input_ids, input_mask, segment_ids,
                 labels, num_labels, use_one_hot_embeddings):
  """Creates a classification model."""
  model = modeling.BertModel(
      config=bert_config,
      is_training=is_training,
      input_ids=input_ids,
      input_mask=input_mask,
      token_type_ids=segment_ids,
      use_one_hot_embeddings=use_one_hot_embeddings)
  # In the demo, we are doing a simple classification task on the entire
  # segment.
  #
  # If you want to use the token-level output, use model.get_sequence_output()
  # instead.
  output_layer = model.get_pooled_output()
 
  hidden_size = output_layer.shape[-1].value
 
  output_weights = tf.get_variable(
      "output_weights", [num_labels, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))
 
  output_bias = tf.get_variable(
      "output_bias", [num_labels], initializer=tf.zeros_initializer())
 
  with tf.variable_scope("loss"):
    if is_training:
      # I.e., 0.1 dropout
      output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)
 
    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)
    probabilities = tf.nn.softmax(logits, axis=-1)
    log_probs = tf.nn.log_softmax(logits, axis=-1)
 
    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)
 
    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
    loss = tf.reduce_mean(per_example_loss)
 
    return (loss, per_example_loss, logits, probabilities)

First, modeling.BertModel is called to build the BERT model.

Inputs to the BERT model: `input_ids`, `input_mask`, `segment_ids`

model = modeling.BertModel(
    config=bert_config,
    is_training=is_training,
    input_ids=input_ids,
    input_mask=input_mask,
    token_type_ids=segment_ids,
    use_one_hot_embeddings=use_one_hot_embeddings)

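config is BERT's configuration. The config file (bert_config.json) ships with the Chinese pre-trained model downloaded at the start, so it can be loaded directly.

use_one_hot_embeddings depends on whether the model runs on a TPU: run_classifier.py passes FLAGS.use_tpu for it. The other fields are all explained earlier in the original post.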
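To make the three inputs concrete, here is a minimal sketch of how they line up for a sentence pair. The toy vocabulary, its token ids, and the helper name `build_inputs` are made up for illustration. In the real pipeline these features come from the tokenizer and `convert_single_example` in run_classifier.py, which also truncates over-long inputs (not done here).

```python
# Toy vocabulary: ids are invented for this example, not BERT's real vocab.
VOCAB = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2,
         "i": 3, "like": 4, "tea": 5, "me": 6, "too": 7}


def build_inputs(tokens_a, tokens_b, max_seq_length):
    """Hypothetical helper: builds the three BERT inputs for a sentence pair."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)           # sentence A, incl. [CLS] and its [SEP]
    tokens += tokens_b + ["[SEP]"]
    segment_ids += [1] * (len(tokens_b) + 1)  # sentence B, incl. its [SEP]

    input_ids = [VOCAB[t] for t in tokens]
    input_mask = [1] * len(input_ids)         # 1 = real token, 0 = padding

    # Zero-pad all three lists up to max_seq_length.
    pad = max_seq_length - len(input_ids)
    input_ids += [0] * pad
    input_mask += [0] * pad
    segment_ids += [0] * pad
    return input_ids, input_mask, segment_ids


input_ids, input_mask, segment_ids = build_inputs(
    ["i", "like", "tea"], ["me", "too"], max_seq_length=10)
print(input_ids)    # [1, 3, 4, 5, 2, 6, 7, 2, 0, 0]
print(input_mask)   # [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
print(segment_ids)  # [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
```

All three lists have the same length, which is why create_model can pass them straight into modeling.BertModel as tensors of shape [batch_size, seq_length].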

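Outputs of the BERT model:

There are two options:

model.get_sequence_output()
model.get_pooled_output()

The first returns a tensor of shape [batch_size, seq_length, hidden_size].

The second returns a tensor of shape [batch_size, hidden_size].

The second is derived from the first along the sequence dimension: the pooler takes the hidden state of the first token ([CLS]) and passes it through a dense layer with a tanh activation, rather than averaging or max-pooling over all tokens. Intuitively, the first output is token-level and the second is sentence-level.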
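The relationship between the two outputs can be sketched in plain Python. This is only an illustration with tiny made-up numbers: `pooler_w` and `pooler_b` are stand-ins for the pooler's trained dense-layer parameters, and the hidden size is 2 here instead of the much larger value from the config.

```python
import math

# sequence_output: [batch_size=1, seq_length=3, hidden_size=2], made-up values.
sequence_output = [[[0.5, -1.0],   # hidden state of the first token ([CLS])
                    [0.1, 0.2],
                    [0.3, 0.4]]]

# Stand-ins for the pooler's dense layer: [hidden_size, hidden_size] and [hidden_size].
pooler_w = [[1.0, 0.0],
            [0.0, 1.0]]
pooler_b = [0.0, 0.0]


def pooled_output(seq_out):
    """For each example: take the first token, then apply dense + tanh."""
    pooled = []
    for example in seq_out:
        first_token = example[0]
        row = []
        for j in range(len(pooler_b)):
            dense = sum(first_token[i] * pooler_w[i][j]
                        for i in range(len(first_token))) + pooler_b[j]
            row.append(math.tanh(dense))
        pooled.append(row)
    return pooled


pooled = pooled_output(sequence_output)
# Token-level shape: batch_size, seq_length, hidden_size
print(len(sequence_output), len(sequence_output[0]), len(sequence_output[0][0]))  # 1 3 2
# Sentence-level shape: batch_size, hidden_size
print(len(pooled), len(pooled[0]))  # 1 2
```

For a token-level task such as NER, the sequence output would be the natural starting point; for sentence classification, as in create_model, the pooled output is used.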

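$\color{red}{\text{The part we define}}$

This part is what we define ourselves for our own task. For a simple classification task, that means adding a fully connected layer that maps the output to [batch_size, num_classes].

output_weights and output_bias are the weights and bias of that fully connected layer. After that comes the loss. Because it uses tf.nn.log_softmax, this is single-label multi-class classification; for a multi-label task, tf.nn.sigmoid can be used instead, with each label treated as an independent binary decision.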
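The difference between the two loss choices can be shown for a single example in plain Python. This is a sketch, not the code from run_classifier.py: `softmax_loss` mirrors the `-reduce_sum(one_hot_labels * log_probs)` line in create_model, while `sigmoid_loss` is one possible multi-label variant (summing over labels is an assumption made here; in TensorFlow one would likely reach for `tf.nn.sigmoid_cross_entropy_with_logits`).

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one example's logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def softmax_loss(logits, label, num_labels):
    """Multi-class: exactly one correct class per example."""
    one_hot = [1.0 if i == label else 0.0 for i in range(num_labels)]
    log_probs = log_softmax(logits)
    return -sum(o * lp for o, lp in zip(one_hot, log_probs))


def sigmoid_loss(logits, labels):
    """Multi-label: independent binary cross-entropy per label, summed."""
    total = 0.0
    for x, y in zip(logits, labels):
        # Stable form of -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total


logits = [2.0, 0.5, -1.0]
print(softmax_loss(logits, label=0, num_labels=3))   # class 0 is the only correct class
print(sigmoid_loss(logits, labels=[1.0, 1.0, 0.0]))  # classes 0 and 1 are both correct
```

With softmax the classes compete for probability mass, so only one can be "the" answer; with sigmoid each label gets its own probability, which is what a multi-label task needs.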

In short, BERT can be adapted to your own task in countless ways, and what changes is this downstream part.

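Initialization

tf.truncated_normal_initializer draws random values from a truncated normal distribution.
The generated values follow a normal distribution with the given mean and standard deviation, except that any value more than 2 standard deviations from the mean is discarded and redrawn.

Args:
mean: a Python scalar or a scalar tensor. Mean of the random values to generate.
stddev: a Python scalar or a scalar tensor. Standard deviation of the random values to generate.
seed: a Python integer. Used to create the random seed. See tf.set_random_seed for behavior.
dtype: the data type. Only floating-point types are supported.

This is the recommended initializer for neural network weights and filters.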
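The "discard and redraw" rule can be sketched with the standard library alone. This is an illustration of the idea, not TensorFlow's actual implementation; the function name `truncated_normal` and its `rng` parameter are choices made for this example.

```python
import random


def truncated_normal(mean=0.0, stddev=1.0, rng=random):
    """Draw one value; redraw while it lies more than 2 stddev from the mean."""
    while True:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2.0 * stddev:
            return value


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [truncated_normal(stddev=0.02, rng=rng) for _ in range(1000)]

# With stddev=0.02 (as in create_model), every sample falls within [-0.04, 0.04].
print(min(samples) >= -0.04, max(samples) <= 0.04)  # True True
```

Cutting off the tails keeps any single initial weight from being unusually large, which is the practical reason this initializer is used for output_weights.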

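$\color{red}{\text{Summary:}}$
First: overall, when working on a concrete task, the core things to change are:

1) Subclass DataProcessor to define your own data preprocessing class

2) Define your own downstream task in create_model

The rest is a handful of small, simple changes.

Second: the upstream BERT model itself is not defined here. If you are interested, see modeling.py; the optimizer lives in optimization.py.

Third: BERT is not trained from scratch here, since that is costly in time and compute and hard to do without substantial resources. The pre-training code is in run_pretraining.py.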
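As a rough sketch of step 1), a custom processor might look like the following. `InputExample` and `DataProcessor` are simplified stand-ins written here so the example runs on its own; in practice you would subclass the classes from run_classifier.py. `SentimentProcessor`, its two-column TSV layout (label, then text), and the label names are hypothetical.

```python
import csv
import io
import os


class InputExample(object):
    """Simplified stand-in for run_classifier.InputExample."""

    def __init__(self, guid, text_a, text_b=None, label=None):
        self.guid = guid
        self.text_a = text_a
        self.text_b = text_b
        self.label = label


class DataProcessor(object):
    """Simplified stand-in for run_classifier.DataProcessor."""

    def get_train_examples(self, data_dir):
        raise NotImplementedError()

    def get_labels(self):
        raise NotImplementedError()


class SentimentProcessor(DataProcessor):
    """Hypothetical processor for TSV rows of the form: label<TAB>text."""

    def get_train_examples(self, data_dir):
        path = os.path.join(data_dir, "train.tsv")
        with open(path, encoding="utf-8") as f:
            return self._create_examples(list(csv.reader(f, delimiter="\t")), "train")

    def get_labels(self):
        return ["negative", "positive"]

    def _create_examples(self, lines, set_type):
        examples = []
        for i, line in enumerate(lines):
            guid = "%s-%d" % (set_type, i)
            examples.append(InputExample(guid=guid, text_a=line[1], label=line[0]))
        return examples


# Demo on in-memory data instead of a real train.tsv.
rows = list(csv.reader(io.StringIO("positive\tgreat movie\nnegative\ttoo long\n"),
                       delimiter="\t"))
examples = SentimentProcessor()._create_examples(rows, "train")
print(examples[0].guid, examples[0].label, examples[0].text_a)  # train-0 positive great movie
```

In run_classifier.py the new class would also need an entry in the `processors` dictionary in `main()`, so that it can be selected by task name.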
