(一) nn.CrossEntropyLoss explained in detail
When I was new to neural networks I never looked closely at how cross entropy is actually computed. I assumed it was like MSE, something that measures the distance between the prediction and the label, and vaguely thought of it as a distance between two distributions without ever working through the calculation.
What prompted this post: while building a sequence model I noticed that the predicted tensor has shape (batch_size, seq_len, vocab_size) while the label has shape (batch_size, seq_len). To compute the cross entropy between the two, the prediction has to be permuted to (batch_size, vocab_size, seq_len), and I never understood why that rearrangement is needed.
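As a minimal sketch of the situation (all sizes are made up for illustration):
import torch
import torch.nn as nn

logits = torch.randn(8, 20, 1000)          # (batch_size, seq_len, vocab_size)
targets = torch.randint(0, 1000, (8, 20))  # (batch_size, seq_len)
criterion = nn.CrossEntropyLoss()
# criterion(logits, targets)               # raises RuntimeError: target size mismatch
loss = criterion(logits.permute(0, 2, 1), targets)  # works once the class dim sits at dim 1
The rest of this post unpacks why the class (vocab) dimension has to sit at dim 1.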
(1) The formula
![](https://img.haomeiwen.com/i12824314/5c444d0ed042edb9.png)
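The linked image holds the formula from the original post; for reference, what nn.CrossEntropyLoss computes for a single sample with raw logit vector $x$ and target class $y$ is

$$
\ell(x, y) = -\log\frac{\exp(x_{y})}{\sum_{j}\exp(x_{j})} = -x_{y} + \log\sum_{j}\exp(x_{j}),
$$

i.e. the negative log-softmax of the logits evaluated at the true class; with the default settings the per-sample losses are then averaged.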
(2) Code implementation
- Apply softmax to turn the logits into probabilities
- Take the log: the probabilities lie in (0, 1), so the logs lie in (-∞, 0); taking the absolute value maps them back to (0, +∞)
- Compute the NLLLoss
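Taken together, these three steps are exactly what nn.CrossEntropyLoss does internally. A quick sanity check for the simple 2-D case using the functional API (shapes chosen only for illustration):
import torch
import torch.nn.functional as F

logits = torch.randn(5, 10)            # (batch, num_classes)
target = torch.randint(0, 10, (5,))    # one class index per sample
manual = F.nll_loss(F.log_softmax(logits, dim=1), target)
direct = F.cross_entropy(logits, target)
print(torch.allclose(manual, direct))  # True
The walkthrough below does the same thing step by step for the 3-D sequence case.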
import torch.nn as nn
import torch
x = torch.randn((2, 3, 4))                # (batch_size, seq_len, vocab_size)
y = torch.tensor([[0, 1, 3], [1, 0, 1]])  # shape (2, 3): one class index in [0, 4) per position
# One example draw (x comes from torch.randn, so these values change on every run):
# (tensor([[[-0.3704, -1.8745,  0.5677,  0.4539],
#           [-0.8580,  0.3837,  0.2307,  1.8132],
#           [-0.3387, -0.8566,  0.2709,  0.5942]],
#          [[ 0.1688, -0.9780, -0.2980, -1.0651],
#           [-0.7754,  1.4391,  0.2942,  0.6324],
#           [-0.8606, -0.3929, -1.0550, -0.0160]]]),
#  tensor([[0, 1, 3],
#          [1, 0, 1]]))
# print(x.shape, y.shape) # torch.Size([2, 3, 4]) torch.Size([2, 3])
# For this tensor the class (vocab) dimension is the last one (size 4), so the softmax
# must be taken over dim=2 (equivalently dim=-1). dim=0 would normalize across the batch
# and dim=1 across the sequence positions, neither of which gives a distribution over classes.
softmax = nn.Softmax(dim=2)
x_softmax = softmax(x)
# Every length-4 vector along the vocab dimension now sums to 1
# (the concrete probabilities differ from run to run because x is random).
x_log = torch.log(x_softmax)
# Element-wise log of the probabilities; since each probability lies in (0, 1),
# every entry of x_log is negative.
# These two steps can be fused into a single (and numerically more stable) LogSoftmax:
ls = nn.LogSoftmax(dim=2)
x_log2 = ls(x)
# x_log2 matches x_log above up to floating-point error.
A quick aside on an indexing trick:
# A standalone snippet (it reuses the names y and y_hat, so run it separately):
y = torch.tensor([0, 2])
y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y_hat[[0, 1], y]
The last line is a neat shortcut: y_hat[[0, 1], y] indexes the rows with the list [0, 1] and the columns with y, so it picks out the two elements y_hat[0, 0] and y_hat[1, 2].
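The same selection can also be written with torch.gather, which generalizes more cleanly to higher dimensions (an illustrative alternative, not from the original post):
y_hat[[0, 1], y]                                   # tensor([0.1000, 0.5000])
torch.gather(y_hat, 1, y.unsqueeze(1)).squeeze(1)  # same result, gathered along dim 1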
Once this operation is understood, we can start implementing the cross-entropy loss in code.
# Compute the NLLLoss by hand.
# Its main job is to pick out the predicted log-probability at each sample's label position,
# average these values, and turn the result into a positive number.
# Manual version for 2-D predictions of shape (N, C) with 1-D labels of shape (N,)
# (shown only for reference, since the x_log in this post is 3-D):
# loss = x_log[range(len(x_log)), y]
# loss = abs(sum(loss) / len(x))
# Manual version for 3-D predictions of shape (N, d, C) with 2-D labels of shape (N, d)
loss = []
for index, item in enumerate(x_log):
    # log-probability of the true class at each position, averaged over the d positions
    loss.append(sum(x_log[index, range(item.shape[0]), y[index]]) / item.shape[0])
loss = abs(sum(loss) / len(x))
# abs() maps the average from (-inf, 0) to (0, +inf): the worse the prediction, the smaller
# x_softmax, the smaller (more negative) x_log, and the larger abs(x_log), which is exactly
# how a loss should behave. (Negating would do the same, since the values are never positive.)
# print(loss)  # a positive scalar; the exact value changes on every run because x is random
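For reference, the loop can be replaced by the torch.gather trick from the aside above (an illustrative alternative, not part of the original post):
picked = torch.gather(x_log, 2, y.unsqueeze(2)).squeeze(2)  # (2, 3): log-prob of the true class at each position
loss_gather = -picked.mean()  # same value as the loop-based loss above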
# The manual loop above is equivalent to nn.NLLLoss(). Both nn.NLLLoss and nn.CrossEntropyLoss
# expect the class dimension at dim 1: input (N, C, d1, ...) with target (N, d1, ...).
# Our x_log2 is (N, d, C) = (2, 3, 4), so it has to be permuted first.
x_log2 = x_log2.permute(0, 2, 1)  # shape: (2, 4, 3)
# This is exactly where my problem was: without the permute the targets cannot be lined up
# with the class dimension to pick out the predicted values.
# RuntimeError: Expected target size [2, 4], got [2, 3]
# print(x_log2.shape)  # torch.Size([2, 4, 3])
loss_func = nn.NLLLoss()
loss_func(x_log2, y)  # same value as the manual loss computed above
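Finally, since nn.CrossEntropyLoss is just LogSoftmax followed by NLLLoss, feeding it the raw logits (again with the class dimension moved to dim 1) reproduces the same number, which answers the original question:
ce = nn.CrossEntropyLoss()
ce(x.permute(0, 2, 1), y)  # identical to the NLLLoss result above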
# Further reading: https://zhuanlan.zhihu.com/p/383044774