[Deep Learning in Practice] Supplement 02. Label Smoothing

Author: 砥砺前行的人 | Published 2022-01-20 20:18

Source paper: Rethinking the Inception Architecture for Computer Vision

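In deep learning, a classification model usually applies the Softmax function at the output layer to normalize the raw outputs into a probability distribution: the outputs sum to 1 and keep their relative order. The Softmax function is: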
$\hat{y}_j = \frac{\exp(o_j)}{\sum_k \exp(o_k)}$
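As a small illustration of the formula above, here is a minimal pure-Python sketch. The function name `softmax` and the example logits are assumptions made for this illustration. Subtracting the maximum logit before exponentiating is a common numerical-stability trick and does not change the result.

```python
import math

def softmax(logits):
    """Return softmax probabilities for a list of raw scores (logits)."""
    # Shifting by the max logit leaves the ratios unchanged
    # but keeps exp() from overflowing for large inputs.
    m = max(logits)
    exps = [math.exp(o - m) for o in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, -1.0]   # hypothetical scores for three classes
probs = softmax(logits)
print(probs)        # every entry is positive and the order of the logits is kept
print(sum(probs))   # sums to 1 up to floating-point rounding
```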

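[Figure: graph of the exponential function with base $e$]

The exponential function is strictly positive for every $x$ and grows without bound as $x$ increases.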

The loss function for classification is the cross-entropy loss:
$l (\mathbf{y}, \hat{\mathbf{y}}) = - \sum_{j=1}^q y_j \log \hat{y}_j$

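Labels are one-hot encoded: $y \in \{(1, 0, 0), (0, 1, 0), (0, 0, 1) \}$. Exactly one label component is 1 and all others are 0, so every term multiplied by 0 cancels out. For a single sample, the loss therefore reduces to the following, where $\hat{y}$ denotes the predicted probability of the true class: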
$l (\mathbf{y}, \hat{\mathbf{y}}) = - \log \hat{y}$
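The reduction above can be checked numerically. The sketch below is only an illustration: the helper `cross_entropy` and the predicted probabilities are assumptions chosen for the example.

```python
import math

def cross_entropy(y, y_hat):
    """Cross-entropy between a target distribution y and a prediction y_hat."""
    # Terms with y_j == 0 contribute nothing, so they are skipped.
    return -sum(t * math.log(p) for t, p in zip(y, y_hat) if t > 0)

y = [0.0, 1.0, 0.0]        # one-hot label: the true class is index 1
y_hat = [0.2, 0.7, 0.1]    # hypothetical softmax output

full = cross_entropy(y, y_hat)     # the full sum over all classes
shortcut = -math.log(y_hat[1])     # only the true-class term
print(full, shortcut)              # the two values agree
```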

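To make the loss as small as possible, $\hat{y}$ has to approach 1, that is, $\hat{y}_j = \frac{\exp(o_j)}{\sum_k \exp(o_k)}$ for the true class $j$ has to approach 1. From the relation between the outputs $o$ and the behavior of the exponential function, this only happens when $o_j$ grows toward positive infinity relative to the other outputs. Backpropagation keeps updating the parameters toward that goal to lower the loss. Pushing the outputs to such an extreme is very likely to make the model overfit.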
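A minimal sketch of this effect, assuming three classes whose non-target logits are held at 0 (the helper `one_hot_loss` and the logit values are made up for the illustration): with a one-hot target, the loss keeps shrinking as the true-class logit grows and never reaches a minimum at a finite value.

```python
import math

def one_hot_loss(true_logit, other_logits):
    """Return -log of the softmax probability assigned to the true class."""
    all_logits = [true_logit] + list(other_logits)
    m = max(all_logits)
    # log-sum-exp, shifted by the max logit for numerical stability
    log_z = m + math.log(sum(math.exp(o - m) for o in all_logits))
    return log_z - true_logit

others = [0.0, 0.0]                      # logits of the two wrong classes
true_logits = [1.0, 5.0, 10.0, 20.0]     # increasingly confident predictions
losses = [one_hot_loss(o, others) for o in true_logits]

for o, loss in zip(true_logits, losses):
    print(o, loss)   # the loss decreases at every step but stays above 0
```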

Label Smoothing

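Label Smoothing addresses the problem above. It is a regularization strategy that adds noise through a soft one-hot encoding, which reduces the weight of the ground-truth class when the loss is computed and in turn helps suppress overfitting.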

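Label Smoothing replaces each one-hot label component $y_k$ with:
$y_k^{LS} = y_k (1 - \varepsilon) + \frac{\varepsilon}{K}$

$K$ is the number of classes and $\varepsilon$ is a hyperparameter, usually a small value. By limiting the gap between the outputs for the positive and negative classes, the model gains stronger generalization ability.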
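The sketch below illustrates the idea under the uniform-smoothing convention of the cited paper: the one-hot label is mixed with a uniform distribution over the $K$ classes, so the true class gets $1 - \varepsilon + \varepsilon / K$ and every other class gets $\varepsilon / K$. The helper names `smooth_labels` and `cross_entropy` and the logits are assumptions made for this illustration.

```python
import math

def smooth_labels(true_class, num_classes, eps):
    """Mix a one-hot label with a uniform distribution over all classes."""
    target = [eps / num_classes] * num_classes
    target[true_class] += 1.0 - eps
    return target

def cross_entropy(target, logits):
    """Cross-entropy between a target distribution and softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(o - m) for o in logits))
    return -sum(t * (o - log_z) for t, o in zip(target, logits))

K, eps = 3, 0.1
hard_target = smooth_labels(0, K, 0.0)   # plain one-hot label
soft_target = smooth_labels(0, K, eps)   # smoothed label
print(soft_target)

# Raise the true-class logit while the other logits stay at 0.
true_logits = [2.0, 5.0, 20.0]
hard_losses = [cross_entropy(hard_target, [o, 0.0, 0.0]) for o in true_logits]
soft_losses = [cross_entropy(soft_target, [o, 0.0, 0.0]) for o in true_logits]
for o, h, s in zip(true_logits, hard_losses, soft_losses):
    print(o, h, s)
```

With the one-hot target the loss keeps falling as the logit grows, while with the smoothed target an extremely large logit is penalized again, so the optimum sits at a finite logit gap. PyTorch's `nn.CrossEntropyLoss(label_smoothing=...)` is documented as applying this same mixture with a uniform distribution.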

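In real scenarios, classes are not completely opposed to each other. Class A might be a leopard cat and class B a Siberian tiger, and the relation between them is not the single 1 with all other entries 0 that a one-hot encoding implies. A moderate amount of Label Smoothing may fit such cases better.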

The following code demonstrates how Label Smoothing counteracts overfitting:

import numpy as np
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, Subset
import torch.nn.functional as F
from torch import nn
import time

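# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')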

trans = transforms.ToTensor()

train_set = datasets.FashionMNIST(
    root="./data/", train=True, transform=trans, download=True)
test_set = datasets.FashionMNIST(
    root="./data/", train=False, transform=trans, download=True)

class Mnist_CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)
        # 10 output channels, one logit per Fashion-MNIST class
        self.conv3 = nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1)

    def forward(self, xb):
        xb = xb.view(-1, 1, 28, 28)
        xb = F.relu(self.conv1(xb))
        xb = F.relu(self.conv2(xb))
        xb = F.relu(self.conv3(xb))
        xb = F.avg_pool2d(xb, 4)
        return xb.view(-1, xb.size(1))

def loss_batch(model, loss_func, xb, yb, opt=None):
    loss = loss_func(model(xb), yb)

    if opt is not None:
        loss.backward()
        opt.step()
        opt.zero_grad()

    return loss.item(), len(xb)

def fit(epochs, model, loss_func, opt, train_dl, valid_dl, results, label_smooth):
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_dl:
            xb = xb.to(device)
            yb = yb.to(device)
            loss_batch(model, loss_func, xb, yb, opt)

        model.eval()
        with torch.no_grad():
            losses, nums = zip(
                *[loss_batch(model, loss_func, xb.to(device), yb.to(device)) for xb, yb in valid_dl]
            )
        val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)

        print(epoch, val_loss)
        results[label_smooth].append(val_loss)

batch_size = 32
epochs = 200
results = {}
label_smooths = [0.0, 0.05, 0.1]

# Keep every 10th sample of each split to shorten the experiment
train_set = Subset(train_set, np.arange(0, len(train_set), 10))
test_set = Subset(test_set, np.arange(0, len(test_set), 10))

train_dl = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=8, prefetch_factor=32)
valid_dl = DataLoader(test_set, batch_size=batch_size, shuffle=False, num_workers=8, prefetch_factor=32)

for label_smooth in label_smooths:
    start = time.perf_counter()
    model = Mnist_CNN().to(device)
    loss = nn.CrossEntropyLoss(label_smoothing=label_smooth)
    optim = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

    results[label_smooth] = []
    fit(epochs, model, loss, optim, train_dl, valid_dl, results, label_smooth)
    # The last element of each list is the elapsed time in seconds, not a loss
    results[label_smooth].append(time.perf_counter()-start)

import matplotlib.pyplot as plt

x = range(1, epochs + 1)

fig, ax = plt.subplots()
for item in results:
    ax.plot(list(x), results[item][:-1], label=str(item))

ax.set_xlabel('epochs')  # x-axis label
ax.set_ylabel('val loss')  # y-axis label
ax.set_title('different label smoothing')  # figure title
ax.legend()  # detect the labelled curves and show them in the legend

plt.show()
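Besides the plot, it can help to read off the best epoch for each setting. The helper below is a hypothetical addition, not part of the original script. It assumes the `results` layout used above: one validation loss per epoch, followed by the elapsed time as the last element. Note that `fit` computes the validation loss with the same `loss_func` used for training, so curves for different smoothing values are measured against different targets. Tracking accuracy as well may give a fairer comparison.

```python
def summarize(results):
    """Return {label_smoothing: (best_epoch, best_val_loss)}.

    Assumes each list holds one validation loss per epoch followed by
    the elapsed time in seconds as its final element.
    """
    summary = {}
    for eps, values in results.items():
        losses = values[:-1]   # drop the trailing elapsed time
        best = min(range(len(losses)), key=losses.__getitem__)
        summary[eps] = (best + 1, losses[best])   # epochs are reported 1-based
    return summary

# Made-up numbers purely to show the call; they are not measured results.
fake_results = {0.0: [0.9, 0.6, 0.7, 12.3], 0.1: [1.1, 0.9, 0.8, 12.9]}
print(summarize(fake_results))
```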
