PyTorch Loss Functions

Author: 一杭oneline | Published 2020-02-13 17:08

    PyTorch weight initialization and loss functions

    Gradient explosion and gradient vanishing

    Why do these problems arise?

    For independent random variables X and Y:

    E(X·Y) = E(X)·E(Y)

    D(X) = E(X^2)-[E(X)]^2

    D(X+Y) = D(X)+D(Y)

    D(X·Y) = D(X)·D(Y)+D(X)·[E(Y)]^2+D(Y)·[E(X)]^2

    If in addition E(X)=0 and E(Y)=0, then

    D(X·Y) = D(X)·D(Y)

    For the first hidden unit, H_{11} = \sum_{i=1}^{n}X_i·W_{1i}, and with D(X_i)=D(W_{1i})=1:

    D(H_{11}) =\sum_{i=1}^{n}D(X_i)·D(W_{1i})= n·1·1 = n

    So the variance is multiplied by n (the number of input neurons) at every layer; after a few layers the activations blow up, i.e. gradient explosion. To avoid this we require

    D(H_1) = n·D(X)·D(W)=1

    which gives D(W)=1/n, i.e. STD = \sqrt{1/n}

    After adding a saturating activation function, the layer outputs instead shrink layer by layer, which leads to gradient vanishing.
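
    A minimal sketch (not from the original) that makes this visible: with standard-normal weights the activation scale grows by roughly \sqrt{n} per layer, while scaling the weights by \sqrt{1/n} keeps it near 1.

    import torch

    n = 256          # neurons per layer
    layers = 20      # depth

    x = torch.randn(1024, n)                 # standard-normal input, D(X) = 1
    for scale in (1.0, (1.0 / n) ** 0.5):    # naive init vs. std = sqrt(1/n)
        h = x
        for _ in range(layers):
            w = torch.randn(n, n) * scale    # D(W) = scale^2
            h = h @ w
        print("scale = {:.4f}, std after {} layers: {:.3e}".format(scale, layers, h.std().item()))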

    Variance consistency: keep the scale of the data in a proper range, usually with variance 1.

    For saturating activation functions such as sigmoid and tanh (Xavier initialization):

    n_i·D(W)=1 and n_{i+1}·D(W) = 1, where n_i is the number of input-layer neurons and n_{i+1} the number of output-layer neurons

    D(W)=2/(n_i+n_{i+1})

    A uniform distribution is usually used: W~U[-a,a]

    D(W) = \frac{(a-(-a))^2}{12}=\frac{a^2}{3}

    which gives a = \frac{\sqrt{6}}{\sqrt{n_i+n_{i+1}}}

    For non-saturating activation functions such as ReLU and its variants (Kaiming initialization):

    D(W) = \frac{2}{n_i}

    For ReLU variants with slope a on the negative half-axis (e.g. LeakyReLU):

    D(W) = \frac{2}{(1+a^2)*n_i}

    # saturating activations (e.g. tanh): Xavier initialization
    tanh_gain = nn.init.calculate_gain('tanh')
    nn.init.xavier_uniform_(m.weight.data, gain=tanh_gain)
    # the above only suits saturating activation functions, not ReLU

    # ReLU and its variants: Kaiming initialization
    nn.init.kaiming_normal_(m.weight.data)
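
    A minimal sketch (not from the original) of how these calls are usually wired into a model; the two-layer MLP and the initialize_weights helper are illustrative names:

    import torch.nn as nn

    class MLP(nn.Module):
        def __init__(self, in_dim=784, hidden=256, out_dim=10):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, out_dim),
            )
            self.initialize_weights()

        def initialize_weights(self):
            for m in self.modules():
                if isinstance(m, nn.Linear):
                    # Kaiming init matches the ReLU used above
                    nn.init.kaiming_normal_(m.weight.data)
                    nn.init.zeros_(m.bias.data)

    model = MLP()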
    

    Loss functions

    Loss = f(\hat{y},y): the loss of a single sample; this is what is normally used.

    Cost = \frac{1}{N}\sum_i^N{f(\hat{y_i},y_i)}: the average loss over the whole sample set.

    Obj = Cost + Regularization: the objective function, i.e. cost plus a regularization term.

    Loss functions in PyTorch inherit from nn.Module. The key argument is reduction: 'none' (one loss value per element), 'sum', or 'mean'.

    1.nn.CrossEntropyLoss()

    Cross-entropy loss.

    It combines nn.LogSoftmax() and nn.NLLLoss() in a single class.

    It measures the difference between two probability distributions; related notions: entropy and relative entropy.

    Cross entropy = entropy + relative entropy (entropy measures the uncertainty of information)

    Self-information: I(x)=-log(P(x))

    Entropy: H(P) = E_{x\sim P}[I(x)] = -\sum_{i}^N P(x_i)logP(x_i)

    Relative entropy (KL divergence): D_{KL}(P||Q) = E_{x\sim P}[log\frac{P(x)}{Q(x)}] = H(P,Q)-H(P)

    Cross entropy: H(P,Q)=-\sum_{i=1}^{N}P(x_i)logQ(x_i) = D_{KL}(P||Q)+H(P)

    Since H(P) is fixed by the labels, minimizing the cross entropy is equivalent to minimizing the KL divergence, so the smaller the KL divergence the better.

    inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
    target = torch.tensor([0, 1, 1], dtype=torch.long)

    # equivalent one-hot form: target = torch.tensor([[1,0],[0,1],[0,1]], dtype=torch.float)
    # 3 samples, class indices starting from 0; the loss sits right after the last layer,
    # so inputs holds the raw class scores (logits) and target holds one class index per
    # sample, which is why it is of dtype long (it acts like an index)


    weights = torch.tensor([1, 2], dtype=torch.float)
    # per-class weights: samples with label 0 get weight 1, samples with label 1 get weight 2
    # e.g. weights = torch.tensor([0.7, 0.3], dtype=torch.float)
    loss_f_none_w = nn.CrossEntropyLoss(weight=weights, reduction='none')
    loss_f_sum = nn.CrossEntropyLoss(weight=weights, reduction='sum')
    loss_f_mean = nn.CrossEntropyLoss(weight=weights, reduction='mean')
    # forward
    loss_none_w = loss_f_none_w(inputs, target)
    loss_sum = loss_f_sum(inputs, target)
    loss_mean = loss_f_mean(inputs, target)
    # 'none': one loss value per sample
    # 'sum' : the per-sample losses added up
    # 'mean': weighted average; with weight set, each sample counts with its class weight,
    #         so the sum is divided by 1 + 2 + 2 = 5 here, not by the sample count 3

    # output
    tensor([1.3133, 0.2539, 0.2539]) tensor(1.8210) tensor(0.3642)
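
    A hand check of the first 'none' element, using the tensors above (per-sample loss = w[y_i]·(-x[y_i] + log\sum_j exp(x_j))):

    idx = 0
    x_i, y_i = inputs[idx], target[idx]
    loss_hand = weights[y_i] * (-x_i[y_i] + torch.log(torch.exp(x_i).sum()))
    print(loss_hand)   # tensor(1.3133), matches the first 'none' element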
    
    2.nn.NLLLoss()

    Negative log-likelihood loss.

    It simply takes the negative of the input value at the target index (a log-softmax is assumed to have been applied beforehand).

    inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
    target = torch.tensor([0, 1, 1], dtype=torch.long)

    weights = torch.tensor([1, 1], dtype=torch.float)

    loss_f_none_w = nn.NLLLoss(weight=weights, reduction='none')
    loss_f_sum = nn.NLLLoss(weight=weights, reduction='sum')
    loss_f_mean = nn.NLLLoss(weight=weights, reduction='mean')
    # forward
    loss_none_w = loss_f_none_w(inputs, target)
    # output:
    tensor([-1., -3., -3.]) tensor(-7.) tensor(-2.3333)
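
    A hand check using the tensors above, assuming loss_i = -w[y_i]·x[y_i]; the inputs here are raw scores rather than log-probabilities, which is why the values are negative:

    idx = 0
    loss_hand = -weights[target[idx]] * inputs[idx, target[idx]]
    print(loss_hand)   # tensor(-1.), matches the first 'none' element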
    
    3.nn.BCELoss()

    Binary cross-entropy loss.
    l_n=-w_n[y_n·logx_n+(1-y_n)·log(1-x_n)]
    The target must be of dtype torch.float.

    Every input value must lie in [0,1], so the inputs are first passed through sigmoid().

    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)
    # the loss is computed element-wise between inputs and target
    target_bce = target

    inputs = torch.sigmoid(inputs)   # squash the inputs into [0, 1]
    weights = torch.tensor([1, 1], dtype=torch.float)

    loss_f_none_w = nn.BCELoss(weight=weights, reduction='none')
    loss_f_sum = nn.BCELoss(weight=weights, reduction='sum')
    loss_f_mean = nn.BCELoss(weight=weights, reduction='mean')
    # forward, same as above
    # output:
    BCE Loss tensor([[0.3133, 2.1269],
            [0.1269, 2.1269],
            [3.0486, 0.0181],
            [4.0181, 0.0067]])
    tensor(11.7856) tensor(1.4732)
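
    A hand check of element [0, 0], reusing the sigmoid-transformed inputs from above:

    x, y = inputs[0, 0], target_bce[0, 0]
    loss_hand = -(y * torch.log(x) + (1 - y) * torch.log(1 - x))
    print(loss_hand)   # tensor(0.3133), matches element [0, 0] above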
    
    4.nn.BCEWithLogitsLoss()

    Combines sigmoid with binary cross-entropy.

    Do not add a sigmoid at the end of the network; this loss applies the sigmoid internally.
    l_n=-w_n[y_n·log\sigma(x_n)+(1-y_n)·log(1-\sigma(x_n))]

    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

    target_bce = target

    # no explicit sigmoid this time:
    # inputs = torch.sigmoid(inputs)

    weights = torch.tensor([1], dtype=torch.float)
    pos_w = torch.tensor([3], dtype=torch.float)        # losses of positive (y=1) elements are scaled by 3
    # parameters: pos_weight scales the loss of the positive class; weight sets per-class loss weights
    loss_f_none_w = nn.BCEWithLogitsLoss(weight=weights, reduction='none', pos_weight=pos_w)
    loss_f_sum = nn.BCEWithLogitsLoss(weight=weights, reduction='sum', pos_weight=pos_w)
    loss_f_mean = nn.BCEWithLogitsLoss(weight=weights, reduction='mean', pos_weight=pos_w)

    # forward
    loss_none_w = loss_f_none_w(inputs, target_bce)
    # output without pos_weight (same values as BCELoss above):
    weights:  tensor([1., 1.])
    BCE Loss tensor([[0.3133, 2.1269],
            [0.1269, 2.1269],
            [3.0486, 0.0181],
            [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)

    # output with pos_weight = 3:
    pos_weights:  tensor([3.])
    tensor([[0.9398, 2.1269],
            [0.3808, 2.1269],
            [3.0486, 0.0544],
            [4.0181, 0.0201]]) tensor(12.7158) tensor(1.5895)
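
    A hand check of element [0, 0] (logit 1, target 1), assuming positive elements are scaled by pos_weight:

    x, y = inputs[0, 0], target_bce[0, 0]
    loss_hand = -(pos_w * y * torch.log(torch.sigmoid(x)) + (1 - y) * torch.log(1 - torch.sigmoid(x)))
    print(loss_hand)   # tensor([0.9398]) = 3 * 0.3133, matches element [0, 0] above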
    
    5.nn.L1Loss()

    Computes the absolute difference between inputs and target:
    l_n=|x_n-y_n|

    inputs = torch.ones((2, 2))
    target = torch.ones((2, 2)) * 3
    loss_f = nn.L1Loss(reduction='none')
    loss = loss_f(inputs, target)
    # output:
    input:tensor([[1., 1.],
            [1., 1.]])
    target:tensor([[3., 3.],
            [3., 3.]])
    L1 loss:tensor([[2., 2.],
            [2., 2.]])
    
    6.nn.MSELoss()

    Computes the squared difference between inputs and target: l_n=(x_n-y_n)^2

    reduction: the reduction mode, one of none/sum/mean

    inputs = torch.ones((2, 2))
    target = torch.ones((2, 2)) * 3
    loss_f_mse = nn.MSELoss(reduction='none')
    loss_mse = loss_f_mse(inputs, target)
    # output:
    MSE loss:tensor([[4., 4.],[4., 4.]])
    
    7.nn.SmoothL1Loss()

    A smoothed version of L1Loss:
    loss(x,y)=\frac{1}{n}\sum_i z_i, \quad z_i=\begin{cases}0.5(x_i-y_i)^2 & if\ |x_i-y_i|<1 \\ |x_i-y_i|-0.5 & otherwise\end{cases}

    It is smoother than plain L1 near zero.

    (plot: SmoothL1Loss vs. L1Loss around zero)

    inputs = torch.linspace(-3, 3, steps=500)
    target = torch.zeros_like(inputs)
    loss_f = nn.SmoothL1Loss(reduction='none')
    loss_smooth = loss_f(inputs, target)
    loss_l1 = np.abs(inputs.numpy())
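
    A minimal plotting sketch (assuming matplotlib and numpy are available) to draw the two curves computed above:

    import numpy as np
    import matplotlib.pyplot as plt

    plt.plot(inputs.numpy(), loss_smooth.numpy(), label='SmoothL1Loss')
    plt.plot(inputs.numpy(), loss_l1, label='L1Loss')
    plt.xlabel('x - y')
    plt.ylabel('loss')
    plt.legend()
    plt.grid()
    plt.show()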
    
    8.nn.PoissonNLLLoss()

    Negative log-likelihood loss for Poisson-distributed targets.

    log_input: whether the input is already in log form; it selects the formula:
    log\_input=True:\ loss(input,target) = exp(input)-target·input; \\ log\_input=False:\ loss(input,target) = input-target·log(input+eps)
    full: whether to add the Stirling approximation term to the loss, default False

    eps: a small constant added to avoid log(0)

    inputs = torch.randn((2, 2))
    target = torch.randn((2, 2))
    loss_f = nn.PoissonNLLLoss(log_input=True, full=False, reduction='none')
    loss = loss_f(inputs, target)
    print("input:{}\ntarget:{}\nPoisson NLL loss:{}".format(inputs, target, loss))
    # output:
    input:tensor([[0.6614, 0.2669],[0.0617, 0.6213]])
    target:tensor([[-0.4519, -0.1661],[-1.5228,  0.3817]])
    Poisson NLL loss:tensor([[2.2363, 1.3503],[1.1575, 1.6242]])
    
    # hand check of element [0, 0] with log_input=True
    idx = 0
    loss_1 = torch.exp(inputs[idx, idx]) - target[idx, idx]*inputs[idx, idx]
    print(loss_1)   # tensor(2.2363), matches element [0, 0] above
    
    9.nn.KLDivLoss()

    KL divergence (relative entropy): measures the difference between two distributions.
    D_{KL}(P||Q) = E_{x\sim P}[log{\frac{P(x)}{Q(x)}}]=E_{x\sim P}[logP(x)-logQ(x)]=\sum_{i=1}^nP(x_i)(logP(x_i)-logQ(x_i))

    What this function computes per element is
    l_n = y_n·(log{y_n}-x_n)
    Here y_n is already a probability distribution (the target distribution), e.g. y = [0.9, 0.05, 0.05] means class 1 has probability 0.9 and classes 2 and 3 have 0.05 each; x_n comes from the network output, e.g. the distribution [0.8, 0.1, 0.1].

    Compared with the definition above, the per-element loss drops the \sum, and x_n is expected to already be log-probabilities, e.g. obtained with torch.log on probabilities or nn.LogSoftmax() on raw multi-class outputs.

    batchmean: sum the loss and divide by the batch size.

    inputs = torch.tensor([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])   # already a probability distribution
    inputs_log = torch.log(inputs)
    target = torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]], dtype=torch.float)

    loss_f_none = nn.KLDivLoss(reduction='none')
    loss_f_mean = nn.KLDivLoss(reduction='mean')
    loss_f_bs_mean = nn.KLDivLoss(reduction='batchmean')

    # note: the raw probabilities are passed here although the API expects
    # log-probabilities (inputs_log); that is why the values below are negative
    loss_none = loss_f_none(inputs, target)
    loss_mean = loss_f_mean(inputs, target)
    loss_bs_mean = loss_f_bs_mean(inputs, target)

    print("loss_none:\n{}\nloss_mean:\n{}\nloss_bs_mean:\n{}".format(loss_none, loss_mean, loss_bs_mean))

    # output:
    loss_none:tensor([[-0.5448, -0.1648, -0.1598],[-0.2503, -0.4597, -0.4219]])
    loss_mean:-0.3335360586643219
    loss_bs_mean:-1.000608205795288
    # PyTorch also warns: "reduction: 'mean' divides the total loss by both the batch size
    # and the support size"; use 'batchmean' for the mathematically correct average
    
    # hand check of element [0, 0]
    idx = 0
    loss_1 = target[idx, idx] * (torch.log(target[idx, idx]) - inputs[idx, idx])
    print(loss_1)   # tensor(-0.5448), matches element [0, 0] above
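
    For reference, a minimal sketch of the intended usage, passing the log-probabilities inputs_log defined above; the result is then a proper, non-negative KL divergence:

    loss_true_kl = nn.KLDivLoss(reduction='batchmean')(inputs_log, target)
    print(loss_true_kl)   # ≈ tensor(0.3553): the average KL divergence over the two samples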
    
    10.nn.MarginRankingLoss()

    Compares two groups of data for ranking tasks; it measures the difference between the two groups and, with reduction='none' and the column-vector inputs below, returns an N*N loss matrix.
    loss(x,y)=max(0 ,-y*(x1-x2)+margin)
    y=1: we want x1 > x2; when x1 > x2 no loss is incurred.

    y=-1: we want x1 < x2; when x2 > x1 no loss is incurred.

    margin: the margin value

    reduction: the reduction mode

    x1 = torch.tensor([[1], [2], [3]], dtype=torch.float)
    x2 = torch.tensor([[2], [2], [2]], dtype=torch.float)

    target = torch.tensor([1, 1, -1], dtype=torch.float)  # this is y
    loss_f_none = nn.MarginRankingLoss(margin=0, reduction='none')
    loss = loss_f_none(x1, x2, target)
    # e.g. row 2: x1[2]-x2[2] = 3-2 = 1, combined with y = [1, 1, -1] gives max(0, -y*1) = [0, 0, 1]
    # output:
    loss:tensor([[1., 1., 0.],
            [0., 0., 0.],
            [0., 0., 1.]])
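
    A hand check of element [0, 0] (x1 = 1, x2 = 2, y = 1):

    margin = 0
    x1_0, x2_0, y_0 = 1., 2., 1.
    loss_hand = max(0., -y_0 * (x1_0 - x2_0) + margin)
    print(loss_hand)   # 1.0, matches element [0, 0] of the loss matrix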
    
    11.nn.MultiLabelMarginLoss()

    Multi-label margin loss. Multi-label means one sample carries several labels, e.g. one image belonging to several categories.

    Example: in a four-class task, if sample x belongs to class 0 and class 3, the label is [0,3,-1,-1], not [1,0,0,1].
    loss(x,y)=\sum_{ij}{\frac{max(0,1-(x[y[j]]-x[i]))}{x.size(0)}}
    where y is the label vector, i runs over 0..x.size(0)-1, j runs over 0..y.size(0)-1 with y[j]≥0, and i≠y[j] \quad for \quad all \quad i \quad and \quad j

    The formula takes each labelled score minus each non-labelled score;

    a loss is incurred only when a labelled score exceeds a non-labelled score by less than 1, so the larger the gap between the two groups, the better.

    x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
    y = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)  # the sample belongs to classes 0 and 3; dtype must be long
    loss_f = nn.MultiLabelMarginLoss(reduction='none')
    loss = loss_f(x, y)
    # output:
    tensor([0.8500])

    # hand computation
    x = x[0]
    item_1 = (1 - (x[0] - x[1])) + (1 - (x[0] - x[2]))    # label 0 vs. non-labels 1, 2
    item_2 = (1 - (x[3] - x[1])) + (1 - (x[3] - x[2]))    # label 3 vs. non-labels 1, 2
    loss_h = (item_1 + item_2) / x.shape[0]
    print(loss_h)   # tensor(0.8500)
    
    12.nn.SoftMarginLoss()

    Two-class logistic loss.
    loss(x,y)=\sum_i\frac{log(1+exp(-y[i]*x[i]))}{x.nelement()}
    x.nelement() is the number of elements (used for averaging), and y is 1 or -1.

    inputs = torch.tensor([[0.3, 0.7], [0.5, 0.5]])
    target = torch.tensor([[-1, 1], [1, -1]], dtype=torch.float)
    
    loss_f = nn.SoftMarginLoss(reduction='none')
    loss = loss_f(inputs, target)
    # output:
    SoftMargin:  tensor([[0.8544, 0.4032],[0.4741, 0.9741]])
    
    idx = 0
    inputs_i = inputs[idx, idx]
    target_i = target[idx, idx]
    
    loss_h = np.log(1 + np.exp(-target_i * inputs_i))
    print(loss_h)   # output: tensor(0.8544)
    
    13.nn.MultiLabelSoftMarginLoss()

    The multi-label version of SoftMarginLoss.
    loss(x,y)=-\frac{1}{C}*\sum_iy[i]*log(\frac{1}{(1+exp(-x[i]))})+(1-y[i])*log(\frac{exp(-x[i])}{1+exp(-x[i])})
    C is the number of classes.

    For a four-class example, y = [1,0,0,1] means the sample belongs to the first and the fourth class; note this label format differs from MultiLabelMarginLoss.

    inputs = torch.tensor([[0.3, 0.7, 0.8]])
    target = torch.tensor([[0, 1, 1]], dtype=torch.float)  # the label is float
    loss_f = nn.MultiLabelSoftMarginLoss(reduction='none')
    loss = loss_f(inputs, target)
    # output: MultiLabel SoftMargin:  tensor([0.5429])

    # hand computation
    i_0 = torch.log(torch.exp(-inputs[0, 0]) / (1 + torch.exp(-inputs[0, 0])))
    i_1 = torch.log(1 / (1 + torch.exp(-inputs[0, 1])))
    i_2 = torch.log(1 / (1 + torch.exp(-inputs[0, 2])))
    loss_h = (i_0 + i_1 + i_2) / -3
    print(loss_h)   # tensor(0.5429)
    
    14.nn.MultiMarginLoss()

    Hinge loss for multi-class classification.
    loss(x,y)=\frac{\sum_imax(0,margin-(x[y]-x[i]))^p}{x.size(0)}
    where\quad x∈\{0,…,x.size(0)-1\},\ y∈\{0,...,y.size(0)-1\},\ 0≤y[j]≤x.size(0)-1,\ and\quad i≠y[j]\ for\ all\ i\ and\ j

    y is of dtype torch.long; [1,2,1] means sample 1 is class 1, sample 2 is class 2 and sample 3 is class 1.

    The labelled score minus each non-labelled score; i never equals the label index.

    Main parameters: weight sets per-class loss weights; margin is the margin value, default 1; p can be 1 or 2, default 1.

    x = torch.tensor([[0.1, 0.2, 0.7], [0.2, 0.5, 0.3]])
    y = torch.tensor([1, 2], dtype=torch.long)  # label dtype is long

    loss_f = nn.MultiMarginLoss(reduction='none')
    loss = loss_f(x, y)
    # output: Multi Margin Loss:  tensor([0.8000, 0.7000])

    # hand computation for the first sample (label 1)
    x = x[0]
    margin = 1

    i_0 = margin - (x[1] - x[0])
    # the i == y term (i = 1) is skipped
    i_2 = margin - (x[1] - x[2])
    loss_h = (i_0 + i_2) / x.shape[0]

    print(loss_h)   # tensor(0.8000)
    
    15.nn.TripletMarginLoss()

    Triplet loss, commonly used in face recognition.
    L(a,p,n)=max\{d(a_i,p_i)-d(a_i,n_i)+margin,0\} \quad d(x_i,y_i)=||x_i-y_i||_p\ (p\text{-norm})
    It works on distances between points: the anchor-positive distance should be smaller than the anchor-negative distance; e.g. the anchor is a photo of your face, the positive is another photo of your face, and the negative is someone else's face.

    anchor = torch.tensor([[1.]])
    pos = torch.tensor([[2.]])
    neg = torch.tensor([[0.5]])
    
    loss_f = nn.TripletMarginLoss(margin=1.0, p=1)
    loss = loss_f(anchor, pos, neg) 
    # output: Triplet Margin Loss tensor(1.5000)
    
    # hand computation
    margin = 1
    a, p, n = anchor[0], pos[0], neg[0]

    d_ap = torch.abs(a-p)
    d_an = torch.abs(a-n)

    loss = torch.clamp(d_ap - d_an + margin, min=0)
    print(loss)   # tensor([1.5000])
    
    16.nn.HingeEmbeddingLoss()

    Measures whether two inputs are similar; commonly used for non-linear embeddings and semi-supervised learning. The input x should be the absolute difference between the two inputs.
    l_n=\begin{cases}x_n & if\quad y_n=1 \\ max\{0,\Delta-x_n\} & if\quad y_n=-1\end{cases} \quad \Delta=margin

    inputs = torch.tensor([[1., 0.8, 0.5]])
    target = torch.tensor([[1, 1, -1]])  # int type
    
    loss_f = nn.HingeEmbeddingLoss(margin=1, reduction='none')
    loss = loss_f(inputs, target)
    # Hinge Embedding Loss tensor([[1.0000, 0.8000, 0.5000]])
    
    margin = 1.
    loss = max(0, margin - inputs.numpy()[0, 2])
    print(loss)  #0.5
    
    17.nn.CosineEmbeddingLoss()

    Uses cosine similarity to measure how similar two inputs are; used for embeddings, where the difference in direction matters rather than the magnitude.

    cos(\theta)=\frac{A·B}{||A||||B||}

    loss(x,y)=\begin{cases}1-cos(x_1,x_2) & if \quad y=1 \\ max(0,cos(x_1,x_2)-margin) & if \quad y=-1\end{cases}

    margin can take values in [-1,1]; values in [0,0.5] are recommended.

    x1 = torch.tensor([[0.3, 0.5, 0.7], [0.3, 0.5, 0.7]])
    x2 = torch.tensor([[0.1, 0.3, 0.5], [0.1, 0.3, 0.5]])
    target = torch.tensor([[1, -1]], dtype=torch.float)
    
    loss_f = nn.CosineEmbeddingLoss(margin=0., reduction='none')
    loss = loss_f(x1, x2, target)
    print("Cosine Embedding Loss", loss)
    
    #Cosine Embedding Loss tensor([[0.0167, 0.9833]])
    
    # hand computation
    margin = 0.
    def cosine(a, b):
        numerator = torch.dot(a, b)
        denominator = torch.norm(a, 2) * torch.norm(b, 2)
        return float(numerator / denominator)
    # torch.norm computes the vector norm (the 2-norm here, i.e. the vector's length)

    l_1 = 1 - cosine(x1[0], x2[0])               # the y = 1 pair

    l_2 = max(0, cosine(x1[1], x2[1]) - margin)  # the y = -1 pair

    print(l_1, l_2)   # 0.0167 0.9833
    
    18.nn.CTCLoss()

    Connectionist Temporal Classification loss, for classifying sequential (time-series) data where the alignment between input frames and target labels is unknown.
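
    A minimal usage sketch (the sizes below are illustrative, not from the original): the input is a (T, N, C) tensor of log-probabilities, targets hold the label sequences, and the two length tensors describe each sequence.

    import torch
    import torch.nn as nn

    T, N, C = 50, 4, 20          # time steps, batch size, number of classes (class 0 = blank)
    S = 10                       # max target length

    log_probs = torch.randn(T, N, C).log_softmax(dim=2)       # (T, N, C) log-probabilities
    targets = torch.randint(1, C, (N, S), dtype=torch.long)   # labels; 0 is reserved for blank
    input_lengths = torch.full((N,), T, dtype=torch.long)
    target_lengths = torch.randint(1, S + 1, (N,), dtype=torch.long)

    loss_f = nn.CTCLoss(blank=0, reduction='mean')
    loss = loss_f(log_probs, targets, input_lengths, target_lengths)
    print(loss)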
