LayerNorm in PyTorch

Author: Koap | Published 2024-02-18 16:10


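I recommend using torch.nn.LayerNorm rather than torch.layer_norm, since the module form is more flexible.
It can normalize a tensor over any number of its trailing dimensions, as given by normalized_shape.
The official example: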

        >>> # NLP Example
        >>> batch, sentence_length, embedding_dim = 20, 5, 10
        >>> embedding = torch.randn(batch, sentence_length, embedding_dim)
        >>> layer_norm = nn.LayerNorm(embedding_dim)
        >>> # Activate module
        >>> layer_norm(embedding)
        >>>
        >>> # Image Example
        >>> N, C, H, W = 20, 5, 10, 10
        >>> input = torch.randn(N, C, H, W)
        >>> # Normalize over the last three dimensions (i.e. the channel and spatial dimensions)
        >>> # as shown in the image below
        >>> layer_norm = nn.LayerNorm([C, H, W])
        >>> output = layer_norm(input)

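The official example shows that LN is used differently in NLP and in CV.
In the NLP example, LN normalizes only over the last dimension (the embedding dimension), so each token is normalized on its own; in that sense it resembles IN (instance normalization) (Figure 1, right).
In the CV example, LN normalizes over all elements of the C, H and W dimensions of each sample (Figure 1, middle).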
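To make the two cases concrete, here is a minimal sketch that recomputes the normalization by hand and compares it with nn.LayerNorm. The helper manual_layer_norm is hypothetical (it is not part of PyTorch), and the affine parameters are switched off so that only the normalization itself is compared.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def manual_layer_norm(x, num_trailing_dims, eps=1e-5):
    """Hypothetical helper: normalize x over its last `num_trailing_dims` dims."""
    dims = tuple(range(-num_trailing_dims, 0))
    mean = x.mean(dim=dims, keepdim=True)
    # Biased variance (divide by the element count), as LayerNorm does.
    var = ((x - mean) ** 2).mean(dim=dims, keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)


# NLP case: statistics are computed per token, over embedding_dim only.
embedding = torch.randn(20, 5, 10)
ln_nlp = nn.LayerNorm(10, elementwise_affine=False)
nlp_matches = torch.allclose(
    ln_nlp(embedding), manual_layer_norm(embedding, 1), atol=1e-5
)

# CV case: statistics are computed per sample, over C, H and W together.
images = torch.randn(20, 5, 10, 10)
ln_cv = nn.LayerNorm([5, 10, 10], elementwise_affine=False)
cv_matches = torch.allclose(
    ln_cv(images), manual_layer_norm(images, 3), atol=1e-5
)

print(nlp_matches, cv_matches)
```

In both cases normalized_shape simply selects how many trailing dimensions are pooled into one mean and one variance.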

Figure 1
In addition, torch.nn.LayerNorm defaults to elementwise_affine=True, meaning it has learnable scale and bias parameters, like BN.
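One difference from BN is worth a quick look: the sketch below inspects the parameter shapes. It assumes the same [C, H, W] = [5, 10, 10] shape as the image example; the variable names are only illustrative.

```python
import torch
import torch.nn as nn

ln = nn.LayerNorm([5, 10, 10])  # elementwise_affine=True by default
bn = nn.BatchNorm2d(5)          # affine=True by default

# LayerNorm keeps one scale and one bias per normalized element,
# while BatchNorm2d keeps one scale and one bias per channel.
ln_weight_shape = tuple(ln.weight.shape)
bn_weight_shape = tuple(bn.weight.shape)
print(ln_weight_shape, bn_weight_shape)

# With elementwise_affine=False the parameters are not created at all.
ln_plain = nn.LayerNorm([5, 10, 10], elementwise_affine=False)
print(ln_plain.weight is None, ln_plain.bias is None)
```

So the affine part is "like BN" in spirit, but LN's parameters have the shape of normalized_shape rather than one value per channel.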

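Hypothesis and experiment

Given the examples above, a natural guess is that with layer_norm = nn.LayerNorm([N, H, W]) the layer would turn into BN, which normalizes each channel over N, H and W.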

import torch

N, C, H, W = 20, 5, 10, 10
input = torch.randn(N, C, H, W)
# normalized_shape must match the trailing dimensions of the input, so this raises
layer_norm = torch.nn.LayerNorm([N, H, W])
output = layer_norm(input)

RuntimeError: Given normalized_shape=[20, 10, 10], expected input with shape [*, 20, 10, 10], but got input of size[20, 5, 10, 10]

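This raises an error: normalized_shape has to match the trailing dimensions of the input, so this setup is not allowed.
We can get the same effect, however, by swapping the N and C dimensions of the tensor: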

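import torch

N, C, H, W = 20, 5, 10, 10
input = torch.randn(N, C, H, W)
# LN: move C to the front so that N, H, W become the trailing dimensions
layer_norm = torch.nn.LayerNorm([N, H, W], elementwise_affine=False)
output = layer_norm(input.transpose(0, 1)).transpose(0, 1)
# BN (a freshly built module is in training mode, so it uses batch statistics)
bn = torch.nn.BatchNorm2d(C, affine=False)
output_bn = bn(input)
# Sum of the differences between the two results
torch.sum(output - output_bn)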

result:
tensor(8.2701e-07)

The difference is at the level of floating-point rounding error, so the hypothesis holds.
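A summed difference can hide errors that cancel each other out, so a slightly stricter check is the largest absolute difference. The sketch below repeats the experiment under that metric; it is an illustrative variant, not the original measurement. The input is shifted and scaled (an assumption made here so that the statistics are clearly not 0 and 1), and it also shows that the equivalence relies on BN being in training mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, C, H, W = 20, 5, 10, 10
# Shifted and scaled input, so that mean and variance are clearly not 0 and 1.
x = torch.randn(N, C, H, W) * 3 + 2

ln = nn.LayerNorm([N, H, W], elementwise_affine=False)
bn = nn.BatchNorm2d(C, affine=False)

# LN over (N, H, W) per channel, via the transpose trick.
out_ln = ln(x.transpose(0, 1)).transpose(0, 1)

# Training mode: BN normalizes with the statistics of the current batch.
bn.train()
max_diff_train = (out_ln - bn(x)).abs().max().item()

# Eval mode: BN normalizes with its running statistics instead.
bn.eval()
max_diff_eval = (out_ln - bn(x)).abs().max().item()

print(max_diff_train, max_diff_eval)
```

In training mode the two outputs agree up to rounding error; in eval mode BN switches to its running estimates, so the outputs no longer match.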