PyTorch Pocket Reference Notes, Part 13

Author: 深思海数_willschang | Published: 2021-08-29 14:50

PyTorch Pocket Reference

Chapter 6: PyTorch Acceleration and Optimization (Performance Improvement), Part 4

Model Optimization: Quantization

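Model quantization is a form of model compression. The goal of model compression is to shrink a model's memory footprint and speed up inference. (Apart from compression, some inference frameworks also speed up inference by optimizing memory use, I/O, and computation.)

Neural networks normally compute with 32-bit or 64-bit floating-point numbers, so lowering the numerical precision reduces the model size while still keeping accuracy at a level that is acceptable for the application.

Quantization uses lower-precision data not only for the computation itself but also for the data fetched from memory. It reduces model size, lowers memory bandwidth requirements, and speeds up inference.
One simple quantization approach is to halve the precision of every computation.
Example: quantizing LeNet5

A LeNet5 model normally stores its parameters as float32.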

import torch
from torch import nn
import torch.nn.functional as F

class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, int(x.nelement() / x.shape[0]))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = LeNet5()

for n, p in model.named_parameters():
    print(n, ':', p.dtype)

"""
# conv1.weight : torch.float32
# conv1.bias : torch.float32
# conv2.weight : torch.float32
# conv2.bias : torch.float32
# fc1.weight : torch.float32
# fc1.bias : torch.float32
# fc2.weight : torch.float32
# fc2.bias : torch.float32
# fc3.weight : torch.float32
# fc3.bias : torch.float32
"""

A single line of code is enough to halve the model size:

model = model.half()
for n, p in model.named_parameters():
    print(n, ':', p.dtype)


"""
# conv1.weight : torch.float16
# conv1.bias : torch.float16
# conv2.weight : torch.float16
# conv2.bias : torch.float16
# fc1.weight : torch.float16
# fc1.bias : torch.float16
# fc2.weight : torch.float16
# fc2.bias : torch.float16
# fc3.weight : torch.float16
# fc3.bias : torch.float16
"""
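To check the "half the size" claim numerically, the sketch below sums the bytes held by a model's parameters before and after calling half(). The helper name param_bytes and the small fully connected network are assumptions made for illustration; they are not part of the original example.

```python
import torch
from torch import nn


def param_bytes(module: nn.Module) -> int:
    """Total bytes occupied by the module's parameters (hypothetical helper)."""
    return sum(p.numel() * p.element_size() for p in module.parameters())


# Illustrative stand-in for the fully connected part of LeNet5.
net = nn.Sequential(
    nn.Linear(400, 120), nn.ReLU(),
    nn.Linear(120, 84), nn.ReLU(),
    nn.Linear(84, 10),
)

fp32_bytes = param_bytes(net)         # float32: 4 bytes per element
fp16_bytes = param_bytes(net.half())  # half() casts the parameters in place to float16

print(fp32_bytes, fp16_bytes)
```

Keep in mind that inputs must be cast to float16 as well before they are passed to the converted model, otherwise the dtypes of inputs and weights will not match.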

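PyTorch provides three quantization methods:

Dynamic Quantization

Dynamic quantization converts a floating-point model into a quantized model whose weights are statically quantized to int8 or float16 and whose activations are quantized dynamically. When the weights are quantized to int8, the activations are dynamically quantized to int8 for each batch. PyTorch exposes this through the torch.quantization.quantize_dynamic API, which replaces the specified modules with dynamically quantized, weight-only versions and returns the quantized model.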
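A minimal sketch of the call described above, assuming a small fully connected model (the layer sizes are arbitrary and chosen only for illustration). Only the nn.Linear modules are swapped for dynamically quantized versions; the input and output of the model remain ordinary float tensors.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Illustrative float model; the sizes are assumptions, not taken from the text.
float_model = nn.Sequential(
    nn.Linear(400, 120), nn.ReLU(),
    nn.Linear(120, 10),
).eval()

# Replace every nn.Linear with a dynamically quantized version (int8 weights).
# By default a converted copy is returned and float_model is left untouched.
quantized_model = torch.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(4, 400)
with torch.no_grad():
    y_float = float_model(x)
    y_quant = quantized_model(x)

print(quantized_model)
print("max abs difference:", (y_float - y_quant).abs().max().item())
```

The printed difference gives a rough feel for the error introduced by int8 weights; the exact number depends on the weights and on the quantized backend in use.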

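With torch.quantize_per_tensor(), you have to choose scale and zero_point yourself.
"Dynamic" means that torch.quantization.quantize_dynamic picks a suitable scale and zero_point automatically, based on the range of the data it observes at run time.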
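The difference between choosing the parameters by hand and deriving them from the data can be sketched with torch.quantize_per_tensor. The min/max rule in the second half is a common way to pick scale and zero_point and is shown here as an assumption for illustration; it is not a reproduction of what quantize_dynamic does internally.

```python
import torch

x = torch.tensor([-1.0, -0.5, 0.0, 0.5, 1.0])

# 1) Manual choice: q = round(x / scale) + zero_point, stored as uint8.
q_manual = torch.quantize_per_tensor(x, scale=0.1, zero_point=10, dtype=torch.quint8)
print(q_manual.int_repr())    # stored integers: 0, 5, 10, 15, 20
print(q_manual.dequantize())  # (q - zero_point) * scale, close to x

# 2) Derive scale and zero_point from the observed value range (min/max rule).
qmin, qmax = 0, 255
x_min, x_max = x.min().item(), x.max().item()
scale = (x_max - x_min) / (qmax - qmin)
zero_point = int(round(qmin - x_min / scale))
zero_point = max(qmin, min(qmax, zero_point))  # keep it inside the uint8 range

q_auto = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8)
max_error = (q_auto.dequantize() - x).abs().max().item()
print(scale, zero_point, max_error)
```

With the range-derived parameters, the full 0 to 255 integer range is spread over the observed values, so the reconstruction error stays within about one quantization step.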

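Post-Training Static Quantization

Post-training quantization is the easiest variant to understand: after training, the model's weights are quantized from float32 to int8 and stored as int8. In the simplest weight-only form, the weights still have to be dequantized back to float for computation at inference time. PyTorch's post-training static quantization goes further: it also fixes the scale and zero_point of the activations ahead of time by calibrating on representative data, so the quantized layers can compute in int8. This approach tends to work well on large models, which are robust to quantization noise, and noticeably worse on small models.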
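The post-training static workflow can be sketched in eager mode as prepare, calibrate, convert. The toy model SmallNet and the random calibration batches are assumptions made for illustration; in practice calibration would use representative real inputs. The qconfig is taken from whichever quantized backend the local PyTorch build reports.

```python
import torch
from torch import nn


class SmallNet(nn.Module):
    """Hypothetical toy model; the stubs mark where tensors enter and leave int8."""

    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> quantized
        self.fc1 = nn.Linear(16, 32)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(32, 4)
        self.dequant = torch.quantization.DeQuantStub()  # quantized -> float

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return self.dequant(x)


torch.manual_seed(0)
float_model = SmallNet().eval()

# Use the default qconfig of the backend this PyTorch build runs on.
backend = torch.backends.quantized.engine
float_model.qconfig = torch.quantization.get_default_qconfig(backend)

# Step 1: insert observers that record activation ranges (returns a copy).
prepared = torch.quantization.prepare(float_model)

# Step 2: calibrate. Random data is only a stand-in for real samples.
with torch.no_grad():
    for _ in range(8):
        prepared(torch.randn(32, 16))

# Step 3: swap observed modules for int8 implementations.
quantized = torch.quantization.convert(prepared)

out = quantized(torch.randn(2, 16))
print(quantized)
print(out.dtype, out.shape)
```

After conversion the linear layers hold int8 weights together with fixed activation scales, while the model still accepts and returns float tensors because of the two stubs.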

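Quantization-Aware Training

Quantization-aware training inserts fake-quantization operations during training: the forward pass uses quantized weights and activations, while the backward pass still applies gradient updates to the float weights. At inference time, all computation is carried out in int8.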
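The fake-quantization idea can be sketched with torch.fake_quantize_per_tensor_affine: the forward pass snaps values onto the int8 grid, while the gradient still flows back to the float tensor. The weight values, scale, and loss below are arbitrary assumptions chosen to make the rounding easy to follow; a full training setup would wrap this in a model and an optimizer.

```python
import torch

# Float "weights" that keep receiving gradient updates during training.
w = torch.tensor([0.26, -0.51, 1.04], requires_grad=True)

scale, zero_point = 0.1, 0      # illustrative quantization parameters
quant_min, quant_max = -128, 127  # signed int8 range

# Forward: round to the quantization grid, but keep the result as float.
w_fq = torch.fake_quantize_per_tensor_affine(w, scale, zero_point, quant_min, quant_max)
print(w_fq)  # values snapped to multiples of 0.1: 0.3, -0.5, 1.0

# Backward: the rounding is treated as identity inside the int8 range,
# so the gradient of the loss reaches the float weights.
loss = (w_fq ** 2).sum()
loss.backward()
print(w.grad)  # equals 2 * w_fq for values inside the representable range
```

Because the loss is computed on the rounded values, training sees the quantization error and can adapt the float weights to it.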

Article link: https://www.haomeiwen.com/subject/ejgwiltx.html
