TORCH03-09 The GoogLeNet Network

Author: 杨强AT南京 | Published: 2020-05-09 08:23

      An interesting aspect of GoogLeNet is its auxiliary classifiers; another is convolution factorization, which VGG had in effect already introduced. This post works through GoogLeNet's classification techniques and, along the way, implements training with multiple auxiliary classifiers. The results on the ImageNet 2012 dataset are reasonable (I trained on only 4 classes of unshuffled images).
      To keep up with the latest techniques, books are no longer enough; you have to read the most recent papers.


    About the GoogLeNet network

    • In 2014, GoogLeNet and VGG were the two stars of that year's ImageNet challenge (ILSVRC14): GoogLeNet took first place and VGG second. What the two architectures have in common is greater depth.

      • VGG inherits much of its framework from LeNet and AlexNet, whereas GoogLeNet experiments with a bolder network structure
      • When memory or compute resources are limited, GoogLeNet is the better choice; judging by results, its performance is also superior
        • Although GoogLeNet is 22 layers deep, it is much smaller than AlexNet and VGG;
        • GoogLeNet has about 5 million parameters; AlexNet has roughly 12 times as many, and VGGNet in turn about 3 times as many as AlexNet.
    • GoogLeNet is a deep network developed by Google. It is spelled "GoogLeNet" rather than "GoogleNet", reportedly as a tribute to "LeNet"

    • References:

      • https://my.oschina.net/u/876354/blog/1637819

    The design ideas behind GoogLeNet

    • The core of GoogLeNet is the Inception concept and design, refined step by step over 4 versions:
      1. Inception V1
      2. Inception V2
      3. Inception V3
      4. Inception V4

    The core idea of Inception

    • The core idea of Inception is stacking
      • design a sparse network structure that can nevertheless produce dense data;
      • improve the network's performance while keeping compute usage efficient.
    Schematic of the Inception design idea
    • An Inception block stacks 3 convolutions + 1 pooling operation

      1. Convolutions: (1 \times 1, 3 \times 3, 5 \times 5)
      2. Pooling: (3 \times 3) max pooling;
      3. Each convolution is followed by a ReLU activation;
      4. The branch outputs are concatenated, which increases the output channel count (depth)
    • Analysis of the Inception design

      • It widens the network:
        • the sparse structure formed by the 3 convolutions extracts every detail of the input;
      • It reduces overfitting:
        • the pooling branch reduces spatial size and curbs overfitting (somewhat like Dropout, except it is not random)
      • It makes the network more scale-adaptive:
        • three convolutions of different sizes let the network adapt to multiple scales;
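    The channel stacking described above can be shown with a minimal sketch (the tensors here are random stand-ins for branch outputs, not a real Inception block): four branch outputs with the same spatial size are concatenated along the channel dimension, so the output depth is the sum of the branch depths.

    ```python
    import torch

    # Stand-ins for the four branch outputs of an Inception block:
    # same batch size and spatial size (28x28), different channel counts.
    b1 = torch.randn(1, 64, 28, 28)   # 1x1 conv branch
    b2 = torch.randn(1, 128, 28, 28)  # 3x3 conv branch
    b3 = torch.randn(1, 32, 28, 28)   # 5x5 conv branch
    b4 = torch.randn(1, 32, 28, 28)   # pooling branch

    # Concatenating along dim=1 (channels) stacks the depths: 64+128+32+32 = 256
    y = torch.cat([b1, b2, b3, b4], dim=1)
    print(y.shape)  # torch.Size([1, 256, 28, 28])
    ```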

    Inception V1

    The Inception V1 design

    • V1 optimizes the computation of the basic Inception block above by adding a 1 \times 1 convolution.
    Inception V1
    • Understanding the stacking concept
    Schematic of stacking in GoogLeNet
    • Analysis of the 1 \times 1 convolution

      1. It provides an extra ReLU, adding non-linearity.
      2. Because outputs are stacked layer after layer, the depth (channel count) keeps growing; a 1 \times 1 convolution can first reduce the channel count, which lowers the cost of the subsequent 3 \times 3 and 5 \times 5 convolutions.
    • Note:

      • Why does adding a 1 \times 1 convolution reduce the amount of computation?
        1. Assume the input is 100 \times 100 \times 128 and the convolution kernel is 5 \times 5 \times 256, stride 1.
        2. Without the 1 \times 1 convolution:
          • weight parameters: 128 \times 5 \times 5 \times 256 = 819200
        3. With a 1 \times 1 convolution (assume 32 channels):
          • weight parameters: 128 \times 1 \times 1 \times 32 + 32 \times 5 \times 5 \times 256 = 204800
        4. Conclusion:
          • with the 1 \times 1 convolution, the cost is \dfrac{1}{4} of the original
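    The arithmetic above can be checked directly (pure weight counting, biases ignored):

    ```python
    # Weight count of a conv layer: in_channels * kH * kW * out_channels (no bias)
    def conv_weights(in_ch, k, out_ch):
        return in_ch * k * k * out_ch

    # Direct 5x5 convolution: 128 -> 256 channels
    direct = conv_weights(128, 5, 256)                                # 819200

    # 1x1 bottleneck down to 32 channels, then 5x5 up to 256 channels
    bottleneck = conv_weights(128, 1, 32) + conv_weights(32, 5, 256)  # 204800

    print(direct, bottleneck, bottleneck / direct)  # 819200 204800 0.25
    ```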

    The 22-layer GoogLeNet structure

    The 22-layer GoogLeNet structure

    Torch's implementation of GoogLeNet

    • Torch provides a GoogLeNet implementation
      • Judging from the source code and the printed structure, the official implementation is Inception V1 with BatchNorm2d added; call it an enhanced V1
    from torchvision.models import GoogLeNet
    GoogLeNet?
    
    Init signature:
    GoogLeNet(
        num_classes=1000,
        aux_logits=True,
        transform_input=False,
        init_weights=True,
        blocks=None,
    )
    Docstring:
    Base class for all neural network modules.
    
    Your models should also subclass this class.
    
    Modules can also contain other Modules, allowing to nest them in
    a tree structure. You can assign the submodules as regular attributes::
    
        import torch.nn as nn
        import torch.nn.functional as F
    
        class Model(nn.Module):
            def __init__(self):
                super(Model, self).__init__()
                self.conv1 = nn.Conv2d(1, 20, 5)
                self.conv2 = nn.Conv2d(20, 20, 5)
    
            def forward(self, x):
                x = F.relu(self.conv1(x))
                return F.relu(self.conv2(x))
    
    Submodules assigned in this way will be registered, and will have their
    parameters converted too when you call :meth:`to`, etc.
    Init docstring: Initializes internal Module state, shared by both nn.Module and ScriptModule.
    File:           c:\program files\python36\lib\site-packages\torchvision\models\googlenet.py
    Type:           type
    Subclasses:     QuantizableGoogLeNet
    
    
    • aux_logits controls the two intermediate (auxiliary) classifier outputs
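    A quick check of that parameter (the class count of 10 is arbitrary): in training mode, with aux_logits=True, the forward pass returns the main logits together with the two auxiliary logits; in eval mode only the main output comes back.

    ```python
    import torch
    from torchvision.models import GoogLeNet

    net = GoogLeNet(num_classes=10, aux_logits=True)
    net.train()  # auxiliary outputs are only produced in training mode

    x = torch.randn(2, 3, 224, 224)
    out = net(x)

    # out is a (named) tuple: main logits plus the two auxiliary logits
    main = out[0] if isinstance(out, tuple) else out
    print(main.shape)  # torch.Size([2, 10])
    ```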

    • The input to GoogLeNet is still

      • 224 \times 224 \times 3
    from torchvision.models import GoogLeNet
    from torchsummary import summary
    net = GoogLeNet()
    # print(net)
    # print("=========================================================")
    # print the network structure
    print(summary(net, input_size=(3, 224, 224), device='cpu'))
    # print("=========================================================")
    # # print the network structure (GPU)
    # print(summary(net.cuda(), input_size=(3, 224, 224)))
    
    ----------------------------------------------------------------
            Layer (type)               Output Shape         Param #
    ================================================================
                Conv2d-1         [-1, 64, 112, 112]           9,408
           BatchNorm2d-2         [-1, 64, 112, 112]             128
           BasicConv2d-3         [-1, 64, 112, 112]               0
             MaxPool2d-4           [-1, 64, 56, 56]               0
                Conv2d-5           [-1, 64, 56, 56]           4,096
           BatchNorm2d-6           [-1, 64, 56, 56]             128
           BasicConv2d-7           [-1, 64, 56, 56]               0
                Conv2d-8          [-1, 192, 56, 56]         110,592
           BatchNorm2d-9          [-1, 192, 56, 56]             384
          BasicConv2d-10          [-1, 192, 56, 56]               0
            MaxPool2d-11          [-1, 192, 28, 28]               0
               Conv2d-12           [-1, 64, 28, 28]          12,288
          BatchNorm2d-13           [-1, 64, 28, 28]             128
          BasicConv2d-14           [-1, 64, 28, 28]               0
               Conv2d-15           [-1, 96, 28, 28]          18,432
          BatchNorm2d-16           [-1, 96, 28, 28]             192
          BasicConv2d-17           [-1, 96, 28, 28]               0
               Conv2d-18          [-1, 128, 28, 28]         110,592
          BatchNorm2d-19          [-1, 128, 28, 28]             256
          BasicConv2d-20          [-1, 128, 28, 28]               0
               Conv2d-21           [-1, 16, 28, 28]           3,072
          BatchNorm2d-22           [-1, 16, 28, 28]              32
          BasicConv2d-23           [-1, 16, 28, 28]               0
               Conv2d-24           [-1, 32, 28, 28]           4,608
          BatchNorm2d-25           [-1, 32, 28, 28]              64
          BasicConv2d-26           [-1, 32, 28, 28]               0
            MaxPool2d-27          [-1, 192, 28, 28]               0
               Conv2d-28           [-1, 32, 28, 28]           6,144
          BatchNorm2d-29           [-1, 32, 28, 28]              64
          BasicConv2d-30           [-1, 32, 28, 28]               0
            Inception-31          [-1, 256, 28, 28]               0
               Conv2d-32          [-1, 128, 28, 28]          32,768
          BatchNorm2d-33          [-1, 128, 28, 28]             256
          BasicConv2d-34          [-1, 128, 28, 28]               0
               Conv2d-35          [-1, 128, 28, 28]          32,768
          BatchNorm2d-36          [-1, 128, 28, 28]             256
          BasicConv2d-37          [-1, 128, 28, 28]               0
               Conv2d-38          [-1, 192, 28, 28]         221,184
          BatchNorm2d-39          [-1, 192, 28, 28]             384
          BasicConv2d-40          [-1, 192, 28, 28]               0
               Conv2d-41           [-1, 32, 28, 28]           8,192
          BatchNorm2d-42           [-1, 32, 28, 28]              64
          BasicConv2d-43           [-1, 32, 28, 28]               0
               Conv2d-44           [-1, 96, 28, 28]          27,648
          BatchNorm2d-45           [-1, 96, 28, 28]             192
          BasicConv2d-46           [-1, 96, 28, 28]               0
            MaxPool2d-47          [-1, 256, 28, 28]               0
               Conv2d-48           [-1, 64, 28, 28]          16,384
          BatchNorm2d-49           [-1, 64, 28, 28]             128
          BasicConv2d-50           [-1, 64, 28, 28]               0
            Inception-51          [-1, 480, 28, 28]               0
            MaxPool2d-52          [-1, 480, 14, 14]               0
               Conv2d-53          [-1, 192, 14, 14]          92,160
          BatchNorm2d-54          [-1, 192, 14, 14]             384
          BasicConv2d-55          [-1, 192, 14, 14]               0
               Conv2d-56           [-1, 96, 14, 14]          46,080
          BatchNorm2d-57           [-1, 96, 14, 14]             192
          BasicConv2d-58           [-1, 96, 14, 14]               0
               Conv2d-59          [-1, 208, 14, 14]         179,712
          BatchNorm2d-60          [-1, 208, 14, 14]             416
          BasicConv2d-61          [-1, 208, 14, 14]               0
               Conv2d-62           [-1, 16, 14, 14]           7,680
          BatchNorm2d-63           [-1, 16, 14, 14]              32
          BasicConv2d-64           [-1, 16, 14, 14]               0
               Conv2d-65           [-1, 48, 14, 14]           6,912
          BatchNorm2d-66           [-1, 48, 14, 14]              96
          BasicConv2d-67           [-1, 48, 14, 14]               0
            MaxPool2d-68          [-1, 480, 14, 14]               0
               Conv2d-69           [-1, 64, 14, 14]          30,720
          BatchNorm2d-70           [-1, 64, 14, 14]             128
          BasicConv2d-71           [-1, 64, 14, 14]               0
            Inception-72          [-1, 512, 14, 14]               0
               Conv2d-73            [-1, 128, 4, 4]          65,536
          BatchNorm2d-74            [-1, 128, 4, 4]             256
          BasicConv2d-75            [-1, 128, 4, 4]               0
               Linear-76                 [-1, 1024]       2,098,176
               Linear-77                 [-1, 1000]       1,025,000
         InceptionAux-78                 [-1, 1000]               0
               Conv2d-79          [-1, 160, 14, 14]          81,920
          BatchNorm2d-80          [-1, 160, 14, 14]             320
          BasicConv2d-81          [-1, 160, 14, 14]               0
               Conv2d-82          [-1, 112, 14, 14]          57,344
          BatchNorm2d-83          [-1, 112, 14, 14]             224
          BasicConv2d-84          [-1, 112, 14, 14]               0
               Conv2d-85          [-1, 224, 14, 14]         225,792
          BatchNorm2d-86          [-1, 224, 14, 14]             448
          BasicConv2d-87          [-1, 224, 14, 14]               0
               Conv2d-88           [-1, 24, 14, 14]          12,288
          BatchNorm2d-89           [-1, 24, 14, 14]              48
          BasicConv2d-90           [-1, 24, 14, 14]               0
               Conv2d-91           [-1, 64, 14, 14]          13,824
          BatchNorm2d-92           [-1, 64, 14, 14]             128
          BasicConv2d-93           [-1, 64, 14, 14]               0
            MaxPool2d-94          [-1, 512, 14, 14]               0
               Conv2d-95           [-1, 64, 14, 14]          32,768
          BatchNorm2d-96           [-1, 64, 14, 14]             128
          BasicConv2d-97           [-1, 64, 14, 14]               0
            Inception-98          [-1, 512, 14, 14]               0
               Conv2d-99          [-1, 128, 14, 14]          65,536
         BatchNorm2d-100          [-1, 128, 14, 14]             256
         BasicConv2d-101          [-1, 128, 14, 14]               0
              Conv2d-102          [-1, 128, 14, 14]          65,536
         BatchNorm2d-103          [-1, 128, 14, 14]             256
         BasicConv2d-104          [-1, 128, 14, 14]               0
              Conv2d-105          [-1, 256, 14, 14]         294,912
         BatchNorm2d-106          [-1, 256, 14, 14]             512
         BasicConv2d-107          [-1, 256, 14, 14]               0
              Conv2d-108           [-1, 24, 14, 14]          12,288
         BatchNorm2d-109           [-1, 24, 14, 14]              48
         BasicConv2d-110           [-1, 24, 14, 14]               0
              Conv2d-111           [-1, 64, 14, 14]          13,824
         BatchNorm2d-112           [-1, 64, 14, 14]             128
         BasicConv2d-113           [-1, 64, 14, 14]               0
           MaxPool2d-114          [-1, 512, 14, 14]               0
              Conv2d-115           [-1, 64, 14, 14]          32,768
         BatchNorm2d-116           [-1, 64, 14, 14]             128
         BasicConv2d-117           [-1, 64, 14, 14]               0
           Inception-118          [-1, 512, 14, 14]               0
              Conv2d-119          [-1, 112, 14, 14]          57,344
         BatchNorm2d-120          [-1, 112, 14, 14]             224
         BasicConv2d-121          [-1, 112, 14, 14]               0
              Conv2d-122          [-1, 144, 14, 14]          73,728
         BatchNorm2d-123          [-1, 144, 14, 14]             288
         BasicConv2d-124          [-1, 144, 14, 14]               0
              Conv2d-125          [-1, 288, 14, 14]         373,248
         BatchNorm2d-126          [-1, 288, 14, 14]             576
         BasicConv2d-127          [-1, 288, 14, 14]               0
              Conv2d-128           [-1, 32, 14, 14]          16,384
         BatchNorm2d-129           [-1, 32, 14, 14]              64
         BasicConv2d-130           [-1, 32, 14, 14]               0
              Conv2d-131           [-1, 64, 14, 14]          18,432
         BatchNorm2d-132           [-1, 64, 14, 14]             128
         BasicConv2d-133           [-1, 64, 14, 14]               0
           MaxPool2d-134          [-1, 512, 14, 14]               0
              Conv2d-135           [-1, 64, 14, 14]          32,768
         BatchNorm2d-136           [-1, 64, 14, 14]             128
         BasicConv2d-137           [-1, 64, 14, 14]               0
           Inception-138          [-1, 528, 14, 14]               0
              Conv2d-139            [-1, 128, 4, 4]          67,584
         BatchNorm2d-140            [-1, 128, 4, 4]             256
         BasicConv2d-141            [-1, 128, 4, 4]               0
              Linear-142                 [-1, 1024]       2,098,176
              Linear-143                 [-1, 1000]       1,025,000
        InceptionAux-144                 [-1, 1000]               0
              Conv2d-145          [-1, 256, 14, 14]         135,168
         BatchNorm2d-146          [-1, 256, 14, 14]             512
         BasicConv2d-147          [-1, 256, 14, 14]               0
              Conv2d-148          [-1, 160, 14, 14]          84,480
         BatchNorm2d-149          [-1, 160, 14, 14]             320
         BasicConv2d-150          [-1, 160, 14, 14]               0
              Conv2d-151          [-1, 320, 14, 14]         460,800
         BatchNorm2d-152          [-1, 320, 14, 14]             640
         BasicConv2d-153          [-1, 320, 14, 14]               0
              Conv2d-154           [-1, 32, 14, 14]          16,896
         BatchNorm2d-155           [-1, 32, 14, 14]              64
         BasicConv2d-156           [-1, 32, 14, 14]               0
              Conv2d-157          [-1, 128, 14, 14]          36,864
         BatchNorm2d-158          [-1, 128, 14, 14]             256
         BasicConv2d-159          [-1, 128, 14, 14]               0
           MaxPool2d-160          [-1, 528, 14, 14]               0
              Conv2d-161          [-1, 128, 14, 14]          67,584
         BatchNorm2d-162          [-1, 128, 14, 14]             256
         BasicConv2d-163          [-1, 128, 14, 14]               0
           Inception-164          [-1, 832, 14, 14]               0
           MaxPool2d-165            [-1, 832, 7, 7]               0
              Conv2d-166            [-1, 256, 7, 7]         212,992
         BatchNorm2d-167            [-1, 256, 7, 7]             512
         BasicConv2d-168            [-1, 256, 7, 7]               0
              Conv2d-169            [-1, 160, 7, 7]         133,120
         BatchNorm2d-170            [-1, 160, 7, 7]             320
         BasicConv2d-171            [-1, 160, 7, 7]               0
              Conv2d-172            [-1, 320, 7, 7]         460,800
         BatchNorm2d-173            [-1, 320, 7, 7]             640
         BasicConv2d-174            [-1, 320, 7, 7]               0
              Conv2d-175             [-1, 32, 7, 7]          26,624
         BatchNorm2d-176             [-1, 32, 7, 7]              64
         BasicConv2d-177             [-1, 32, 7, 7]               0
              Conv2d-178            [-1, 128, 7, 7]          36,864
         BatchNorm2d-179            [-1, 128, 7, 7]             256
         BasicConv2d-180            [-1, 128, 7, 7]               0
           MaxPool2d-181            [-1, 832, 7, 7]               0
              Conv2d-182            [-1, 128, 7, 7]         106,496
         BatchNorm2d-183            [-1, 128, 7, 7]             256
         BasicConv2d-184            [-1, 128, 7, 7]               0
           Inception-185            [-1, 832, 7, 7]               0
              Conv2d-186            [-1, 384, 7, 7]         319,488
         BatchNorm2d-187            [-1, 384, 7, 7]             768
         BasicConv2d-188            [-1, 384, 7, 7]               0
              Conv2d-189            [-1, 192, 7, 7]         159,744
         BatchNorm2d-190            [-1, 192, 7, 7]             384
         BasicConv2d-191            [-1, 192, 7, 7]               0
              Conv2d-192            [-1, 384, 7, 7]         663,552
         BatchNorm2d-193            [-1, 384, 7, 7]             768
         BasicConv2d-194            [-1, 384, 7, 7]               0
              Conv2d-195             [-1, 48, 7, 7]          39,936
         BatchNorm2d-196             [-1, 48, 7, 7]              96
         BasicConv2d-197             [-1, 48, 7, 7]               0
              Conv2d-198            [-1, 128, 7, 7]          55,296
         BatchNorm2d-199            [-1, 128, 7, 7]             256
         BasicConv2d-200            [-1, 128, 7, 7]               0
           MaxPool2d-201            [-1, 832, 7, 7]               0
              Conv2d-202            [-1, 128, 7, 7]         106,496
         BatchNorm2d-203            [-1, 128, 7, 7]             256
         BasicConv2d-204            [-1, 128, 7, 7]               0
           Inception-205           [-1, 1024, 7, 7]               0
    AdaptiveAvgPool2d-206           [-1, 1024, 1, 1]               0
             Dropout-207                 [-1, 1024]               0
              Linear-208                 [-1, 1000]       1,025,000
    ================================================================
    Total params: 13,004,888
    Trainable params: 13,004,888
    Non-trainable params: 0
    ----------------------------------------------------------------
    Input size (MB): 0.57
    Forward/backward pass size (MB): 94.25
    Params size (MB): 49.61
    Estimated Total Size (MB): 144.43
    ----------------------------------------------------------------
    None
    

    Implementing the GoogLeNet structure by hand

    Layer-by-layer parameters of GoogLeNet

    • ReLU and BatchNorm2d are omitted below; neither operation changes the size or layout of the data.
    1. Input image

      • \color{red}{3 \times 224 \times 224}
      • the image should be centered (mean subtracted so it averages to 0)
      • layout: NCHW
    2. Conv layer 1

      • input: \color{red}{3 \times 224 \times 224}
      • convolution: kernel=(7 \times 7), stride = 2, channels = 64, padding=3
      • output: \color{red}{64 \times 112 \times 112}
      • max pooling: kernel=(3 \times 3), stride = 2
      • output: \color{red}{64 \times 56 \times 56}
    3. Conv layer 2

      • input: \color{red}{64 \times 56 \times 56}
      • convolution: kernel=(1 \times 1), stride = 1, channels = 64, padding=0
      • output: \color{red}{64 \times 56 \times 56}
      • convolution: kernel=(3 \times 3), stride = 1, channels = 192, padding=1
      • output: \color{red}{192 \times 56 \times 56}
      • max pooling: kernel=(3 \times 3), stride = 2
      • output: \color{red}{192 \times 28 \times 28}
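    The spatial sizes above follow the usual convolution arithmetic, out = floor((in + 2*padding - kernel) / stride) + 1, with ceil rounding for the pooling layers (torchvision's GoogLeNet pools with ceil_mode=True). A small sketch to verify the stem:

    ```python
    import math

    # Output spatial size of a conv/pool layer; ceil=True mimics ceil_mode pooling.
    def conv_out(size, kernel, stride, padding=0, ceil=False):
        frac = (size + 2 * padding - kernel) / stride
        return (math.ceil(frac) if ceil else math.floor(frac)) + 1

    s = 224
    s = conv_out(s, 7, 2, padding=3)   # conv 7x7/2, pad 3:        224 -> 112
    s = conv_out(s, 3, 2, ceil=True)   # maxpool 3x3/2 (ceil):     112 -> 56
    s = conv_out(s, 1, 1)              # conv 1x1/1:                56 -> 56
    s = conv_out(s, 3, 1, padding=1)   # conv 3x3/1, pad 1:         56 -> 56
    s = conv_out(s, 3, 2, ceil=True)   # maxpool 3x3/2 (ceil):      56 -> 28
    print(s)  # 28
    ```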
    4. Inception(3a)

      • input: \color{red}{192 \times 28 \times 28}
      • Branch 1
        • convolution: kernel=(1 \times 1), stride = 1, channels = 64, padding=0
        • output: \color{red}{64 \times 28 \times 28}
      • Branch 2
        • convolution: kernel=(1 \times 1), stride = 1, channels = 96, padding=0
        • convolution: kernel=(3 \times 3), stride = 1, channels = 128, padding=1
        • output: \color{red}{128 \times 28 \times 28}
      • Branch 3
        • convolution: kernel=(1 \times 1), stride = 1, channels = 16, padding=0
        • convolution: kernel=(3 \times 3), stride = 1, channels = 32, padding=1
        • output: \color{red}{32 \times 28 \times 28}
      • Branch 4
        • pooling: kernel=(3 \times 3), stride = 1, padding=1
        • convolution: kernel=(1 \times 1), stride = 1, channels = 32, padding=0
        • output: \color{red}{32 \times 28 \times 28}
      • output: \color{red}{256 \times 28 \times 28}
        • 256 = 64 + 128 + 32 + 32
    5. Inception(3b)

      • input: \color{red}{256 \times 28 \times 28}
      • Branch 1
        • convolution: kernel=(1 \times 1), stride = 1, channels = 128, padding=0
        • output: \color{red}{128 \times 28 \times 28}
      • Branch 2
        • convolution: kernel=(1 \times 1), stride = 1, channels = 128, padding=0
        • convolution: kernel=(3 \times 3), stride = 1, channels = 192, padding=1
        • output: \color{red}{192 \times 28 \times 28}
      • Branch 3
        • convolution: kernel=(1 \times 1), stride = 1, channels = 32, padding=0
        • convolution: kernel=(3 \times 3), stride = 1, channels = 96, padding=1
        • output: \color{red}{96 \times 28 \times 28}
      • Branch 4
        • pooling: kernel=(3 \times 3), stride = 1, padding=1
        • convolution: kernel=(1 \times 1), stride = 1, channels = 64, padding=0
        • output: \color{red}{64 \times 28 \times 28}
      • output: \color{red}{480 \times 28 \times 28}
        • 480 = 128 + 192 + 96 + 64
      • max pooling: kernel=(3 \times 3), stride = 2, padding=0
      • output: \color{red}{480 \times 14 \times 14}
    6. Inception(4a)

      • input: \color{red}{480 \times 14 \times 14}
      • Branch 1
        • convolution: kernel=(1 \times 1), stride = 1, channels = 192, padding=0
        • output: \color{red}{192 \times 14 \times 14}
      • Branch 2
        • convolution: kernel=(1 \times 1), stride = 1, channels = 96, padding=0
        • convolution: kernel=(3 \times 3), stride = 1, channels = 208, padding=1
        • output: \color{red}{208 \times 14 \times 14}
      • Branch 3
        • convolution: kernel=(1 \times 1), stride = 1, channels = 16, padding=0
        • convolution: kernel=(3 \times 3), stride = 1, channels = 48, padding=1
        • output: \color{red}{48 \times 14 \times 14}
      • Branch 4
        • pooling: kernel=(3 \times 3), stride = 1, padding=1
        • convolution: kernel=(1 \times 1), stride = 1, channels = 64, padding=0
        • output: \color{red}{64 \times 14 \times 14}
      • output: \color{red}{512 \times 14 \times 14}
        • 512 = 192 + 208 + 48 + 64
    7. Auxiliary classifier 1

      • input: \color{red}{512 \times 14 \times 14}
      • AdaptiveAvgPool2d: output_size=(4 \times 4)
      • output: \color{red}{512 \times 4 \times 4}
      • convolution: kernel=(1 \times 1), stride = 1, channels = 128, padding=0
      • output: \color{red}{128 \times 4 \times 4 = 2048}
      • fully connected: 2048 \to 1024
      • fully connected: 1024 \to n, where n is the number of classes
      • output: \color{red}{n}
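    The auxiliary classifier just described can be sketched as a module. This is a sketch following the listing above, not torchvision's class: the name YQInceptionAux is my own, and the Dropout(0.7) rate is taken from the original GoogLeNet paper.

    ```python
    import torch
    from torch.nn import Module, AdaptiveAvgPool2d, Conv2d, Linear, Dropout
    import torch.nn.functional as F

    class YQInceptionAux(Module):
        # in_channels: 512 for aux classifier 1 (after 4a), 528 for aux classifier 2 (after 4d)
        def __init__(self, in_channels, num_classes):
            super(YQInceptionAux, self).__init__()
            self.pool = AdaptiveAvgPool2d((4, 4))                # -> in_channels x 4 x 4
            self.conv = Conv2d(in_channels, 128, kernel_size=1)  # -> 128 x 4 x 4
            self.fc1 = Linear(128 * 4 * 4, 1024)                 # 2048 -> 1024
            self.dropout = Dropout(0.7)                          # rate from the paper
            self.fc2 = Linear(1024, num_classes)                 # 1024 -> n

        def forward(self, x):
            y_ = F.relu(self.conv(self.pool(x)))
            y_ = torch.flatten(y_, 1)                            # -> N x 2048
            y_ = self.dropout(F.relu(self.fc1(y_)))
            return self.fc2(y_)

    # shape check with the aux-classifier-1 input described above (n = 4 classes)
    aux = YQInceptionAux(512, 4)
    print(aux(torch.randn(2, 512, 14, 14)).shape)  # torch.Size([2, 4])
    ```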
    8. Inception(4b)

      • input: \color{red}{512 \times 14 \times 14}
      • Branch 1
        • convolution: kernel=(1 \times 1), stride = 1, channels = 160, padding=0
        • output: \color{red}{160 \times 14 \times 14}
      • Branch 2
        • convolution: kernel=(1 \times 1), stride = 1, channels = 112, padding=0
        • convolution: kernel=(3 \times 3), stride = 1, channels = 224, padding=1
        • output: \color{red}{224 \times 14 \times 14}
      • Branch 3
        • convolution: kernel=(1 \times 1), stride = 1, channels = 24, padding=0
        • convolution: kernel=(3 \times 3), stride = 1, channels = 64, padding=1
        • output: \color{red}{64 \times 14 \times 14}
      • Branch 4
        • pooling: kernel=(3 \times 3), stride = 1, padding=1
        • convolution: kernel=(1 \times 1), stride = 1, channels = 64, padding=0
        • output: \color{red}{64 \times 14 \times 14}
      • output: \color{red}{512 \times 14 \times 14}
        • 512 = 160 + 224 + 64 + 64
    9. Inception(4c)

      • input: \color{red}{512 \times 14 \times 14}
      • Branch 1
        • convolution: kernel=(1 \times 1), stride = 1, channels = 128, padding=0
        • output: \color{red}{128 \times 14 \times 14}
      • Branch 2
        • convolution: kernel=(1 \times 1), stride = 1, channels = 128, padding=0
        • convolution: kernel=(3 \times 3), stride = 1, channels = 256, padding=1
        • output: \color{red}{256 \times 14 \times 14}
      • Branch 3
        • convolution: kernel=(1 \times 1), stride = 1, channels = 24, padding=0
        • convolution: kernel=(3 \times 3), stride = 1, channels = 64, padding=1
        • output: \color{red}{64 \times 14 \times 14}
      • Branch 4
        • pooling: kernel=(3 \times 3), stride = 1, padding=1
        • convolution: kernel=(1 \times 1), stride = 1, channels = 64, padding=0
        • output: \color{red}{64 \times 14 \times 14}
      • output: \color{red}{512 \times 14 \times 14}
        • 512 = 128 + 256 + 64 + 64
    10. Inception(4d)

      • input: \color{red}{512 \times 14 \times 14}
      • Branch 1
        • convolution: kernel=(1 \times 1), stride = 1, channels = 112, padding=0
        • output: \color{red}{112 \times 14 \times 14}
      • Branch 2
        • convolution: kernel=(1 \times 1), stride = 1, channels = 144, padding=0
        • convolution: kernel=(3 \times 3), stride = 1, channels = 288, padding=1
        • output: \color{red}{288 \times 14 \times 14}
      • Branch 3
        • convolution: kernel=(1 \times 1), stride = 1, channels = 32, padding=0
        • convolution: kernel=(3 \times 3), stride = 1, channels = 64, padding=1
        • output: \color{red}{64 \times 14 \times 14}
      • Branch 4
        • pooling: kernel=(3 \times 3), stride = 1, padding=1
        • convolution: kernel=(1 \times 1), stride = 1, channels = 64, padding=0
        • output: \color{red}{64 \times 14 \times 14}
      • output: \color{red}{528 \times 14 \times 14}
        • 528 = 112 + 288 + 64 + 64
    11. Auxiliary classifier 2

      • input: \color{red}{528 \times 14 \times 14}
      • AdaptiveAvgPool2d: output_size=(4 \times 4)
      • output: \color{red}{528 \times 4 \times 4}
      • convolution: kernel=(1 \times 1), stride = 1, channels = 128, padding=0
      • output: \color{red}{128 \times 4 \times 4 = 2048}
      • fully connected: 2048 \to 1024
      • fully connected: 1024 \to n, where n is the number of classes
      • output: \color{red}{n}
    12. Inception(4e)

      • input: \color{red}{528 \times 14 \times 14}
      • Branch 1
        • convolution: kernel=(1 \times 1), stride = 1, channels = 256, padding=0
        • output: \color{red}{256 \times 14 \times 14}
      • Branch 2
        • convolution: kernel=(1 \times 1), stride = 1, channels = 160, padding=0
        • convolution: kernel=(3 \times 3), stride = 1, channels = 320, padding=1
        • output: \color{red}{320 \times 14 \times 14}
      • Branch 3
        • convolution: kernel=(1 \times 1), stride = 1, channels = 32, padding=0
        • convolution: kernel=(3 \times 3), stride = 1, channels = 128, padding=1
        • output: \color{red}{128 \times 14 \times 14}
      • Branch 4
        • pooling: kernel=(3 \times 3), stride = 1, padding=1
        • convolution: kernel=(1 \times 1), stride = 1, channels = 128, padding=0
        • output: \color{red}{128 \times 14 \times 14}
      • output: \color{red}{832 \times 14 \times 14}
        • 832 = 256 + 320 + 128 + 128
      • max pooling: kernel=(2 \times 2), stride = 2, padding=0
      • output: \color{red}{832 \times 7 \times 7}
    13. Inception(5a)

      • input: \color{red}{832 \times 7 \times 7}
      • Branch 1
        • convolution: kernel=(1 \times 1), stride = 1, channels = 256, padding=0
        • output: \color{red}{256 \times 7 \times 7}
      • Branch 2
        • convolution: kernel=(1 \times 1), stride = 1, channels = 160, padding=0
        • convolution: kernel=(3 \times 3), stride = 1, channels = 320, padding=1
        • output: \color{red}{320 \times 7 \times 7}
      • Branch 3
        • convolution: kernel=(1 \times 1), stride = 1, channels = 32, padding=0
        • convolution: kernel=(3 \times 3), stride = 1, channels = 128, padding=1
        • output: \color{red}{128 \times 7 \times 7}
      • Branch 4
        • pooling: kernel=(3 \times 3), stride = 1, padding=1
        • convolution: kernel=(1 \times 1), stride = 1, channels = 128, padding=0
        • output: \color{red}{128 \times 7 \times 7}
      • output: \color{red}{832 \times 7 \times 7}
        • 832 = 256 + 320 + 128 + 128
    14. Inception(5b)

      • input: \color{red}{832 \times 7 \times 7}
      • Branch 1
        • convolution: kernel=(1 \times 1), stride = 1, channels = 384, padding=0
        • output: \color{red}{384 \times 7 \times 7}
      • Branch 2
        • convolution: kernel=(1 \times 1), stride = 1, channels = 192, padding=0
        • convolution: kernel=(3 \times 3), stride = 1, channels = 384, padding=1
        • output: \color{red}{384 \times 7 \times 7}
      • Branch 3
        • convolution: kernel=(1 \times 1), stride = 1, channels = 48, padding=0
        • convolution: kernel=(3 \times 3), stride = 1, channels = 128, padding=1
        • output: \color{red}{128 \times 7 \times 7}
      • Branch 4
        • pooling: kernel=(3 \times 3), stride = 1, padding=1
        • convolution: kernel=(1 \times 1), stride = 1, channels = 128, padding=0
        • output: \color{red}{128 \times 7 \times 7}
      • output: \color{red}{1024 \times 7 \times 7}
        • 1024 = 384 + 384 + 128 + 128
      • AdaptiveAvgPool2d: output_size=(1 \times 1)
      • output: \color{red}{1024 \times 1 \times 1}
    15. Fully connected layer

      • 1024 \to n
        • n is the number of classes
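    The last two steps (global average pooling followed by the fully connected layer) can be sketched like this; the Dropout(0.2) between them follows torchvision's implementation, and n = 4 classes is arbitrary:

    ```python
    import torch
    from torch.nn import Sequential, AdaptiveAvgPool2d, Flatten, Dropout, Linear

    # classifier head: 1024 x 7 x 7 -> 1024 x 1 x 1 -> 1024 -> n
    head = Sequential(
        AdaptiveAvgPool2d((1, 1)),  # global average pooling
        Flatten(1),                 # 1024 x 1 x 1 -> 1024
        Dropout(0.2),               # rate used by torchvision's GoogLeNet
        Linear(1024, 4),            # n = 4 classes (arbitrary)
    )
    print(head(torch.randn(2, 1024, 7, 7)).shape)  # torch.Size([2, 4])
    ```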

    A basic convolution wrapper

    • The core enhancement of this Inception V1 variant is BatchNorm2d; every convolution is followed by the two operations below, so we wrap all three together
      1. BatchNorm2d
      2. ReLU
    import torch
    from torch.nn import Conv2d, BatchNorm2d, Module, ReLU
    
    class YQConv2d(Module):
        
        # Constructor: set up Conv2d, BatchNorm2d and ReLU
        def __init__(self, in_channels, out_channels, kernel_size=1, stride=1, padding=0):
            super(YQConv2d, self).__init__()
            # convolution layer
            self.conv = Conv2d(in_channels, out_channels, kernel_size, stride, padding, bias=False)
            # batch normalization
            self.bn = BatchNorm2d(out_channels, eps=0.001)
            # activation
            self.relu = ReLU(inplace=True)
    
        def forward(self, x):
            y_ = self.conv(x)
            y_ = self.bn(y_)
            y_ = self.relu(y_)
            return y_
    
    
    • Structure of the wrapped convolution
    from torchsummary import summary
    conv = YQConv2d(3, 64)
    # print the network structure
    print(summary(conv, input_size=(3, 224, 224), device='cpu'))
    
    ----------------------------------------------------------------
            Layer (type)               Output Shape         Param #
    ================================================================
                Conv2d-1         [-1, 64, 224, 224]             192
           BatchNorm2d-2         [-1, 64, 224, 224]             128
                  ReLU-3         [-1, 64, 224, 224]               0
    ================================================================
    Total params: 320
    Trainable params: 320
    Non-trainable params: 0
    ----------------------------------------------------------------
    Input size (MB): 0.57
    Forward/backward pass size (MB): 73.50
    Params size (MB): 0.00
    Estimated Total Size (MB): 74.08
    ----------------------------------------------------------------
    None
    

    Inception封装

    • 使用上面的卷积作为核心单元,封装Inception
    Inception结构
    import torch
    from torch.nn import Module, Sequential, MaxPool2d
    class YQInception(Module):
        # 构造器设置4个分支的参数(输出的通道数)
        def __init__(self, in_channels, ch1x1, ch3x3_1, ch3x3_2, ch5x5_1, ch5x5_2, ch_pool):
            super(YQInception, self).__init__()
            # YQConv2d(self, in_channels, out_channels, kernel_size=1, stride=1, padding=0):
            # 分支-1
            self.branch_1 = Sequential(
                YQConv2d(in_channels=in_channels, out_channels=ch1x1, kernel_size=1, stride=1, padding=0)
            )
            # 分支-2
            self.branch_2 = Sequential(
                YQConv2d(in_channels=in_channels, out_channels=ch3x3_1, kernel_size=1, stride=1, padding=0),
                YQConv2d(in_channels=ch3x3_1,     out_channels=ch3x3_2, kernel_size=3, stride=1, padding=1)
            )
            # Branch 3 (the "5×5" branch; note it is built with a 3×3 kernel here)
            self.branch_3 = Sequential(
                YQConv2d(in_channels=in_channels, out_channels=ch5x5_1, kernel_size=1, stride=1, padding=0),
                YQConv2d(in_channels=ch5x5_1,     out_channels=ch5x5_2, kernel_size=3, stride=1, padding=1)
            )
            # Branch 4
            self.branch_4 = Sequential(
                MaxPool2d(kernel_size=3, stride=1, padding=1, ceil_mode=True),
                YQConv2d(in_channels=in_channels, out_channels=ch_pool, kernel_size=1, stride=1, padding=0)
            )
        
        def forward(self, x):
            b_y1 = self.branch_1(x)
            b_y2 = self.branch_2(x)
            b_y3 = self.branch_3(x)
            b_y4 = self.branch_4(x)
            
            y_ = torch.cat([b_y1, b_y2, b_y3, b_y4],  1)   # dim=1 concatenates along the channel axis, stacking the branch outputs in depth
            return y_
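    • The concatenation in forward can be checked in isolation; a minimal sketch (channel counts chosen to match the 64 + 128 + 32 + 32 configuration used in this post):

```python
import torch

# The four branch outputs share batch, height and width; torch.cat with
# dim=1 joins them along the channel axis, so channel counts simply add.
b1 = torch.randn(1, 64, 28, 28)
b2 = torch.randn(1, 128, 28, 28)
b3 = torch.randn(1, 32, 28, 28)
b4 = torch.randn(1, 32, 28, 28)
y = torch.cat([b1, b2, b3, b4], dim=1)
print(y.shape)  # torch.Size([1, 256, 28, 28])
```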
    
    • Print the network structure
    from torchsummary import summary
    conv = YQInception(3, 64, 96, 128, 16, 32, 32)   # final output depth is 64 + 128 + 32 + 32 = 256
    # 输出网络结构
    print(summary(conv,input_size=(3, 224, 224), device='cpu'))
    
    ----------------------------------------------------------------
            Layer (type)               Output Shape         Param #
    ================================================================
                Conv2d-1         [-1, 64, 224, 224]             192
           BatchNorm2d-2         [-1, 64, 224, 224]             128
                  ReLU-3         [-1, 64, 224, 224]               0
              YQConv2d-4         [-1, 64, 224, 224]               0
                Conv2d-5         [-1, 96, 224, 224]             288
           BatchNorm2d-6         [-1, 96, 224, 224]             192
                  ReLU-7         [-1, 96, 224, 224]               0
              YQConv2d-8         [-1, 96, 224, 224]               0
                Conv2d-9        [-1, 128, 224, 224]         110,592
          BatchNorm2d-10        [-1, 128, 224, 224]             256
                 ReLU-11        [-1, 128, 224, 224]               0
             YQConv2d-12        [-1, 128, 224, 224]               0
               Conv2d-13         [-1, 16, 224, 224]              48
          BatchNorm2d-14         [-1, 16, 224, 224]              32
                 ReLU-15         [-1, 16, 224, 224]               0
             YQConv2d-16         [-1, 16, 224, 224]               0
               Conv2d-17         [-1, 32, 224, 224]           4,608
          BatchNorm2d-18         [-1, 32, 224, 224]              64
                 ReLU-19         [-1, 32, 224, 224]               0
             YQConv2d-20         [-1, 32, 224, 224]               0
            MaxPool2d-21          [-1, 3, 224, 224]               0
               Conv2d-22         [-1, 32, 224, 224]              96
          BatchNorm2d-23         [-1, 32, 224, 224]              64
                 ReLU-24         [-1, 32, 224, 224]               0
             YQConv2d-25         [-1, 32, 224, 224]               0
    ================================================================
    Total params: 116,560
    Trainable params: 116,560
    Non-trainable params: 0
    ----------------------------------------------------------------
    Input size (MB): 0.57
    Forward/backward pass size (MB): 564.65
    Params size (MB): 0.44
    Estimated Total Size (MB): 565.67
    ----------------------------------------------------------------
    None
    

    Wrapping the auxiliary classifier

    • The auxiliary classifier has 3 trainable layers in total (the convolution plus two fully connected layers).
    Auxiliary classifier
    import torch
    from torch.nn import Module, AdaptiveAvgPool2d, Linear, ReLU, Dropout
    
    class YQAuxClassifier(Module):
        def __init__(self, in_channels, num_classes):
            super(YQAuxClassifier, self).__init__()
            # pooling / convolution / fully connected / fully connected
            self.pool = AdaptiveAvgPool2d((4, 4))
            self.conv = YQConv2d(in_channels=in_channels, out_channels=128, kernel_size=1, stride=1, padding=0)
            self.fc_1 = Linear(2048, 1024)
            self.relu = ReLU(inplace=True)
            self.drop = Dropout(p=0.7, inplace=False)
            self.fc_2 = Linear(1024, num_classes)
        
        def forward(self, x):
            y_ = self.pool(x)
            y_ = self.conv(y_)
            y_ = torch.flatten(y_, 1)    # reshape from the convolutional layers to the fully connected layers
            y_ = self.fc_1(y_)
            y_ = self.relu(y_)
            y_ = self.drop(y_)
            y_ = self.fc_2(y_)
            return y_
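    • Why fc_1 takes 2048 inputs can be read off the flatten step: the adaptive pool fixes the spatial size at 4×4 and the 1×1 convolution fixes the depth at 128, so 128 × 4 × 4 = 2048. A minimal check:

```python
import torch

# Stand-in for the pooled-and-projected feature map inside the classifier.
t = torch.randn(2, 128, 4, 4)   # (batch, channels, height, width)
flat = torch.flatten(t, 1)      # keep dim 0 (batch), flatten the rest
print(flat.shape)               # torch.Size([2, 2048])
```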
    
    • Visualize the auxiliary classifier
    from torchsummary import summary
    classifier = YQAuxClassifier(512, 10)   # 512 input channels, 10 output classes
    # Print the network structure
    print(summary(classifier, input_size=(512, 14, 14), device='cpu'))
    
    ----------------------------------------------------------------
            Layer (type)               Output Shape         Param #
    ================================================================
     AdaptiveAvgPool2d-1            [-1, 512, 4, 4]               0
                Conv2d-2            [-1, 128, 4, 4]          65,536
           BatchNorm2d-3            [-1, 128, 4, 4]             256
                  ReLU-4            [-1, 128, 4, 4]               0
              YQConv2d-5            [-1, 128, 4, 4]               0
                Linear-6                 [-1, 1024]       2,098,176
                  ReLU-7                 [-1, 1024]               0
               Dropout-8                 [-1, 1024]               0
                Linear-9                   [-1, 10]          10,250
    ================================================================
    Total params: 2,174,218
    Trainable params: 2,174,218
    Non-trainable params: 0
    ----------------------------------------------------------------
    Input size (MB): 0.38
    Forward/backward pass size (MB): 0.15
    Params size (MB): 8.29
    Estimated Total Size (MB): 8.83
    ----------------------------------------------------------------
    None
    

    GoogLeNet implementation

    • Implemented in the following stages:
      1. The leading convolutions
      2. The Inception blocks (including two auxiliary classifiers)
      3. The classifier
    import torch
    from torch.nn import Module, AdaptiveAvgPool2d, Linear, Dropout, MaxPool2d
    class YQGoogLeNet(Module):
        def __init__(self, num_classes=1000):
            super(YQGoogLeNet, self).__init__()
            # Define the layers
            self.conv_1  = YQConv2d(in_channels=3,  out_channels=64,  kernel_size=7, stride=2, padding=3)
            self.pool_1  = MaxPool2d(3, stride=2, ceil_mode=True)
            
            self.conv_2  = YQConv2d(in_channels=64, out_channels=64,  kernel_size=1, stride=1, padding=0)
            
            self.conv_3  = YQConv2d(in_channels=64, out_channels=192, kernel_size=3, stride=1, padding=1)
            self.pool_2  = MaxPool2d(3, stride=2, ceil_mode=True)
            
            self.ince_3a = YQInception(192, 64, 96, 128, 16, 32, 32)
            self.ince_3b = YQInception(256, 128, 128, 192, 32, 96, 64)
            self.pool_3  = MaxPool2d(3, stride=2, ceil_mode=True)
            
            self.ince_4a = YQInception(480, 192, 96, 208, 16, 48, 64)
            
            self.ince_4b = YQInception(512, 160, 112, 224, 24, 64, 64)
            self.ince_4c = YQInception(512, 128, 128, 256, 24, 64, 64)
            self.ince_4d = YQInception(512, 112, 144, 288, 32, 64, 64)
            self.ince_4e = YQInception(528, 256, 160, 320, 32, 128, 128)
            self.pool_4  = MaxPool2d(2, stride=2, ceil_mode=True)
            
            self.ince_5a = YQInception(832, 256, 160, 320, 32, 128, 128)
            self.ince_5b = YQInception(832, 384, 192, 384, 48, 128, 128)
            
            # Two auxiliary classifiers
            self.auxi_1  = YQAuxClassifier(512, num_classes)
            self.auxi_2  = YQAuxClassifier(528, num_classes)
            
            # Final classification layers
            self.pool_5  = AdaptiveAvgPool2d((1, 1))
            self.drop = Dropout(0.2)
            self.full = Linear(1024, num_classes)
            
        def forward(self, x):
            # -----------------------------
            y_ = self.conv_1(x)
            y_ = self.pool_1(y_)
            # -----------------------------
            y_ = self.conv_2(y_)
            y_ = self.conv_3(y_)
            y_ = self.pool_2(y_)
            # -----------------------------
            y_ = self.ince_3a(y_)
            y_ = self.ince_3b(y_)
            y_ = self.pool_3(y_)
            # -----------------------------
            y_ = self.ince_4a(y_)
            # -----------------------------
            a1 = self.auxi_1(y_)
            # -----------------------------
            y_ = self.ince_4b(y_)
            y_ = self.ince_4c(y_)
            y_ = self.ince_4d(y_)
            # -----------------------------
            a2 = self.auxi_2(y_)
            # -----------------------------
            y_ = self.ince_4e(y_)
            y_ = self.pool_4(y_)
            # -----------------------------
            y_ = self.ince_5a(y_)
            y_ = self.ince_5b(y_)
            # -----------------------------
            y_ = self.pool_5(y_)
            y_ = torch.flatten(y_, 1)
            y_ = self.drop(y_)
            y_ = self.full(y_)
            
            return y_, a1, a2
    
    • GoogLeNet network structure
    from torchsummary import summary
    net = YQGoogLeNet(1000)
    # Print the network structure
    print(summary(net,input_size=(3, 244, 244), device='cpu'))
    
    ----------------------------------------------------------------
            Layer (type)               Output Shape         Param #
    ================================================================
                Conv2d-1         [-1, 64, 122, 122]           9,408
           BatchNorm2d-2         [-1, 64, 122, 122]             128
                  ReLU-3         [-1, 64, 122, 122]               0
              YQConv2d-4         [-1, 64, 122, 122]               0
             MaxPool2d-5           [-1, 64, 61, 61]               0
                Conv2d-6           [-1, 64, 61, 61]           4,096
           BatchNorm2d-7           [-1, 64, 61, 61]             128
                  ReLU-8           [-1, 64, 61, 61]               0
              YQConv2d-9           [-1, 64, 61, 61]               0
               Conv2d-10          [-1, 192, 61, 61]         110,592
          BatchNorm2d-11          [-1, 192, 61, 61]             384
                 ReLU-12          [-1, 192, 61, 61]               0
             YQConv2d-13          [-1, 192, 61, 61]               0
            MaxPool2d-14          [-1, 192, 30, 30]               0
               Conv2d-15           [-1, 64, 30, 30]          12,288
          BatchNorm2d-16           [-1, 64, 30, 30]             128
                 ReLU-17           [-1, 64, 30, 30]               0
             YQConv2d-18           [-1, 64, 30, 30]               0
               Conv2d-19           [-1, 96, 30, 30]          18,432
          BatchNorm2d-20           [-1, 96, 30, 30]             192
                 ReLU-21           [-1, 96, 30, 30]               0
             YQConv2d-22           [-1, 96, 30, 30]               0
               Conv2d-23          [-1, 128, 30, 30]         110,592
          BatchNorm2d-24          [-1, 128, 30, 30]             256
                 ReLU-25          [-1, 128, 30, 30]               0
             YQConv2d-26          [-1, 128, 30, 30]               0
               Conv2d-27           [-1, 16, 30, 30]           3,072
          BatchNorm2d-28           [-1, 16, 30, 30]              32
                 ReLU-29           [-1, 16, 30, 30]               0
             YQConv2d-30           [-1, 16, 30, 30]               0
               Conv2d-31           [-1, 32, 30, 30]           4,608
          BatchNorm2d-32           [-1, 32, 30, 30]              64
                 ReLU-33           [-1, 32, 30, 30]               0
             YQConv2d-34           [-1, 32, 30, 30]               0
            MaxPool2d-35          [-1, 192, 30, 30]               0
               Conv2d-36           [-1, 32, 30, 30]           6,144
          BatchNorm2d-37           [-1, 32, 30, 30]              64
                 ReLU-38           [-1, 32, 30, 30]               0
             YQConv2d-39           [-1, 32, 30, 30]               0
          YQInception-40          [-1, 256, 30, 30]               0
               Conv2d-41          [-1, 128, 30, 30]          32,768
          BatchNorm2d-42          [-1, 128, 30, 30]             256
                 ReLU-43          [-1, 128, 30, 30]               0
             YQConv2d-44          [-1, 128, 30, 30]               0
               Conv2d-45          [-1, 128, 30, 30]          32,768
          BatchNorm2d-46          [-1, 128, 30, 30]             256
                 ReLU-47          [-1, 128, 30, 30]               0
             YQConv2d-48          [-1, 128, 30, 30]               0
               Conv2d-49          [-1, 192, 30, 30]         221,184
          BatchNorm2d-50          [-1, 192, 30, 30]             384
                 ReLU-51          [-1, 192, 30, 30]               0
             YQConv2d-52          [-1, 192, 30, 30]               0
               Conv2d-53           [-1, 32, 30, 30]           8,192
          BatchNorm2d-54           [-1, 32, 30, 30]              64
                 ReLU-55           [-1, 32, 30, 30]               0
             YQConv2d-56           [-1, 32, 30, 30]               0
               Conv2d-57           [-1, 96, 30, 30]          27,648
          BatchNorm2d-58           [-1, 96, 30, 30]             192
                 ReLU-59           [-1, 96, 30, 30]               0
             YQConv2d-60           [-1, 96, 30, 30]               0
            MaxPool2d-61          [-1, 256, 30, 30]               0
               Conv2d-62           [-1, 64, 30, 30]          16,384
          BatchNorm2d-63           [-1, 64, 30, 30]             128
                 ReLU-64           [-1, 64, 30, 30]               0
             YQConv2d-65           [-1, 64, 30, 30]               0
          YQInception-66          [-1, 480, 30, 30]               0
            MaxPool2d-67          [-1, 480, 15, 15]               0
               Conv2d-68          [-1, 192, 15, 15]          92,160
          BatchNorm2d-69          [-1, 192, 15, 15]             384
                 ReLU-70          [-1, 192, 15, 15]               0
             YQConv2d-71          [-1, 192, 15, 15]               0
               Conv2d-72           [-1, 96, 15, 15]          46,080
          BatchNorm2d-73           [-1, 96, 15, 15]             192
                 ReLU-74           [-1, 96, 15, 15]               0
             YQConv2d-75           [-1, 96, 15, 15]               0
               Conv2d-76          [-1, 208, 15, 15]         179,712
          BatchNorm2d-77          [-1, 208, 15, 15]             416
                 ReLU-78          [-1, 208, 15, 15]               0
             YQConv2d-79          [-1, 208, 15, 15]               0
               Conv2d-80           [-1, 16, 15, 15]           7,680
          BatchNorm2d-81           [-1, 16, 15, 15]              32
                 ReLU-82           [-1, 16, 15, 15]               0
             YQConv2d-83           [-1, 16, 15, 15]               0
               Conv2d-84           [-1, 48, 15, 15]           6,912
          BatchNorm2d-85           [-1, 48, 15, 15]              96
                 ReLU-86           [-1, 48, 15, 15]               0
             YQConv2d-87           [-1, 48, 15, 15]               0
            MaxPool2d-88          [-1, 480, 15, 15]               0
               Conv2d-89           [-1, 64, 15, 15]          30,720
          BatchNorm2d-90           [-1, 64, 15, 15]             128
                 ReLU-91           [-1, 64, 15, 15]               0
             YQConv2d-92           [-1, 64, 15, 15]               0
          YQInception-93          [-1, 512, 15, 15]               0
    AdaptiveAvgPool2d-94            [-1, 512, 4, 4]               0
               Conv2d-95            [-1, 128, 4, 4]          65,536
          BatchNorm2d-96            [-1, 128, 4, 4]             256
                 ReLU-97            [-1, 128, 4, 4]               0
             YQConv2d-98            [-1, 128, 4, 4]               0
               Linear-99                 [-1, 1024]       2,098,176
                ReLU-100                 [-1, 1024]               0
             Dropout-101                 [-1, 1024]               0
              Linear-102                 [-1, 1000]       1,025,000
     YQAuxClassifier-103                 [-1, 1000]               0
              Conv2d-104          [-1, 160, 15, 15]          81,920
         BatchNorm2d-105          [-1, 160, 15, 15]             320
                ReLU-106          [-1, 160, 15, 15]               0
            YQConv2d-107          [-1, 160, 15, 15]               0
              Conv2d-108          [-1, 112, 15, 15]          57,344
         BatchNorm2d-109          [-1, 112, 15, 15]             224
                ReLU-110          [-1, 112, 15, 15]               0
            YQConv2d-111          [-1, 112, 15, 15]               0
              Conv2d-112          [-1, 224, 15, 15]         225,792
         BatchNorm2d-113          [-1, 224, 15, 15]             448
                ReLU-114          [-1, 224, 15, 15]               0
            YQConv2d-115          [-1, 224, 15, 15]               0
              Conv2d-116           [-1, 24, 15, 15]          12,288
         BatchNorm2d-117           [-1, 24, 15, 15]              48
                ReLU-118           [-1, 24, 15, 15]               0
            YQConv2d-119           [-1, 24, 15, 15]               0
              Conv2d-120           [-1, 64, 15, 15]          13,824
         BatchNorm2d-121           [-1, 64, 15, 15]             128
                ReLU-122           [-1, 64, 15, 15]               0
            YQConv2d-123           [-1, 64, 15, 15]               0
           MaxPool2d-124          [-1, 512, 15, 15]               0
              Conv2d-125           [-1, 64, 15, 15]          32,768
         BatchNorm2d-126           [-1, 64, 15, 15]             128
                ReLU-127           [-1, 64, 15, 15]               0
            YQConv2d-128           [-1, 64, 15, 15]               0
         YQInception-129          [-1, 512, 15, 15]               0
              Conv2d-130          [-1, 128, 15, 15]          65,536
         BatchNorm2d-131          [-1, 128, 15, 15]             256
                ReLU-132          [-1, 128, 15, 15]               0
            YQConv2d-133          [-1, 128, 15, 15]               0
              Conv2d-134          [-1, 128, 15, 15]          65,536
         BatchNorm2d-135          [-1, 128, 15, 15]             256
                ReLU-136          [-1, 128, 15, 15]               0
            YQConv2d-137          [-1, 128, 15, 15]               0
              Conv2d-138          [-1, 256, 15, 15]         294,912
         BatchNorm2d-139          [-1, 256, 15, 15]             512
                ReLU-140          [-1, 256, 15, 15]               0
            YQConv2d-141          [-1, 256, 15, 15]               0
              Conv2d-142           [-1, 24, 15, 15]          12,288
         BatchNorm2d-143           [-1, 24, 15, 15]              48
                ReLU-144           [-1, 24, 15, 15]               0
            YQConv2d-145           [-1, 24, 15, 15]               0
              Conv2d-146           [-1, 64, 15, 15]          13,824
         BatchNorm2d-147           [-1, 64, 15, 15]             128
                ReLU-148           [-1, 64, 15, 15]               0
            YQConv2d-149           [-1, 64, 15, 15]               0
           MaxPool2d-150          [-1, 512, 15, 15]               0
              Conv2d-151           [-1, 64, 15, 15]          32,768
         BatchNorm2d-152           [-1, 64, 15, 15]             128
                ReLU-153           [-1, 64, 15, 15]               0
            YQConv2d-154           [-1, 64, 15, 15]               0
         YQInception-155          [-1, 512, 15, 15]               0
              Conv2d-156          [-1, 112, 15, 15]          57,344
         BatchNorm2d-157          [-1, 112, 15, 15]             224
                ReLU-158          [-1, 112, 15, 15]               0
            YQConv2d-159          [-1, 112, 15, 15]               0
              Conv2d-160          [-1, 144, 15, 15]          73,728
         BatchNorm2d-161          [-1, 144, 15, 15]             288
                ReLU-162          [-1, 144, 15, 15]               0
            YQConv2d-163          [-1, 144, 15, 15]               0
              Conv2d-164          [-1, 288, 15, 15]         373,248
         BatchNorm2d-165          [-1, 288, 15, 15]             576
                ReLU-166          [-1, 288, 15, 15]               0
            YQConv2d-167          [-1, 288, 15, 15]               0
              Conv2d-168           [-1, 32, 15, 15]          16,384
         BatchNorm2d-169           [-1, 32, 15, 15]              64
                ReLU-170           [-1, 32, 15, 15]               0
            YQConv2d-171           [-1, 32, 15, 15]               0
              Conv2d-172           [-1, 64, 15, 15]          18,432
         BatchNorm2d-173           [-1, 64, 15, 15]             128
                ReLU-174           [-1, 64, 15, 15]               0
            YQConv2d-175           [-1, 64, 15, 15]               0
           MaxPool2d-176          [-1, 512, 15, 15]               0
              Conv2d-177           [-1, 64, 15, 15]          32,768
         BatchNorm2d-178           [-1, 64, 15, 15]             128
                ReLU-179           [-1, 64, 15, 15]               0
            YQConv2d-180           [-1, 64, 15, 15]               0
         YQInception-181          [-1, 528, 15, 15]               0
    AdaptiveAvgPool2d-182            [-1, 528, 4, 4]               0
              Conv2d-183            [-1, 128, 4, 4]          67,584
         BatchNorm2d-184            [-1, 128, 4, 4]             256
                ReLU-185            [-1, 128, 4, 4]               0
            YQConv2d-186            [-1, 128, 4, 4]               0
              Linear-187                 [-1, 1024]       2,098,176
                ReLU-188                 [-1, 1024]               0
             Dropout-189                 [-1, 1024]               0
              Linear-190                 [-1, 1000]       1,025,000
     YQAuxClassifier-191                 [-1, 1000]               0
              Conv2d-192          [-1, 256, 15, 15]         135,168
         BatchNorm2d-193          [-1, 256, 15, 15]             512
                ReLU-194          [-1, 256, 15, 15]               0
            YQConv2d-195          [-1, 256, 15, 15]               0
              Conv2d-196          [-1, 160, 15, 15]          84,480
         BatchNorm2d-197          [-1, 160, 15, 15]             320
                ReLU-198          [-1, 160, 15, 15]               0
            YQConv2d-199          [-1, 160, 15, 15]               0
              Conv2d-200          [-1, 320, 15, 15]         460,800
         BatchNorm2d-201          [-1, 320, 15, 15]             640
                ReLU-202          [-1, 320, 15, 15]               0
            YQConv2d-203          [-1, 320, 15, 15]               0
              Conv2d-204           [-1, 32, 15, 15]          16,896
         BatchNorm2d-205           [-1, 32, 15, 15]              64
                ReLU-206           [-1, 32, 15, 15]               0
            YQConv2d-207           [-1, 32, 15, 15]               0
              Conv2d-208          [-1, 128, 15, 15]          36,864
         BatchNorm2d-209          [-1, 128, 15, 15]             256
                ReLU-210          [-1, 128, 15, 15]               0
            YQConv2d-211          [-1, 128, 15, 15]               0
           MaxPool2d-212          [-1, 528, 15, 15]               0
              Conv2d-213          [-1, 128, 15, 15]          67,584
         BatchNorm2d-214          [-1, 128, 15, 15]             256
                ReLU-215          [-1, 128, 15, 15]               0
            YQConv2d-216          [-1, 128, 15, 15]               0
         YQInception-217          [-1, 832, 15, 15]               0
           MaxPool2d-218            [-1, 832, 8, 8]               0
              Conv2d-219            [-1, 256, 8, 8]         212,992
         BatchNorm2d-220            [-1, 256, 8, 8]             512
                ReLU-221            [-1, 256, 8, 8]               0
            YQConv2d-222            [-1, 256, 8, 8]               0
              Conv2d-223            [-1, 160, 8, 8]         133,120
         BatchNorm2d-224            [-1, 160, 8, 8]             320
                ReLU-225            [-1, 160, 8, 8]               0
            YQConv2d-226            [-1, 160, 8, 8]               0
              Conv2d-227            [-1, 320, 8, 8]         460,800
         BatchNorm2d-228            [-1, 320, 8, 8]             640
                ReLU-229            [-1, 320, 8, 8]               0
            YQConv2d-230            [-1, 320, 8, 8]               0
              Conv2d-231             [-1, 32, 8, 8]          26,624
         BatchNorm2d-232             [-1, 32, 8, 8]              64
                ReLU-233             [-1, 32, 8, 8]               0
            YQConv2d-234             [-1, 32, 8, 8]               0
              Conv2d-235            [-1, 128, 8, 8]          36,864
         BatchNorm2d-236            [-1, 128, 8, 8]             256
                ReLU-237            [-1, 128, 8, 8]               0
            YQConv2d-238            [-1, 128, 8, 8]               0
           MaxPool2d-239            [-1, 832, 8, 8]               0
              Conv2d-240            [-1, 128, 8, 8]         106,496
         BatchNorm2d-241            [-1, 128, 8, 8]             256
                ReLU-242            [-1, 128, 8, 8]               0
            YQConv2d-243            [-1, 128, 8, 8]               0
         YQInception-244            [-1, 832, 8, 8]               0
              Conv2d-245            [-1, 384, 8, 8]         319,488
         BatchNorm2d-246            [-1, 384, 8, 8]             768
                ReLU-247            [-1, 384, 8, 8]               0
            YQConv2d-248            [-1, 384, 8, 8]               0
              Conv2d-249            [-1, 192, 8, 8]         159,744
         BatchNorm2d-250            [-1, 192, 8, 8]             384
                ReLU-251            [-1, 192, 8, 8]               0
            YQConv2d-252            [-1, 192, 8, 8]               0
              Conv2d-253            [-1, 384, 8, 8]         663,552
         BatchNorm2d-254            [-1, 384, 8, 8]             768
                ReLU-255            [-1, 384, 8, 8]               0
            YQConv2d-256            [-1, 384, 8, 8]               0
              Conv2d-257             [-1, 48, 8, 8]          39,936
         BatchNorm2d-258             [-1, 48, 8, 8]              96
                ReLU-259             [-1, 48, 8, 8]               0
            YQConv2d-260             [-1, 48, 8, 8]               0
              Conv2d-261            [-1, 128, 8, 8]          55,296
         BatchNorm2d-262            [-1, 128, 8, 8]             256
                ReLU-263            [-1, 128, 8, 8]               0
            YQConv2d-264            [-1, 128, 8, 8]               0
           MaxPool2d-265            [-1, 832, 8, 8]               0
              Conv2d-266            [-1, 128, 8, 8]         106,496
         BatchNorm2d-267            [-1, 128, 8, 8]             256
                ReLU-268            [-1, 128, 8, 8]               0
            YQConv2d-269            [-1, 128, 8, 8]               0
         YQInception-270           [-1, 1024, 8, 8]               0
    AdaptiveAvgPool2d-271           [-1, 1024, 1, 1]               0
             Dropout-272                 [-1, 1024]               0
              Linear-273                 [-1, 1000]       1,025,000
    ================================================================
    Total params: 13,004,888
    Trainable params: 13,004,888
    Non-trainable params: 0
    ----------------------------------------------------------------
    Input size (MB): 0.68
    Forward/backward pass size (MB): 139.36
    Params size (MB): 49.61
    Estimated Total Size (MB): 189.65
    ----------------------------------------------------------------
    None
    
    • With 1000 classes, the total parameter size matches the official model's.

    • One shortcoming of the model above:

      • The auxiliary classifiers are useful only during training and serve no purpose at prediction time; a boolean flag can be used to switch their computation on and off.
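    • One way to realize that switch, shown as a minimal sketch (AuxSwitchDemo is a hypothetical toy module, not the YQGoogLeNet above): combine a constructor flag with nn.Module's built-in training attribute, which train()/eval() toggle.

```python
import torch
from torch import nn

class AuxSwitchDemo(nn.Module):
    """Toy module: the auxiliary head runs only when aux_logits is set
    and the module is in training mode."""
    def __init__(self, aux_logits=True):
        super().__init__()
        self.aux_logits = aux_logits
        self.body = nn.Linear(8, 8)
        self.head = nn.Linear(8, 4)
        self.aux = nn.Linear(8, 4)   # stand-in for an auxiliary classifier

    def forward(self, x):
        y = self.body(x)
        if self.aux_logits and self.training:
            return self.head(y), self.aux(y)   # main + auxiliary logits
        return self.head(y)                    # main logits only

net = AuxSwitchDemo()
net.eval()                      # Module.training becomes False
out = net(torch.randn(2, 8))    # a single tensor, no auxiliary output
```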

    Dataset loading and training

    • Because GoogLeNet has two auxiliary classifiers that encourage classification at shallow layers, training differs from ordinary single-classifier training.

      • During training, each auxiliary classifier's loss is added to the total loss with a discounted weight (the discount weight is 0.3).
    • For clarity, all of the code below is collected in one place.
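    • The weighted loss rule above can be sketched as follows (random stand-in logits for the (y_, a1, a2) that YQGoogLeNet.forward returns; the 0.3 weight follows the bullet above):

```python
import torch
import torch.nn.functional as F

# Stand-ins for the main and two auxiliary logits of a 4-sample batch
# over 10 classes; shapes are made up for illustration.
y_main = torch.randn(4, 10)
a1 = torch.randn(4, 10)
a2 = torch.randn(4, 10)
target = torch.randint(0, 10, (4,))

# Total loss: main loss plus the two auxiliary losses discounted by 0.3.
loss = (F.cross_entropy(y_main, target)
        + 0.3 * F.cross_entropy(a1, target)
        + 0.3 * F.cross_entropy(a2, target))
```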

    import torch
    from torch.nn import Conv2d, BatchNorm2d, Module, ReLU, Sequential, MaxPool2d, AdaptiveAvgPool2d, Linear, Dropout
    from torchvision.datasets import ImageFolder
    from torchvision.transforms import *
    from torchvision.transforms.functional import *
    from torch.utils.data import random_split
    from torch.utils.data import DataLoader
    import torch
    import torchvision
    import numpy as np
    import cv2
    
    # Combined wrapper for Conv2d, BatchNorm2d and ReLU; using Sequential would give an even more concise implementation.
    # --------------------------------------------------------------
    class YQConv2d(Module):
        
        # Constructor: initialize Conv2d, BatchNorm2d and ReLU
        def __init__(self, in_channels, out_channels, kernel_size=1, stride=1, padding=0):
            super(YQConv2d, self).__init__()
            # Convolution layer
            self.conv = Conv2d(in_channels, out_channels, kernel_size, stride, padding, bias=False)
            # BatchNorm2d
            self.bn = BatchNorm2d(out_channels, eps=0.001)
            # Activation
            self.relu = ReLU(inplace=True)
    
        def forward(self, x):
            y_ = self.conv(x)
            y_ = self.bn(y_)
            y_ = self.relu(y_)
            return y_
    
        
    # Inception V2-style implementation (the 5×5 branch is replaced with a 3×3 convolution)
    # --------------------------------------------------------------
    class YQInception(Module):
        # The constructor takes the parameters (output channel counts) of the 4 branches
        def __init__(self, in_channels, ch1x1, ch3x3_1, ch3x3_2, ch5x5_1, ch5x5_2, ch_pool):
            super(YQInception, self).__init__()
            # YQConv2d(self, in_channels, out_channels, kernel_size=1, stride=1, padding=0):
            # Branch 1
            self.branch_1 = Sequential(
                YQConv2d(in_channels=in_channels, out_channels=ch1x1, kernel_size=1, stride=1, padding=0)
            )
            # Branch 2
            self.branch_2 = Sequential(
                YQConv2d(in_channels=in_channels, out_channels=ch3x3_1, kernel_size=1, stride=1, padding=0),
                YQConv2d(in_channels=ch3x3_1,     out_channels=ch3x3_2, kernel_size=3, stride=1, padding=1)
            )
            # Branch 3 (the "5×5" branch; note it is built with a 3×3 kernel here)
            self.branch_3 = Sequential(
                YQConv2d(in_channels=in_channels, out_channels=ch5x5_1, kernel_size=1, stride=1, padding=0),
                YQConv2d(in_channels=ch5x5_1,     out_channels=ch5x5_2, kernel_size=3, stride=1, padding=1)
            )
            # Branch 4
            self.branch_4 = Sequential(
                MaxPool2d(kernel_size=3, stride=1, padding=1, ceil_mode=True),
                YQConv2d(in_channels=in_channels, out_channels=ch_pool, kernel_size=1, stride=1, padding=0)
            )
        
        def forward(self, x):
            b_y1 = self.branch_1(x)
            b_y2 = self.branch_2(x)
            b_y3 = self.branch_3(x)
            b_y4 = self.branch_4(x)
            
            y_ = torch.cat([b_y1, b_y2, b_y3, b_y4],  1)   # dim=1 concatenates along the channel axis, stacking the branch outputs in depth
            return y_
        
    
    # Auxiliary classifier implementation
    # --------------------------------------------------------------
    class YQAuxClassifier(Module):
        def __init__(self, in_channels, num_classes):
            super(YQAuxClassifier, self).__init__()
            # pooling / convolution / fully connected / fully connected
            self.pool = AdaptiveAvgPool2d((4, 4))
            self.conv = YQConv2d(in_channels=in_channels, out_channels=128, kernel_size=1, stride=1, padding=0)
            self.fc_1 = Linear(2048, 1024)
            self.relu = ReLU(inplace=True)
            self.drop = Dropout(p=0.7, inplace=False)
            self.fc_2 = Linear(1024, num_classes)
        
        def forward(self, x):
            y_ = self.pool(x)
            y_ = self.conv(y_)
            y_ = torch.flatten(y_, 1)    # flatten the conv features for the fully connected layers
            y_ = self.fc_1(y_)
            y_ = self.relu(y_)
            y_ = self.drop(y_)
            y_ = self.fc_2(y_)
            return y_
    
    
    
    # GoogLeNet network implementation
    # --------------------------------------------------------------
    class YQGoogLeNet(Module):
        def __init__(self, num_classes=1000):
            super(YQGoogLeNet, self).__init__()
            # Layer definitions
            self.conv_1  = YQConv2d(in_channels=3,  out_channels=64,  kernel_size=7, stride=2, padding=3)
            self.pool_1  = MaxPool2d(3, stride=2, ceil_mode=True)
            
            self.conv_2  = YQConv2d(in_channels=64, out_channels=64,  kernel_size=1, stride=1, padding=0)
            
            self.conv_3  = YQConv2d(in_channels=64, out_channels=192, kernel_size=3, stride=1, padding=1)
            self.pool_2  = MaxPool2d(3, stride=2, ceil_mode=True)
            
            self.ince_3a = YQInception(192, 64, 96, 128, 16, 32, 32)
            self.ince_3b = YQInception(256, 128, 128, 192, 32, 96, 64)
            self.pool_3  = MaxPool2d(3, stride=2, ceil_mode=True)
            
            self.ince_4a = YQInception(480, 192, 96, 208, 16, 48, 64)
            
            self.ince_4b = YQInception(512, 160, 112, 224, 24, 64, 64)
            self.ince_4c = YQInception(512, 128, 128, 256, 24, 64, 64)
            self.ince_4d = YQInception(512, 112, 144, 288, 32, 64, 64)
            self.ince_4e = YQInception(528, 256, 160, 320, 32, 128, 128)
            self.pool_4  = MaxPool2d(2, stride=2, ceil_mode=True)
            
            self.ince_5a = YQInception(832, 256, 160, 320, 32, 128, 128)
            self.ince_5b = YQInception(832, 384, 192, 384, 48, 128, 128)
            
            # Two auxiliary classifiers, fed from the outputs of inception 4a and 4d
            self.auxi_1  = YQAuxClassifier(512, num_classes)
            self.auxi_2  = YQAuxClassifier(528, num_classes)
            
            # Final classification head
            self.pool_5  = AdaptiveAvgPool2d((1, 1))
            self.drop = Dropout(0.2)
            self.full = Linear(1024, num_classes)
            
        def forward(self, x):
            # -----------------------------
            y_ = self.conv_1(x)
            y_ = self.pool_1(y_)
            # -----------------------------
            y_ = self.conv_2(y_)
            y_ = self.conv_3(y_)
            y_ = self.pool_2(y_)
            # -----------------------------
            y_ = self.ince_3a(y_)
            y_ = self.ince_3b(y_)
            y_ = self.pool_3(y_)
            # -----------------------------
            y_ = self.ince_4a(y_)
            # -----------------------------
            a1 = self.auxi_1(y_)
            # -----------------------------
            y_ = self.ince_4b(y_)
            y_ = self.ince_4c(y_)
            y_ = self.ince_4d(y_)
            # -----------------------------
            a2 = self.auxi_2(y_)
            # -----------------------------
            y_ = self.ince_4e(y_)
            y_ = self.pool_4(y_)
            # -----------------------------
            y_ = self.ince_5a(y_)
            y_ = self.ince_5b(y_)
            # -----------------------------
            y_ = self.pool_5(y_)
            y_ = torch.flatten(y_, 1)
            y_ = self.drop(y_)
            y_ = self.full(y_)
            
            return y_, a1, a2
    
    
    # Data loading
    # --------------------------------------------------------------
    def load_data(img_dir, rate=0.8):
        transform = Compose(
            [
                Resize((224, 224)),          # alternatively: RandomResizedCrop(224)
        #         RandomHorizontalFlip(),
                ToTensor(),
                Normalize(mean=[0.0, 0.0, 0.0], std=[1.0, 1.0, 1.0]),   # per-channel mean/std (Normalize operates on tensors, so it must come after ToTensor)
            ]
        )
        ds = ImageFolder(img_dir, transform=transform)
    
        l = len(ds)
        l_train = int(l * rate)
        train, test = random_split(ds, [l_train, l - l_train])
        
        train_loader = torch.utils.data.DataLoader(dataset=train, shuffle=True, batch_size=50)   # each class has roughly 1300 images
        test_loader = torch.utils.data.DataLoader(dataset=test, shuffle=True, batch_size=50)     # evaluated batch by batch
    
        return train_loader, test_loader
    
    
    # Training
    # ==============================================================
    # 1. Load the dataset
    print("1. Loading the dataset")
    train_loader, test_loader = load_data("./imagenet2012", 0.8)
    
    CUDA = torch.cuda.is_available()
    # 2. Build the network
    print("2. Building the network")
    net = YQGoogLeNet(4)   # 4 classes here, a small subset of ImageNet
    if CUDA:
        net.cuda()
    
    # 3. Train
    print("3. Training")
    optimizer = torch.optim.Adam(net.parameters(), lr=0.001)
    loss_F = torch.nn.CrossEntropyLoss()
    
    epoch = 10
    
    
    for n in range(epoch):   # iterate over the training set epoch times
        for step, input_data in enumerate(train_loader):
            x_, y_ = input_data
            if CUDA:
                # GPU运算 -----------------------------------------------
                x_ = x_.cuda()
                y_ = y_.cuda()
            pred, pred_aux_1, pred_aux_2 = net(x_.view(-1, 3, 224, 224))
            loss = loss_F(pred, y_)   # main classifier loss
            loss_aux_1 = loss_F(pred_aux_1, y_)
            loss_aux_2 = loss_F(pred_aux_2, y_)
            
            last_loss = loss + 0.3 * (loss_aux_1 + loss_aux_2)   # auxiliary losses are weighted by 0.3
            optimizer.zero_grad()
            last_loss.backward()
            optimizer.step()
            
            net.eval()   # switch off Dropout while evaluating
            with torch.no_grad():
                all_num = 0.0
                acc = 0.0
                for t_x, t_y in test_loader:
                    all_num += len(t_y)
                    if CUDA:
                        t_x = t_x.cuda()
                        t_y = t_y.cuda()
    
                    test_pred, _, _ = net(t_x.view(-1, 3, 224, 224))   # auxiliary outputs are not needed at test time
                    prob = torch.nn.functional.softmax(test_pred, dim=1)
                    pred_cls = torch.argmax(prob, dim=1)
                    acc += (pred_cls == t_y).float().sum()
                print(f"epoch/batch: {n:02d}/{step:02d}: \taccuracy: {acc/all_num *100:6.4f}, loss: {last_loss:6.4f}")
            net.train()   # back to training mode
    
    # Save the model
    torch.save(net.state_dict(), "./googlenet.models")   # saves only the weights (state_dict)
    
    
    1. Loading the dataset
    2. Building the network
    3. Training
    epoch/batch: 00/00:    accuracy: 34.5566, loss: 2.2265
    epoch/batch: 00/01:    accuracy: 33.2314, loss: 2.1856
    epoch/batch: 00/02:    accuracy: 44.8522, loss: 3.0660
    epoch/batch: 00/03:    accuracy: 40.4689, loss: 2.3589
    ......
    epoch/batch: 09/75:    accuracy: 80.8359, loss: 0.8001
    epoch/batch: 09/76:    accuracy: 79.6126, loss: 0.3436
    epoch/batch: 09/77:    accuracy: 81.0397, loss: 0.7516
    epoch/batch: 09/78:    accuracy: 81.2436, loss: 0.3100
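    The save/load round trip used above can be sketched with any Module. This stand-in example uses a plain Linear layer (not the author's network); the same state_dict pattern applies to YQGoogLeNet and ./googlenet.models:

```python
import torch
from torch.nn import Linear

# Save a model's weights and restore them into a fresh instance.
# The same pattern applies to YQGoogLeNet and "./googlenet.models" above.
model = Linear(4, 2)
torch.save(model.state_dict(), "/tmp/demo.models")

restored = Linear(4, 2)
restored.load_state_dict(torch.load("/tmp/demo.models"))
restored.eval()   # switch Dropout/BatchNorm layers to inference behavior

x = torch.randn(1, 4)
assert torch.equal(model(x), restored(x))   # identical outputs after restoring
```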
    
    Test analysis of GoogLeNet and several other networks

    Inception V2

    • Input image size:
      • 3 \times 299 \times 299

    Convolution factorization

    • Convolution-factorization design

    • Inception V2 replaces large-kernel convolutions with stacks of small-kernel ones, reducing computation:

      • a 5 \times 5 convolution is replaced by two 3 \times 3 convolutions.
      • In its early form, V2 differed from V1 only by adding BatchNorm2d; officially that was described as an enhanced V1, and the name V2 became official only after convolution factorization was published.
    • Convolution-factorization diagram

    Convolution-factorization diagram
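    The saving from this factorization can be checked with a quick weight count (a rough sketch that ignores biases; C = 128 is an arbitrary example channel count):

```python
# Weight count of one 5x5 convolution vs. two stacked 3x3 convolutions,
# both mapping C input channels to C output channels (biases ignored).
# Two stacked 3x3 layers cover the same 5x5 receptive field.
C = 128
params_5x5 = 5 * 5 * C * C        # 409600
params_3x3 = 2 * 3 * 3 * C * C    # 294912
print(params_3x3 / params_5x5)    # 0.72 -- a 28% reduction
```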

    Depth factorization

    • Depth-factorization design

    • Depth factorization addresses the problems of the two traditional serial orderings of convolution and pooling (with the output channel count held fixed):

      1. Convolve first, then pool:
        • the convolution runs at full resolution, increasing its computational cost.
      2. Pool first, then convolve:
        • pooling first discards information, causing a loss of features.
    The two serial orderings of convolution and pooling
    • The depth-factorized design is as follows:
      • it preserves the features while reducing computation.
    Depth-factorized design of the convolution-pooling stage
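    The factorized design can be sketched as a stride-2 convolution and a stride-2 pooling run in parallel and concatenated, so the spatial size is halved without pooling away features first (the channel counts below are illustrative, not the paper's exact figures):

```python
import torch
from torch.nn import Conv2d, MaxPool2d

# Parallel reduction: the convolution branch and the pooling branch both use
# stride 2, and their outputs are concatenated along the channel dimension.
x = torch.randn(1, 320, 35, 35)
conv_branch = Conv2d(320, 320, kernel_size=3, stride=2, padding=1)
pool_branch = MaxPool2d(kernel_size=3, stride=2, padding=1)

y = torch.cat([conv_branch(x), pool_branch(x)], dim=1)
print(y.shape)   # torch.Size([1, 640, 18, 18]) -- half the spatial size
```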

    The Inception V2 GoogLeNet network structure

    • The table below can also be read as the V3 structure with BatchNorm2d removed; in fact it essentially is V3 (it includes the asymmetric convolution factorization).
    Inception V2 GoogLeNet network structure
    • Notes:
      • Figure 5 in the table refers to the plain, unmodified Inception block;
      • Figure 6 refers to the small-kernel Inception block (3x3 convolutions replacing the 5x5 convolution);
      • Figure 7 refers to the asymmetric Inception block (1xn and nx1 convolutions replacing the nxn convolution).

    Appendix

    1. Figure-5
    Figure-5
    2. Figure-6
    Figure-6
    3. Figure-7
    Figure-7

    Inception V3

    • Asymmetric convolution-factorization design

    • Introduces asymmetric convolution factorization: an n \times n convolution is decomposed into an n \times 1 followed by a 1 \times n convolution (n = 3, 5); see the network structure given under V2 for details.

    • The input image size is raised to 3 \times 299 \times 299.

    Asymmetric convolution factorization
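    A quick sketch of the asymmetric decomposition for n = 3: the factorized pair preserves the spatial size while using 2·3·C² weights instead of 9·C² (C = 64 is an arbitrary example channel count):

```python
import torch
from torch.nn import Conv2d, Sequential

C = 64
# A 3x3 convolution vs. its asymmetric factorization into 3x1 then 1x3.
sym  = Conv2d(C, C, kernel_size=3, padding=1)
asym = Sequential(
    Conv2d(C, C, kernel_size=(3, 1), padding=(1, 0)),
    Conv2d(C, C, kernel_size=(1, 3), padding=(0, 1)),
)

x = torch.randn(1, C, 17, 17)
print(sym(x).shape, asym(x).shape)   # both: torch.Size([1, 64, 17, 17])

# Weight count (biases excluded): 9*C*C vs. 2*3*C*C
n_sym  = sum(p.numel() for p in sym.parameters()  if p.dim() > 1)
n_asym = sum(p.numel() for p in asym.parameters() if p.dim() > 1)
print(n_sym, n_asym)   # 36864 24576
```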

    Inception V4

    Inception residual design

    • Using Inception blocks as residual branches to improve Inception-v3 yields the following variants:
      1. Inception-ResNet-v1,
      2. Inception-ResNet-v2,
      3. Inception-v4
    • Inception residual design
    Inception residual block
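    The residual idea can be sketched with a minimal block: the input is added back onto the branch output, so the branch only has to learn a residual. Here a single 1x1 convolution stands in for a full Inception branch, and the 0.3 scaling factor is one of the stabilizing values suggested in the Inception-v4 paper:

```python
import torch
from torch.nn import Module, Conv2d, ReLU

class ResidualInception(Module):
    # Shortcut wrapper: y = relu(x + scale * branch(x)).
    # A 1x1 convolution stands in here for a full Inception branch.
    def __init__(self, channels, scale=0.3):
        super().__init__()
        self.branch = Conv2d(channels, channels, kernel_size=1)
        self.relu = ReLU()
        self.scale = scale   # scaling the residual down stabilizes training

    def forward(self, x):
        return self.relu(x + self.scale * self.branch(x))

x = torch.randn(1, 256, 8, 8)
block = ResidualInception(256)
print(block(x).shape)   # torch.Size([1, 256, 8, 8]) -- shape is preserved
```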

    The Inception V4 GoogLeNet network structure

    Inception V4 GoogLeNet network structure

    Appendix

    1. Google published a series of papers on GoogLeNet that describe the ideas and techniques behind Inception v1, v2, v3 and v4 in detail:

      • "Going deeper with convolutions"
      • "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift"
      • "Rethinking the Inception Architecture for Computer Vision"
      • "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning"
    2. The official source code also takes care of many details, for example:

      • better-organized code structure;
      • weight initialization;
      • transform conversions for inputs of different structures.
    3. With a little time, the V2, V3 and V4 code can be written out as well.

      • V4 can draw on our ResNet implementation.

          Title: TORCH03-09GoogLeNet网络

          Link: https://www.haomeiwen.com/subject/xskhnhtx.html