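Reading papers, I often get the impression that the authors made a small tweak to an existing network structure or loss function, saw the accuracy go up, and wrote a paper about it. The core idea may fit in a few sentences, but for completeness the paper also needs an overview, related work, comparative experiments, and so on. This article aims to state the innovations of each generation of MobileNet as concisely as possible.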
MobileNet V1
Innovation: depthwise separable convolutions are used to reduce the number of parameters and speed up inference.

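A depthwise separable convolution has two parts: a depthwise convolution and a pointwise convolution. The depthwise convolution looks at each channel on its own and ignores information across channels, so a pointwise convolution follows to combine the channels. The pointwise convolution is simply a convolution with a 1x1 kernel. In my view, the dimensions in the red box of the figure above are wrong: they should be 5x5x1x3, meaning three kernels of size 5x5x1.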
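To make the kernel shapes concrete, here is a minimal PyTorch sketch of a depthwise convolution on a 3-channel input with 5x5 kernels (`groups=3`), followed by a 1x1 pointwise convolution. The channel counts (3 in, 8 out) and the 32x32 input are arbitrary assumptions for illustration. PyTorch stores a convolution weight as (out_channels, in_channels / groups, kH, kW), so the depthwise weight is three 5x5x1 kernels, which matches the 5x5x1x3 reading above.

```python
import torch
import torch.nn as nn

in_channels, out_channels = 3, 8  # arbitrary example sizes

# Depthwise: one 5x5 kernel per input channel (groups=in_channels), no channel mixing
depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=5, padding=2,
                      groups=in_channels, bias=False)
# Pointwise: 1x1 convolution that mixes information across channels
pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)

x = torch.randn(1, in_channels, 32, 32)
y = pointwise(depthwise(x))

print(depthwise.weight.shape)  # torch.Size([3, 1, 5, 5]): three 5x5x1 kernels
print(pointwise.weight.shape)  # torch.Size([8, 3, 1, 1])
print(y.shape)                 # torch.Size([1, 8, 32, 32])
```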
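Comparing parameter count and computation
Parameters and computation of a standard convolution

Parameters and computation of a depthwise separable convolution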

With the usual 3x3 kernels, both the parameter count and the computation drop to roughly 1/8 to 1/9 of a standard convolution.
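The 1/8 to 1/9 figure follows from the cost model in the MobileNet V1 paper. For a $D_K \times D_K$ kernel, $M$ input channels, $N$ output channels and a $D_F \times D_F$ output map, a standard convolution costs $D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$ multiply-adds and a depthwise separable one costs $D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F$, so

$$\frac{D_K^2 M D_F^2 + M N D_F^2}{D_K^2 M N D_F^2} = \frac{1}{N} + \frac{1}{D_K^2}.$$

Parameter counts are the same expressions without the $D_F \cdot D_F$ factor, so their ratio is identical. The sketch below evaluates this for one layer; the sizes (M = N = 512, D_F = 14) are just an example chosen for illustration.

```python
def standard_cost(dk, m, n, df):
    """Multiply-adds of a standard dk x dk convolution producing a df x df x n output."""
    return dk * dk * m * n * df * df


def separable_cost(dk, m, n, df):
    """Multiply-adds of a dk x dk depthwise convolution plus a 1x1 pointwise convolution."""
    depthwise = dk * dk * m * df * df
    pointwise = m * n * df * df
    return depthwise + pointwise


def param_counts(dk, m, n):
    """(standard, separable) weight counts, ignoring biases and BN."""
    return dk * dk * m * n, dk * dk * m + m * n


dk, m, n, df = 3, 512, 512, 14  # example layer: 3x3 kernel, 512 -> 512 channels, 14x14 map
ratio = separable_cost(dk, m, n, df) / standard_cost(dk, m, n, df)
std_params, sep_params = param_counts(dk, m, n)
print(f"compute ratio: {ratio:.4f}")           # compute ratio: 0.1131
print(f"params: {std_params} vs {sep_params}")  # params: 2359296 vs 266752
```

With N this large, the $1/N$ term is negligible and the ratio is dominated by $1/D_K^2 = 1/9$.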

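Despite the large drop in parameters, the final accuracy falls only slightly: the V1 paper reports 70.6% ImageNet top-1 accuracy for MobileNet versus 71.7% for the same architecture built with standard convolutions.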

PyTorch implementation of the MobileNet V1 architecture
```python
import torch.nn as nn


class MobileNet_V1(nn.Module):
    def __init__(self):
        super(MobileNet_V1, self).__init__()
        # Network definition: one standard convolution followed by 13 depthwise separable convolutions
        self.model = nn.Sequential(
            self.conv_bn(3, 32, 2),
            self.conv_dw(32, 64, 1),
            self.conv_dw(64, 128, 2),
            self.conv_dw(128, 128, 1),
            self.conv_dw(128, 256, 2),
            self.conv_dw(256, 256, 1),
            self.conv_dw(256, 512, 2),
            self.conv_dw(512, 512, 1),
            self.conv_dw(512, 512, 1),
            self.conv_dw(512, 512, 1),
            self.conv_dw(512, 512, 1),
            self.conv_dw(512, 512, 1),
            self.conv_dw(512, 1024, 2),
            self.conv_dw(1024, 1024, 1),
            # Assumes a 224x224 input, which leaves a 7x7 feature map here
            nn.AvgPool2d(7)
        )
        self.fc = nn.Linear(1024, 1000)

    def forward(self, input):
        output = self.model(input)
        output = output.view(input.size(0), -1)
        output = self.fc(output)
        return output

    # Standard 3x3 convolution + BN + ReLU (padding=1 so 224 -> 112 with stride 2)
    def conv_bn(self, in_channel, out_channel, stride):
        return nn.Sequential(
            nn.Conv2d(in_channel, out_channel, kernel_size=3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_channel),
            nn.ReLU(inplace=True)
        )

    # Depthwise separable convolution
    def conv_dw(self, in_channel, out_channel, stride):
        return nn.Sequential(
            # Depthwise convolution: one 3x3 kernel per input channel
            nn.Conv2d(in_channel, in_channel, kernel_size=3, stride=stride, padding=1, groups=in_channel,
                      bias=False),
            nn.BatchNorm2d(in_channel),
            nn.ReLU6(inplace=True),
            # Pointwise convolution: 1x1 kernel that mixes channels
            nn.Conv2d(in_channel, out_channel, kernel_size=1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(out_channel),
            nn.ReLU6(inplace=True)
        )
```
MobileNet V2
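Innovations: 1. Linear Bottlenecks. 2. Inverted Residuals. Both are spelled out directly in the paper's title, "MobileNetV2: Inverted Residuals and Linear Bottlenecks" :)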
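1. Linear Bottlenecks:
That is, the nonlinear activation that V1 applies after the pointwise convolution is removed from the last pointwise convolution of each block (the one that projects the channels back down), so that layer is linear.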

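Reason: depthwise separable convolutions do cut the computation drastically, and an NxN depthwise + 1x1 pointwise structure comes close to the performance of a standard NxN convolution. In practice, however, the depthwise kernels are easily "ruined" during training: after training, many of the learned depthwise kernels turn out to be empty (as in the figure below).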
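Because there are too few input channels, values easily fall below 0. Applying a nonlinear activation such as ReLU then creates dead nodes whose outputs are stuck at 0, and those units stop learning: ReLU's gradient is 0 wherever its output is 0, so once a unit falls into zero output it cannot recover. The problem is amplified further in fixed-point, low-precision training. The activation after the projecting pointwise convolution is therefore removed, reducing the damage ReLU does to the features.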
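The "cannot recover" argument comes down to ReLU's gradient. This tiny PyTorch check, with input values chosen arbitrarily for illustration, shows that inputs ReLU maps to 0 receive exactly zero gradient, whereas a linear output (no activation) passes the gradient through to every element.

```python
import torch

x = torch.tensor([-1.5, -0.2, 0.7, 2.0], requires_grad=True)

# ReLU: negative inputs are clipped to 0 and their gradient is exactly 0
torch.relu(x).sum().backward()
relu_grad = x.grad.clone()

# Linear (no activation): every element receives gradient 1
x.grad = None
x.sum().backward()
linear_grad = x.grad.clone()

print(relu_grad)    # tensor([0., 0., 1., 1.])
print(linear_grad)  # tensor([1., 1., 1., 1.])
```

The two negative entries get no update signal through the ReLU path, which is the "dead node" situation described above.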

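2. Inverted Residuals: the design borrows from residual networks. ResNet's bottleneck first reduces the channel dimension and then restores it, whereas MobileNet V2 does the reverse, expanding first and then reducing, hence the name Inverted Residuals.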
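Why expand first and then reduce? A depthwise convolution cannot change the number of channels: it outputs exactly as many channels as it receives. If only a few channels come in, the depthwise convolution is stuck working in a low-dimensional space, which does not work well, so we "expand" the channels. Since a pointwise (1×1) convolution can raise or lower the dimensionality, we apply one before the depthwise convolution to expand the channels (by an expansion factor t, with t=6), and then run the depthwise convolution in this higher-dimensional space to extract features.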

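Inverted residuals bring two benefits: 1. features are reused through the shortcut; 2. inside the branch, a 1x1 convolution first expands the channels before the depthwise convolution and ReLU, and increasing the dimensionality of ReLU's input alleviates feature degradation.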
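A compact sketch of one inverted residual block, written for this summary rather than taken from the paper's released code: a 1x1 expansion by t = 6, a 3x3 depthwise convolution in the expanded space, then a linear 1x1 projection (BN, no ReLU). The identity shortcut is only added when the stride is 1 and the input and output channel counts match, which is how the V2 paper uses it. The class name `InvertedResidual` and the 24-channel, 56x56 example are illustrative assumptions.

```python
import torch
import torch.nn as nn


class InvertedResidual(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1, t=6):
        super().__init__()
        hidden = in_ch * t
        self.block = nn.Sequential(
            # 1) expand: low-dimensional -> high-dimensional with a 1x1 conv
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 2) depthwise 3x3 conv in the expanded space
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3) project back down with a 1x1 conv, no activation (linear bottleneck)
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.use_shortcut = stride == 1 and in_ch == out_ch

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_shortcut else out


x = torch.randn(2, 24, 56, 56)
same = InvertedResidual(24, 24, stride=1)  # shapes match: shortcut is used
down = InvertedResidual(24, 32, stride=2)  # shape changes: no shortcut
print(same(x).shape)  # torch.Size([2, 24, 56, 56])
print(down(x).shape)  # torch.Size([2, 32, 28, 28])
```

Note the "wide in the middle, narrow at the ends" shape: 24 -> 144 -> 144 -> 24 channels, the opposite of ResNet's narrow middle.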
Combining the two innovations: the figure below compares the resulting V2 block with the V1 block.

PyTorch implementation of the MobileNet V2 architecture
```python
import torch.nn as nn


class ResidualBlock(nn.Module):
    def __init__(self, in_channel, out_channel, stride, expand):
        super(ResidualBlock, self).__init__()
        hidden_channel = in_channel * expand
        # Pointwise convolution that expands the channels
        self.conv_pw1 = nn.Sequential(
            nn.Conv2d(in_channel, hidden_channel, kernel_size=1, stride=1, bias=False),
            nn.BatchNorm2d(hidden_channel),
            nn.ReLU6(inplace=True)
        )
        # Depthwise convolution
        self.conv_dw = nn.Sequential(
            nn.Conv2d(hidden_channel, hidden_channel, kernel_size=3, stride=stride, padding=1,
                      groups=hidden_channel, bias=False),
            nn.BatchNorm2d(hidden_channel),
            nn.ReLU6(inplace=True)
        )
        # Pointwise convolution that reduces the channels; no activation (linear bottleneck)
        self.conv_pw2 = nn.Sequential(
            nn.Conv2d(hidden_channel, out_channel, kernel_size=1, stride=1, bias=False),
            nn.BatchNorm2d(out_channel)
        )
        # Identity shortcut only when input and output shapes match (stride 1, same channel count)
        self.use_shortcut = stride == 1 and in_channel == out_channel

    def forward(self, input):
        output = self.conv_pw1(input)
        output = self.conv_dw(output)
        output = self.conv_pw2(output)
        if self.use_shortcut:
            output = output + input
        return output


class MobileNet_v2(nn.Module):
    def __init__(self, num_classes=10):
        super(MobileNet_v2, self).__init__()
        self.module_list = nn.ModuleList()
        self.module_list.add_module('stem', self.conv_bn(3, 32, 2))
        self.in_channels = 32
        self.layers = [1, 2, 3, 4, 3, 3, 1]  # number of blocks in each stage (n)
        self.strides = [1, 2, 2, 2, 1, 2, 1]  # stride of the first block in each stage (s)
        self.expand = [1, 6, 6, 6, 6, 6, 6]  # expansion factor of the input channels (t)
        self.out_channel = [16, 24, 32, 64, 96, 160, 320]  # output channels (c)
        for index in range(len(self.layers)):
            self.module_list.add_module('bottleneck{}'.format(index),
                                        self.make_layer(ResidualBlock, self.out_channel[index], self.layers[index],
                                                        self.strides[index], self.expand[index]))
        self.module_list.add_module('conv1', nn.Sequential(
            nn.Conv2d(self.in_channels, 1280, kernel_size=1, stride=1, bias=False),
            nn.BatchNorm2d(1280),
            nn.ReLU(inplace=True)
        ))
        # Assumes a 224x224 input, which leaves a 7x7 feature map here
        self.module_list.add_module('avgpool', nn.AvgPool2d(kernel_size=7))
        self.module_list.add_module('linear', nn.Linear(1280, num_classes))

    def forward(self, input):
        output = input
        for index, module in enumerate(self.module_list):
            # Flatten right before the last (fully connected) layer
            if index == len(self.module_list) - 1:
                output = output.view(input.size(0), -1)
            output = module(output)
        return output

    # Standard 3x3 convolution + BN + ReLU
    def conv_bn(self, in_channel, out_channel, stride):
        return nn.Sequential(
            nn.Conv2d(in_channel, out_channel, kernel_size=3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_channel),
            nn.ReLU(inplace=True)
        )

    def make_layer(self, block, out_channel, blocks, stride, expand):
        # Only the first block of a stage uses the given stride and changes the channel count
        layers = [block(self.in_channels, out_channel, stride, expand)]
        for _ in range(1, blocks):
            layers.append(block(out_channel, out_channel, stride=1, expand=expand))
        self.in_channels = out_channel
        return nn.Sequential(*layers)
```
MobileNet V3
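Innovations: 1. A lightweight attention module based on the squeeze-and-excitation (SE) structure, which assigns a weight to each channel of the depthwise convolution output (for details see https://www.jianshu.com/p/40ee2e9c9530). 2. A new activation function, h-swish. 3. A redesigned tail of the network that speeds up computation.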
1. Introducing squeeze-and-excitation attention:
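A minimal SE sketch, assuming a 4x channel reduction and the hard-sigmoid gate ReLU6(x + 3) / 6 that MobileNet V3 uses in place of a sigmoid: global average pooling squeezes each channel to one number, two fully connected layers turn those numbers into per-channel weights in [0, 1], and the feature map is rescaled channel by channel. The class name `SqueezeExcite`, the helper `channel_weights`, and the 72-channel example are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SqueezeExcite(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def channel_weights(self, x):
        b, c = x.shape[:2]
        s = F.adaptive_avg_pool2d(x, 1).view(b, c)  # squeeze: one value per channel
        s = F.relu(self.fc1(s))                     # excitation: bottleneck FC + ReLU
        return F.relu6(self.fc2(s) + 3) / 6         # hard-sigmoid gate in [0, 1]

    def forward(self, x):
        b, c = x.shape[:2]
        return x * self.channel_weights(x).view(b, c, 1, 1)  # reweight each channel


se = SqueezeExcite(72)
x = torch.randn(2, 72, 28, 28)
y = se(x)
print(y.shape)  # torch.Size([2, 72, 28, 28])
```

In V3 this module sits after the depthwise convolution inside the bottleneck, so it operates on the expanded channels.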

2. The h-swish activation function:
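The authors of the swish paper argue that swish, being unbounded above, bounded below, smooth, and non-monotonic, outperforms ReLU in deep models. Sigmoid, however, is expensive to compute on mobile devices, so the MobileNet V3 authors replaced swish with a close approximation: h-swish. As the figure below shows, h-swish differs very little from swish, and since it contains no sigmoid it is much friendlier to mobile hardware.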


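In addition, the authors note that the cost of applying a nonlinearity decreases as the network gets deeper, because the activation memory typically halves each time the resolution drops. They also found that most of swish's benefit comes from using it in the deeper layers. Therefore, the V3 architecture uses h-swish (HS) only in the second half of the model.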
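For reference, the MobileNet V3 paper defines h-swish as x · ReLU6(x + 3) / 6, replacing the sigmoid in swish(x) = x · sigmoid(x) with the piecewise-linear ReLU6(x + 3) / 6. The short check below compares the two on [-6, 6]; the grid and range are arbitrary choices. The largest gap is about 0.14, at x = ±3, where the hard version switches between its linear pieces.

```python
import torch
import torch.nn.functional as F


def h_swish(x):
    # Piecewise-linear approximation of swish: no exp/sigmoid needed
    return x * F.relu6(x + 3) / 6


def swish(x):
    return x * torch.sigmoid(x)


x = torch.linspace(-6, 6, steps=1201)
diff = (h_swish(x) - swish(x)).abs()
print(f"max |h_swish - swish| = {diff.max():.3f} at x = {x[diff.argmax()]:.2f}")
```

Recent PyTorch versions also ship this function as `torch.nn.functional.hardswish` (and the module `nn.Hardswish`).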

3. Redesigning the tail of the network:
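Previously, the last stage used a 1×1 convolution to expand into a higher-dimensional feature space. This gives the prediction richer features, but it also adds extra computation and latency. To keep the high-dimensional features while cutting latency, V3 moves this 1×1 convolution after global average pooling, so it runs on a 1×1 feature map instead of a 7×7 one; with that change, the projection and filtering layers of the preceding bottleneck are no longer needed and are removed as well.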
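A back-of-the-envelope check of the saving from running the final 1×1 convolution after pooling, assuming the Large model's sizes (a 7×7×960 feature map feeding a 1×1 convolution with 1280 outputs) and counting only that one convolution's multiply-adds, biases ignored:

```python
def conv1x1_madds(h, w, c_in, c_out):
    """Multiply-adds of a 1x1 convolution applied to an h x w feature map."""
    return h * w * c_in * c_out


before = conv1x1_madds(7, 7, 960, 1280)  # 1x1 conv before pooling, on the 7x7 map
after = conv1x1_madds(1, 1, 960, 1280)   # pool first, then the 1x1 conv on a 1x1 map
print(before, after, before // after)    # 60211200 1228800 49
```

This covers a single layer only; the overall saving reported in the paper also includes the removed bottleneck layers, so these numbers are an illustration rather than the paper's figures.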

PyTorch implementation of the MobileNet V3 architecture
```python
import torch.nn as nn


class HardSwish(nn.Module):
    # h-swish(x) = x * ReLU6(x + 3) / 6
    def __init__(self, inplace=True):
        super(HardSwish, self).__init__()
        self.relu6 = nn.ReLU6(inplace)

    def forward(self, x):
        return x * self.relu6(x + 3) / 6


class HardSigmoid(nn.Module):
    # h-sigmoid(x) = ReLU6(x + 3) / 6, the gate used by the SE block in V3
    def __init__(self, inplace=True):
        super(HardSigmoid, self).__init__()
        self.relu6 = nn.ReLU6(inplace)

    def forward(self, x):
        return self.relu6(x + 3) / 6


# Depthwise convolution + BN + activation
def DwBNActivation(in_channels, out_channels, kernel_size, stride, activate):
    return nn.Sequential(
        nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride,
                  padding=(kernel_size - 1) // 2, groups=in_channels, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU6(inplace=True) if activate == 'relu' else HardSwish()
    )


# Pointwise convolution + BN + activation
def PwBNActivation(in_channels, out_channels, activate):
    return nn.Sequential(
        nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=1, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU6(inplace=True) if activate == 'relu' else HardSwish()
    )


# Pointwise convolution + BN without activation (linear projection)
def Conv1x1BN(in_channels, out_channels):
    return nn.Sequential(
        nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=1, bias=False),
        nn.BatchNorm2d(out_channels)
    )


# SE block
class SqueezeAndExcite(nn.Module):
    def __init__(self, in_channels, out_channels, divide=4):
        super(SqueezeAndExcite, self).__init__()
        mid_channels = in_channels // divide
        # Global average pooling to 1x1 works for any input resolution
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.SEblock = nn.Sequential(
            nn.Linear(in_features=in_channels, out_features=mid_channels),
            nn.ReLU6(inplace=True),
            nn.Linear(in_features=mid_channels, out_features=out_channels),
            HardSigmoid(inplace=True),  # per-channel weights in [0, 1]
        )

    def forward(self, x):
        b, c, h, w = x.size()
        out = self.pool(x).view(b, -1)
        out = self.SEblock(out).view(b, c, 1, 1)
        return out * x


# 1. pointwise expansion  2. depthwise conv  3. SE block  4. linear pointwise projection
# 5. identity shortcut (when stride is 1 and input/output channels match)
class SEInvertedBottleneck(nn.Module):
    def __init__(self, in_channels, mid_channels, out_channels, kernel_size, stride, activate, use_se):
        super(SEInvertedBottleneck, self).__init__()
        self.use_se = use_se
        # mid_channels = in_channels * expansion_factor
        self.conv = PwBNActivation(in_channels, mid_channels, activate)
        self.depth_conv = DwBNActivation(mid_channels, mid_channels, kernel_size, stride, activate)
        if self.use_se:
            self.SEblock = SqueezeAndExcite(mid_channels, mid_channels)
        # Linear bottleneck: no activation after the projection
        self.point_conv = Conv1x1BN(mid_channels, out_channels)
        self.use_shortcut = stride == 1 and in_channels == out_channels

    def forward(self, x):
        out = self.depth_conv(self.conv(x))
        if self.use_se:
            out = self.SEblock(out)
        out = self.point_conv(out)
        return out + x if self.use_shortcut else out


class MobileNetV3(nn.Module):
    def __init__(self, num_classes=1000, type='large'):
        super(MobileNetV3, self).__init__()
        self.type = type
        self.first_conv = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(16),
            HardSwish(inplace=True),
        )
        if type == 'large':
            self.large_bottleneck = nn.Sequential(
                SEInvertedBottleneck(16, 16, 16, kernel_size=3, stride=1, activate='relu', use_se=False),
                SEInvertedBottleneck(16, 64, 24, kernel_size=3, stride=2, activate='relu', use_se=False),
                SEInvertedBottleneck(24, 72, 24, kernel_size=3, stride=1, activate='relu', use_se=False),
                SEInvertedBottleneck(24, 72, 40, kernel_size=5, stride=2, activate='relu', use_se=True),
                SEInvertedBottleneck(40, 120, 40, kernel_size=5, stride=1, activate='relu', use_se=True),
                SEInvertedBottleneck(40, 120, 40, kernel_size=5, stride=1, activate='relu', use_se=True),
                SEInvertedBottleneck(40, 240, 80, kernel_size=3, stride=2, activate='hswish', use_se=False),
                SEInvertedBottleneck(80, 200, 80, kernel_size=3, stride=1, activate='hswish', use_se=False),
                SEInvertedBottleneck(80, 184, 80, kernel_size=3, stride=1, activate='hswish', use_se=False),
                SEInvertedBottleneck(80, 184, 80, kernel_size=3, stride=1, activate='hswish', use_se=False),
                SEInvertedBottleneck(80, 480, 112, kernel_size=3, stride=1, activate='hswish', use_se=True),
                SEInvertedBottleneck(112, 672, 112, kernel_size=3, stride=1, activate='hswish', use_se=True),
                SEInvertedBottleneck(112, 672, 160, kernel_size=5, stride=2, activate='hswish', use_se=True),
                SEInvertedBottleneck(160, 960, 160, kernel_size=5, stride=1, activate='hswish', use_se=True),
                SEInvertedBottleneck(160, 960, 160, kernel_size=5, stride=1, activate='hswish', use_se=True),
            )
            self.large_last_stage = nn.Sequential(
                nn.Conv2d(in_channels=160, out_channels=960, kernel_size=1, stride=1, bias=False),
                nn.BatchNorm2d(960),
                HardSwish(inplace=True),
                nn.AdaptiveAvgPool2d(1),  # pool first, so the next 1x1 conv runs on a 1x1 map
                nn.Conv2d(in_channels=960, out_channels=1280, kernel_size=1, stride=1),
                HardSwish(inplace=True),
            )
            last_channels = 1280
        else:
            self.small_bottleneck = nn.Sequential(
                SEInvertedBottleneck(16, 16, 16, kernel_size=3, stride=2, activate='relu', use_se=True),
                SEInvertedBottleneck(16, 72, 24, kernel_size=3, stride=2, activate='relu', use_se=False),
                SEInvertedBottleneck(24, 88, 24, kernel_size=3, stride=1, activate='relu', use_se=False),
                SEInvertedBottleneck(24, 96, 40, kernel_size=5, stride=2, activate='hswish', use_se=True),
                SEInvertedBottleneck(40, 240, 40, kernel_size=5, stride=1, activate='hswish', use_se=True),
                SEInvertedBottleneck(40, 240, 40, kernel_size=5, stride=1, activate='hswish', use_se=True),
                SEInvertedBottleneck(40, 120, 48, kernel_size=5, stride=1, activate='hswish', use_se=True),
                SEInvertedBottleneck(48, 144, 48, kernel_size=5, stride=1, activate='hswish', use_se=True),
                SEInvertedBottleneck(48, 288, 96, kernel_size=5, stride=2, activate='hswish', use_se=True),
                SEInvertedBottleneck(96, 576, 96, kernel_size=5, stride=1, activate='hswish', use_se=True),
                SEInvertedBottleneck(96, 576, 96, kernel_size=5, stride=1, activate='hswish', use_se=True),
            )
            self.small_last_stage = nn.Sequential(
                nn.Conv2d(in_channels=96, out_channels=576, kernel_size=1, stride=1, bias=False),
                nn.BatchNorm2d(576),
                HardSwish(inplace=True),
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(in_channels=576, out_channels=1024, kernel_size=1, stride=1),
                HardSwish(inplace=True),
            )
            last_channels = 1024  # the Small model uses 1024 channels here
        self.classifier = nn.Conv2d(in_channels=last_channels, out_channels=num_classes, kernel_size=1, stride=1)
        self.init_params()

    def init_params(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.first_conv(x)
        if self.type == 'large':
            x = self.large_bottleneck(x)
            x = self.large_last_stage(x)
        else:
            x = self.small_bottleneck(x)
            x = self.small_last_stage(x)
        out = self.classifier(x)
        out = out.view(out.size(0), -1)
        return out
```
Reference blog posts
https://zhuanlan.zhihu.com/p/70703846
https://blog.csdn.net/shanglianlm/article/details/90050428#comments
https://www.zhihu.com/question/265709710/answer/298245276