Most OCR text-detection networks use ResNet50 as their backbone, so here is a quick review.
![](https://img.haomeiwen.com/i13198972/a9b419d2c21aa6f4.png)
The detection network takes four feature maps, conv_2 (1/4), conv_3 (1/8), conv_4 (1/16), and conv_5 (1/32) of the input resolution, and feeds them into an FPN.
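To make the 1/4 to 1/32 scales concrete, here is a minimal sketch that computes the spatial size of each of the four feature maps for a given input. It assumes the standard ResNet layout, where every stride-2 layer is padded so that its output is ceil(input / 2). The function name `feature_map_sizes` and the 640×640 input are illustrative choices, not something from the original notes.

```python
import math

# Cumulative downsampling factor of each stage output relative to the input.
STAGE_STRIDES = {"conv_2": 4, "conv_3": 8, "conv_4": 16, "conv_5": 32}


def feature_map_sizes(height, width):
    """Return {stage: (h, w)} for the four maps fed to the FPN.

    Assumption: each stride-2 layer gives out = ceil(in / 2), which holds
    for the padded 7x7 and 3x3 layers used in ResNet.
    """
    sizes = {}
    for name, stride in STAGE_STRIDES.items():
        halvings = int(math.log2(stride))  # number of stride-2 steps so far
        h, w = height, width
        for _ in range(halvings):
            h, w = math.ceil(h / 2), math.ceil(w / 2)
        sizes[name] = (h, w)
    return sizes


if __name__ == "__main__":
    for name, size in feature_map_sizes(640, 640).items():
        print(name, size)
```

For a 640×640 image this gives 160, 80, 40, and 20 pixels per side, which is why detection inputs are often resized to a multiple of 32.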
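Network details
Within each stage, only the first block downsamples with stride=2, so no extra max pooling is needed between stages. The exception is conv_2, which keeps stride 1 because the max pooling in the stem has already downsampled the input.
Each bottleneck block: 1×1 conv → BN → ReLU → 3×3 conv → BN → ReLU → 1×1 conv → BN → add the residual x → ReLU
The first 1×1 convolution reduces the channel dimension and the last one restores it, so the 3×3 convolution runs on fewer channels and the block needs fewer parameters.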
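A rough way to see the parameter saving is to count convolution weights. The sketch below compares a bottleneck (256 → 64 → 64 → 256 channels) with two stacked 3×3 convolutions at the full 256 channels. The helper names are hypothetical, and the count ignores BatchNorm parameters and biases, so treat it as an estimate rather than an exact model size.

```python
def conv_params(in_ch, out_ch, kernel):
    """Number of weights in a bias-free kernel x kernel convolution."""
    return in_ch * out_ch * kernel * kernel


def bottleneck_params(channels, reduction=4):
    """1x1 reduce -> 3x3 -> 1x1 expand, as in the bottleneck block."""
    mid = channels // reduction
    return (conv_params(channels, mid, 1)
            + conv_params(mid, mid, 3)
            + conv_params(mid, channels, 1))


def plain_params(channels):
    """Two stacked 3x3 convolutions at full width, for comparison."""
    return 2 * conv_params(channels, channels, 3)


if __name__ == "__main__":
    b, p = bottleneck_params(256), plain_params(256)
    print(f"bottleneck: {b}, plain: {p}, ratio: {p / b:.1f}x")
```

With 256 channels the bottleneck needs 69,632 weights against 1,179,648 for the two full-width 3×3 convolutions, roughly a 17× reduction.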
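```python
# torchvision implementation
import torch.nn as nn


def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
    """3x3 convolution with padding (helper from torchvision's resnet.py)."""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, groups=groups, bias=False, dilation=dilation)


def conv1x1(in_planes, out_planes, stride=1):
    """1x1 convolution (helper from torchvision's resnet.py)."""
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)


class Bottleneck(nn.Module):
    expansion = 4  # output channels = planes * expansion

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None):
        super(Bottleneck, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        # Channel count seen by the 3x3 convolution
        width = int(planes * (base_width / 64.)) * groups
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv1x1(inplanes, width)  # reduce channels
        self.bn1 = norm_layer(width)
        self.conv2 = conv3x3(width, width, stride, groups, dilation)
        self.bn2 = norm_layer(width)
        self.conv3 = conv1x1(width, planes * self.expansion)  # restore channels
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        # Project the shortcut when its shape differs from the main branch
        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)
        return out
```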
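The `width` line in the constructor is easy to skim past, so here is a small pure-Python sketch of the channel bookkeeping, with no PyTorch needed. It reuses the same formula as the class above. The function names are hypothetical, and the ResNeXt and Wide ResNet settings are, to my understanding, the values torchvision uses for those variants, so treat them as assumptions.

```python
EXPANSION = 4  # same value as Bottleneck.expansion


def bottleneck_width(planes, base_width=64, groups=1):
    """Channels seen by the 3x3 convolution (same formula as in __init__)."""
    return int(planes * (base_width / 64.0)) * groups


def channel_trace(inplanes, planes, base_width=64, groups=1):
    """(in, out) channel pairs for conv1, conv2, conv3 of one block."""
    width = bottleneck_width(planes, base_width, groups)
    return [(inplanes, width), (width, width), (width, planes * EXPANSION)]


if __name__ == "__main__":
    # First block of conv_2 in ResNet50: 64 input channels, planes=64
    print("resnet50      ", channel_trace(64, 64))
    # Assumed ResNeXt-50 32x4d setting: groups=32, base_width=4
    print("resnext50     ", channel_trace(64, 64, base_width=4, groups=32))
    # Assumed Wide ResNet-50-2 setting: base_width=128
    print("wide_resnet50 ", channel_trace(64, 64, base_width=128))
```

In all three cases the block outputs `planes * 4 = 256` channels and only the inner width changes. Because the first block of conv_2 takes 64 channels in and produces 256, its shortcut cannot be added directly, which is the case the `downsample` argument handles.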