R-FCN

Author: 小松qxs | Published 2019-01-01 18:08

title	R-FCN: Object Detection via Region-based Fully Convolutional Networks
url	https://arxiv.org/pdf/1605.06409v2.pdf
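Motivation	Faster R-CNN, following AlexNet- and VGG-style classifiers, uses the RoI pooling layer to split the network into two subnetworks: (a) a fully convolutional subnetwork whose computation is shared over the whole image, and (b) an RoI-wise subnetwork whose computation is not shared. GoogLeNet and ResNet are fully convolutional by design, which suggests doing object detection with a fully convolutional network and no RoI-wise subnetwork, but the naive version of this has low accuracy. The RoI-wise subnetwork raises accuracy at the cost of speed, because its computation is repeated for every RoI. The fully convolutional design falls short because of a conflict: classification favors translation invariance, whereas detection needs translation variance. Inserting RoI pooling between convolutional layers breaks the translation invariance of the deep layers, but the region-wise layers it introduces slow down both training and testing. The goal is to move the expensive convolutions ahead of RoI pooling so that their computation is shared.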
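Content	Resolves the conflict between translation invariance and translation variance in a fully convolutional network. The R-FCN structure: (1) a shared fully convolutional network; (2) position-sensitive score maps, which introduce translation variance; (3) a position-sensitive RoI pooling layer, which aggregates the information in the score maps. Results: with a 101-layer Residual Net backbone, 83.6% mAP on VOC 2007 and 82.0% on VOC 2012, at 170 ms per image at test time, 2.5 to 20 times faster than the Faster R-CNN counterpart.
	Our approach: region-proposal-based methods are the more accurate family, so R-FCN uses an RPN that shares its convolutional layers with the detector. The last convolutional layer produces a bank of k² position-sensitive score maps for each category, so it has k²(C+1) channels (C object categories plus background). The network ends with a position-sensitive RoI pooling layer, which produces the scores for each RoI.
	Backbone architecture: ResNet-101 with the final fully connected layer removed, keeping the first 100 layers. The 100th layer outputs 2048 channels, so a 1×1 convolutional layer with 1024 filters is attached to reduce the dimension. A further layer of k²(C+1) filters of size 1×1×1024 then generates the k²(C+1)-channel position-sensitive score maps.
	Position-sensitive score maps & position-sensitive RoI pooling: to encode position information in each RoI, the RoI is divided into a k × k grid. With k = 3 the 9 positions are top-left, top-center, top-right, center-left, center, center-right, bottom-left, bottom-center, and bottom-right. Position-sensitive RoI pooling uses average pooling: r_c(i, j | Θ) = (1/n) Σ_{(x, y) ∈ bin(i, j)} z_{i,j,c}(x + x0, y + y0 | Θ), where r_c(i, j) is the pooled response of bin (i, j) for category c, z_{i,j,c} is one of the k²(C+1) score maps, (x0, y0) is the top-left corner of the RoI, n is the number of pixels in the bin, and Θ denotes the learnable parameters. Classification of each RoI: the k × k bin scores are averaged into one score per category (voting), a softmax is applied across categories, and the loss is cross-entropy. Bounding-box regression: alongside the k²(C+1)-channel layer there is a sibling convolutional layer with 4k² channels; position-sensitive RoI pooling on it gives 4 values for each of the k × k bins, and voting over the bins gives the 4-d vector t = (t_x, t_y, t_w, t_h). The loss is smooth L1.
	The position-sensitive score maps are inspired by FCN-based instance segmentation. There are no learnable layers after the RoI layer, so the region-wise computation is nearly cost-free, which speeds up both training and testing.
	Training: online hard example mining (OHEM) is used. Main idea: filter the negative samples so that positives and negatives are better balanced. Implementation: all N proposals go through the forward pass, and only the B proposals with the highest loss are backpropagated. Because everything after the RoI layer is nearly cost-free in R-FCN, the overhead is small, whereas with an expensive RoI-wise subnetwork, as in Faster R-CNN, the same procedure can double the time spent. Hyperparameters: weight decay 0.0005, momentum 0.9, single-scale training with the shorter side of the image resized to 600 pixels. Each GPU holds 1 image and selects B = 128 RoIs for backprop. The learning rate is 0.001 for 20k mini-batches, then 0.0001 for 10k mini-batches on VOC. RPN and R-FCN share features through the 4-step alternating training.
	Inference: 300 RoIs are evaluated per image, followed by NMS with an IoU threshold of 0.3.
	À trous and stride: (1) R-FCN reduces the effective stride of ResNet-101 from 32 pixels to 16 pixels, which increases the score map resolution. (2) Everything up to and including the conv4 stage (stride = 16) is unchanged; in the first conv5 block the stride of 2 is changed to 1, and the conv5 filters use the "hole algorithm" (à trous convolution) to compensate for the reduced stride. (3) The RPN is computed on top of the conv4 stage, as in Faster R-CNN. (4) For R-FCN with k × k = 7 × 7 and no hard example mining, the à trous trick improves mAP by 2.6 points.
	Visualization: when a candidate box overlaps the ground truth precisely, all k² bins score high; when the box is offset, some of the bins score low.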
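The pooling and voting steps described above can be sketched in plain Python. This is only an illustrative toy under stated assumptions, not the authors' implementation: the function names, the nested-list representation, the integer RoI coordinates, and in particular the channel layout `(row*k + col)*num_classes + c` are choices made for this example.

```python
import math


def ps_roi_pool(score_maps, roi, k, num_classes):
    """Position-sensitive RoI average pooling followed by voting.

    score_maps: nested list indexed [channel][y][x] with k*k*num_classes channels.
                Assumed (hypothetical) layout: channel = (row*k + col)*num_classes + c.
    roi:        (x0, y0, w, h) in integer feature-map coordinates, inside the maps.
    Returns one score per class (num_classes = C + 1, background included).
    """
    x0, y0, w, h = roi
    class_scores = []
    for c in range(num_classes):
        bin_scores = []
        for row in range(k):
            for col in range(k):
                # Bin extent: floor(col*w/k) <= x < ceil((col+1)*w/k), same for y.
                xs = range(x0 + math.floor(col * w / k), x0 + math.ceil((col + 1) * w / k))
                ys = range(y0 + math.floor(row * h / k), y0 + math.ceil((row + 1) * h / k))
                # Each bin reads only from its own dedicated score map.
                fmap = score_maps[(row * k + col) * num_classes + c]
                values = [fmap[y][x] for y in ys for x in xs]
                bin_scores.append(sum(values) / len(values))
        # Voting: average the k*k bin scores into one score for this class.
        class_scores.append(sum(bin_scores) / len(bin_scores))
    return class_scores


def softmax(scores):
    """Numerically stable softmax over the per-class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy data: k = 2, two classes (0 = background, 1 = object), 6x6 score maps.
k, num_classes, size = 2, 2, 6
score_maps = [[[0.0] * size for _ in range(size)] for _ in range(k * k * num_classes)]
# The object-class map of bin (row, col) fires only on the matching quarter
# of an object that occupies the box (x0=1, y0=1, w=4, h=4).
for row in range(k):
    for col in range(k):
        fmap = score_maps[(row * k + col) * num_classes + 1]
        for y in range(1 + 2 * row, 3 + 2 * row):
            for x in range(1 + 2 * col, 3 + 2 * col):
                fmap[y][x] = 1.0

aligned = ps_roi_pool(score_maps, (1, 1, 4, 4), k, num_classes)
shifted = ps_roi_pool(score_maps, (2, 1, 4, 4), k, num_classes)
print(aligned, softmax(aligned))
print(shifted, softmax(shifted))
```

In this toy setup the RoI that matches the object exactly gets a higher object score than the RoI shifted by one cell, because every bin of the shifted RoI only partly covers the region where its own score map fires. This mirrors the behaviour noted under Visualization.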
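The OHEM selection step (forward all N proposals, backpropagate only the B with the highest loss) can be sketched as follows. The function names and the mean normalization over the selected RoIs are assumptions made for this illustration; the notes above do not specify them, and a real training loop would apply the selection inside a deep learning framework.

```python
def select_hard_examples(losses, batch_size):
    """Return the indices of the `batch_size` RoIs with the highest loss."""
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(ranked[:batch_size])


def ohem_loss(losses, batch_size):
    """Mean loss over the selected hard RoIs (assumed normalization).

    RoIs that are not selected contribute nothing, so no gradient would
    flow through them. Assumes `losses` is non-empty.
    """
    hard = select_hard_examples(losses, batch_size)
    return sum(losses[i] for i in hard) / len(hard)


# Per-RoI losses from one forward pass over N = 5 proposals (made-up numbers).
roi_losses = [0.10, 2.00, 0.50, 1.50, 0.05]
hard_indices = select_hard_examples(roi_losses, 2)   # B = 2
batch_loss = ohem_loss(roi_losses, 2)
print(hard_indices, batch_loss)
```

With the settings listed above, B would be 128 per image. The forward pass over all N proposals is what makes the selection cheap for R-FCN, since its per-RoI computation is nearly cost-free.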
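Experiments	Experiments on PASCAL VOC: (1) Comparisons with other fully convolutional strategies: (a) Naïve Faster R-CNN: ResNet-101 + RoI pooling + a 21-class fc layer. (b) Class-specific RPN: the same as RPN, but with 21-class outputs in place of the 2-class outputs. (c) R-FCN without position-sensitivity: R-FCN with k = 1, which is equivalent to global pooling within each RoI.
	Table 2 shows: how much Faster R-CNN depends on inserting RoI pooling between layers to capture spatial information; that a class-specific RPN amounts to a Fast R-CNN with sliding windows as proposals, and performs poorly; that R-FCN successfully encodes spatial position information and localizes objects; and that position-sensitivity matters, since training fails to converge with k = 1.
	(2) Comparisons with Faster R-CNN using ResNet-101: R-FCN is more accurate than Faster R-CNN and faster.
	Experiments on MS COCO: accuracy is comparable, and R-FCN runs at about 2.5 times the speed of Faster R-CNN. R-FCN does better than Faster R-CNN on small objects.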
Thoughts	Training is not a streamlined process, and at 170 ms per image the detector is still short of real-time speed.