Introduction
R-CNN's multi-stage training pipeline is slow to train, and detection at test time is also very slow
SPPnet speeds things up considerably with spatial pyramid pooling
The proposed method improves on both R-CNN and SPPnet:
1. Higher detection quality (mAP) than R-CNN, SPPnet
2. Training is single-stage, using a multi-task loss
3. Training can update all network layers
4. No disk storage is required for feature caching
Fast R-CNN architecture and training
The RoI pooling layer uses max pooling to convert the features inside any valid region of interest into a small feature map with a fixed spatial extent of H × W (e.g., 7 × 7), where H and W are layer hyper-parameters that are independent of any particular RoI.
RoI pooling converts the features inside each RoI into a fixed spatial size.
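Concretely, RoI pooling lays an H × W grid over each RoI and applies standard max pooling within every grid cell, independently per channel. A minimal NumPy sketch (the function name and array layout are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(7, 7)):
    """Max-pool one RoI of a (C, H, W) feature map to a fixed grid.

    roi is (x1, y1, x2, y2) in feature-map coordinates and is assumed
    to lie inside the feature map.
    """
    x1, y1, x2, y2 = roi
    out_h, out_w = output_size
    roi_h, roi_w = y2 - y1 + 1, x2 - x1 + 1
    out = np.empty((feature_map.shape[0], out_h, out_w), feature_map.dtype)
    for i in range(out_h):
        # Split the RoI into out_h x out_w roughly equal sub-windows.
        ys = y1 + int(np.floor(i * roi_h / out_h))
        ye = y1 + int(np.ceil((i + 1) * roi_h / out_h))
        for j in range(out_w):
            xs = x1 + int(np.floor(j * roi_w / out_w))
            xe = x1 + int(np.ceil((j + 1) * roi_w / out_w))
            # Standard max pooling inside each sub-window, per channel.
            out[:, i, j] = feature_map[:, ys:ye, xs:xe].max(axis=(1, 2))
    return out

# Any RoI, whatever its size, comes out as a fixed 7x7 grid per channel.
fmap = np.random.rand(512, 38, 50)        # VGG16-like conv feature map
pooled = roi_max_pool(fmap, roi=(10, 5, 30, 20))
assert pooled.shape == (512, 7, 7)
```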
In Fast R-CNN training, stochastic gradient descent (SGD) mini-batches are sampled hierarchically, first by sampling N images and then by sampling R/N RoIs from each image.
Fast R-CNN uses hierarchical sampling: first sample N images, then sample R/N RoIs from each image to form the mini-batch (a sketch of this scheme follows the list below).
Mini-batch sampling
- Each SGD mini-batch is constructed from N = 2 images, chosen uniformly at random
- We use mini-batches of size R = 128, sampling 64 RoIs from each image
- We take 25% of the RoIs from object proposals that have intersection over union (IoU) overlap with a ground-truth bounding box of at least 0.5.
- The remaining RoIs are sampled from object proposals that have a maximum IoU with ground truth in the interval [0.1,0.5), following [11].
- The lower threshold of 0.1 appears to act as a heuristic for hard example mining [8]
- During training, images are horizontally flipped with probability 0.5
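A minimal Python sketch of this hierarchical, class-balanced sampling (the `dataset` structure with per-RoI `max_iou` values is an assumption for illustration, not the paper's code):

```python
import random

def sample_minibatch(dataset, N=2, R=128, fg_fraction=0.25,
                     fg_iou_min=0.5, bg_iou_range=(0.1, 0.5)):
    """Sample N images, then R/N class-balanced RoIs from each image."""
    images = random.sample(dataset, N)        # N images, uniformly at random
    rois_per_image = R // N                   # 64 RoIs per image for R=128, N=2
    n_fg = int(rois_per_image * fg_fraction)  # 25% foreground -> 16 RoIs
    batch = []
    for img in images:
        # Foreground: proposals with IoU >= 0.5 against some ground-truth box.
        fg = [r for r in img['rois'] if r['max_iou'] >= fg_iou_min]
        # Background: proposals whose max IoU falls in [0.1, 0.5).
        bg = [r for r in img['rois']
              if bg_iou_range[0] <= r['max_iou'] < bg_iou_range[1]]
        fg_sample = random.sample(fg, min(n_fg, len(fg)))
        n_bg = rois_per_image - len(fg_sample)
        bg_sample = random.sample(bg, min(n_bg, len(bg)))
        batch.extend(fg_sample + bg_sample)
    return batch
```

Because all R/N RoIs from an image share that image's forward computation, this is much cheaper than drawing R RoIs from R different images.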
Multi-task loss
Each training RoI is labeled with a ground-truth class u and a ground-truth bounding-box regression target v. We use a multi-task loss L on each labeled RoI to jointly train for classification and bounding-box regression.
In the paper's notation:
u is the ground-truth class (u = 0 is the catch-all background class)
t is the predicted bounding-box regression offsets
v is the ground-truth regression targets
λ is a balancing hyper-parameter; the paper uses λ = 1
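Putting the notation together, the multi-task loss from the paper is

$$
L(p, u, t^u, v) = L_{\text{cls}}(p, u) + \lambda \, [u \ge 1] \, L_{\text{loc}}(t^u, v)
$$

where $L_{\text{cls}}(p, u) = -\log p_u$ is the log loss for the true class $u$, the Iverson bracket $[u \ge 1]$ switches off the localization loss for background RoIs ($u = 0$), and

$$
L_{\text{loc}}(t^u, v) = \sum_{i \in \{x, y, w, h\}} \text{smooth}_{L_1}(t_i^u - v_i),
\qquad
\text{smooth}_{L_1}(x) =
\begin{cases}
0.5 x^2 & \text{if } |x| < 1 \\
|x| - 0.5 & \text{otherwise.}
\end{cases}
$$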
Back-propagation through RoI pooling layers
In words, for each mini-batch RoI $r$ and for each pooling output unit $y_{rj}$, the partial derivative $\partial L / \partial y_{rj}$ is accumulated if $i$ is the argmax selected for $y_{rj}$ by max pooling. In back-propagation, the partial derivatives $\partial L / \partial y_{rj}$ are already computed by the backwards function of the layer on top of the RoI pooling layer.
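In symbols, the backward pass given in the paper routes each gradient only to the input that won the max:

$$
\frac{\partial L}{\partial x_i} = \sum_{r} \sum_{j} \big[\, i = i^*(r, j) \,\big] \frac{\partial L}{\partial y_{rj}}
$$

where $x_i$ is the $i$-th input activation, $y_{rj}$ is the $j$-th output of RoI $r$, and $i^*(r, j) = \operatorname{argmax}_{i' \in \mathcal{R}(r, j)} x_{i'}$ is the argmax index over the sub-window $\mathcal{R}(r, j)$.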
Fast R-CNN detection
Large fully connected layers are easily accelerated by compressing them with truncated SVD
In this technique, a layer parameterized by the u × v weight matrix W is approximately factorized as

$$
W \approx U \Sigma_t V^T
$$

where U is a u × t matrix comprising the first t left-singular vectors of W, Σ_t is a t × t diagonal matrix containing the top t singular values of W, and V is a v × t matrix comprising the first t right-singular vectors. Truncated SVD reduces the parameter count from uv to t(u + v), which is significant when t is much smaller than min(u, v); the single FC layer is replaced by two layers, the first using Σ_t V^T (no biases) and the second using U (with the original biases).
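A minimal NumPy sketch of this compression (the function name and example sizes are illustrative assumptions):

```python
import numpy as np

def truncated_svd_fc(W, t):
    """Factor a u x v FC weight matrix into two thinner layers via SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    W1 = np.diag(s[:t]) @ Vt[:t, :]   # first layer: t x v weights, no bias
    W2 = U[:, :t]                     # second layer: u x t weights, keeps original bias
    return W1, W2

# Example: a 1024 x 4096 layer (~4.2M params) compressed with t = 256
# (256 * (1024 + 4096) ~= 1.3M params); compare the two forward passes.
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 4096))
b = rng.standard_normal(1024)
x = rng.standard_normal(4096)
W1, W2 = truncated_svd_fc(W, t=256)
y_full = W @ x + b           # original single-layer forward pass
y_fast = W2 @ (W1 @ x) + b   # factorized two-layer forward pass
```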
Main results
- State-of-the-art mAP on VOC07, 2010, and 2012
- Fast training and testing compared to R-CNN, SPPnet
- Fine-tuning conv layers in VGG16 improves mAP
Accuracy
Timing
Design evaluation
Does multi-task training help?
Yes: training with the multi-task loss improves mAP over stage-wise training.
Scale invariance: to brute force or finesse?
Single-scale (brute-force) processing performs almost as well as the multi-scale image pyramid while being much faster, so deep ConvNets learn scale invariance directly.
Do SVMs outperform softmax?
No: softmax slightly outperforms post-hoc SVMs in Fast R-CNN.
Do we need more training data?
Yes
Are more proposals always better?
No: mAP rises and then falls slightly as the proposal count increases.