Introduction
R-CNN's multi-stage training pipeline is slow to train, and detection at test time is also very slow
SPPnet speeds things up considerably with spatial pyramid pooling
The proposed method improves on both R-CNN and SPPnet:
1. Higher detection quality (mAP) than R-CNN, SPPnet
2. Training is single-stage, using a multi-task loss
3. Training can update all network layers
4. No disk storage is required for feature caching
Fast R-CNN architecture and training
The RoI pooling layer uses max pooling to convert the features inside any valid region of interest into a small feature map with a fixed spatial extent of H × W (e.g., 7 × 7), where H and W are layer hyper-parameters that are independent of any particular RoI.
RoI pooling converts the features inside each RoI into a fixed spatial size.
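Concretely, RoI pooling lays an H × W grid over each RoI and applies standard max pooling within every grid cell, independently per channel. A minimal NumPy sketch (the function name and array layout are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(7, 7)):
    """Max-pool one RoI of a (C, H, W) feature map to a fixed grid.

    roi is (x1, y1, x2, y2) in feature-map coordinates and is assumed
    to lie inside the feature map.
    """
    x1, y1, x2, y2 = roi
    out_h, out_w = output_size
    roi_h, roi_w = y2 - y1 + 1, x2 - x1 + 1
    out = np.empty((feature_map.shape[0], out_h, out_w), feature_map.dtype)
    for i in range(out_h):
        # Split the RoI into out_h x out_w roughly equal sub-windows.
        ys = y1 + int(np.floor(i * roi_h / out_h))
        ye = y1 + int(np.ceil((i + 1) * roi_h / out_h))
        for j in range(out_w):
            xs = x1 + int(np.floor(j * roi_w / out_w))
            xe = x1 + int(np.ceil((j + 1) * roi_w / out_w))
            # Standard max pooling inside each sub-window, per channel.
            out[:, i, j] = feature_map[:, ys:ye, xs:xe].max(axis=(1, 2))
    return out

# Any RoI, whatever its size, comes out as a fixed 7x7 grid per channel.
fmap = np.random.rand(512, 38, 50)        # VGG16-like conv feature map
pooled = roi_max_pool(fmap, roi=(10, 5, 30, 20))
assert pooled.shape == (512, 7, 7)
```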
In Fast R-CNN training, stochastic gradient descent (SGD) mini-batches are sampled hierarchically, first by sampling N images and then by sampling R/N RoIs from each image.
Fast R-CNN uses hierarchical sampling: first sample N images, then sample R/N RoIs from each image to form the mini-batch (a sketch of this scheme follows the list below).
Mini-batch sampling
- Each SGD mini-batch is constructed from N = 2 images, chosen uniformly at random
- We use mini-batches of size R = 128, sampling 64 RoIs from each image
- We take 25% of the RoIs from object proposals that have intersection over union (IoU) overlap with a ground-truth bounding box of at least 0.5.
- The remaining RoIs are sampled from object proposals that have a maximum IoU with ground truth in the interval [0.1,0.5), following [11].
- The lower threshold of 0.1 appears to act as a heuristic for hard example mining [8]
- During training, images are horizontally flipped with probability 0.5
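A minimal Python sketch of this hierarchical, class-balanced sampling (the `dataset` structure with per-RoI `max_iou` values is an assumption for illustration, not the paper's code):

```python
import random

def sample_minibatch(dataset, N=2, R=128, fg_fraction=0.25,
                     fg_iou_min=0.5, bg_iou_range=(0.1, 0.5)):
    """Sample N images, then R/N class-balanced RoIs from each image."""
    images = random.sample(dataset, N)        # N images, uniformly at random
    rois_per_image = R // N                   # 64 RoIs per image for R=128, N=2
    n_fg = int(rois_per_image * fg_fraction)  # 25% foreground -> 16 RoIs
    batch = []
    for img in images:
        # Foreground: proposals with IoU >= 0.5 against some ground-truth box.
        fg = [r for r in img['rois'] if r['max_iou'] >= fg_iou_min]
        # Background: proposals whose max IoU falls in [0.1, 0.5).
        bg = [r for r in img['rois']
              if bg_iou_range[0] <= r['max_iou'] < bg_iou_range[1]]
        fg_sample = random.sample(fg, min(n_fg, len(fg)))
        n_bg = rois_per_image - len(fg_sample)
        bg_sample = random.sample(bg, min(n_bg, len(bg)))
        batch.extend(fg_sample + bg_sample)
    return batch
```

Because all R/N RoIs from an image share that image's forward computation, this is much cheaper than drawing R RoIs from R different images.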
Multi-task loss
Each training RoI is labeled with a ground-truth class u and a ground-truth bounding-box regression target v. We use a multi-task loss L on each labeled RoI to jointly train for classification and bounding-box regression.
In the paper's notation:
u is the ground-truth class (u = 0 is the catch-all background class)
t is the predicted bounding-box regression offsets
v is the ground-truth regression targets
λ is a balancing hyper-parameter; the paper uses λ = 1
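Putting the notation together, the multi-task loss from the paper is

$$
L(p, u, t^u, v) = L_{\text{cls}}(p, u) + \lambda \, [u \ge 1] \, L_{\text{loc}}(t^u, v)
$$

where $L_{\text{cls}}(p, u) = -\log p_u$ is the log loss for the true class $u$, the Iverson bracket $[u \ge 1]$ switches off the localization loss for background RoIs ($u = 0$), and

$$
L_{\text{loc}}(t^u, v) = \sum_{i \in \{x, y, w, h\}} \text{smooth}_{L_1}(t_i^u - v_i),
\qquad
\text{smooth}_{L_1}(x) =
\begin{cases}
0.5 x^2 & \text{if } |x| < 1 \\
|x| - 0.5 & \text{otherwise.}
\end{cases}
$$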
Back-propagation through RoI pooling layers
In words, for each mini-batch RoI $r$ and for each pooling output unit $y_{rj}$, the partial derivative $\partial L / \partial y_{rj}$ is accumulated if $i$ is the argmax selected for $y_{rj}$ by max pooling. In back-propagation, the partial derivatives $\partial L / \partial y_{rj}$ are already computed by the backwards function of the layer on top of the RoI pooling layer.
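In symbols, the backward pass given in the paper routes each gradient only to the input that won the max:

$$
\frac{\partial L}{\partial x_i} = \sum_{r} \sum_{j} \big[\, i = i^*(r, j) \,\big] \frac{\partial L}{\partial y_{rj}}
$$

where $x_i$ is the $i$-th input activation, $y_{rj}$ is the $j$-th output of RoI $r$, and $i^*(r, j) = \operatorname{argmax}_{i' \in \mathcal{R}(r, j)} x_{i'}$ is the argmax index over the sub-window $\mathcal{R}(r, j)$.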
Fast R-CNN detection
Large fully connected layers are easily accelerated by compressing them with truncated SVD
In this technique, a layer parameterized by the u × v weight matrix W is approximately factorized as

$$
W \approx U \Sigma_t V^T
$$

where U is a u × t matrix comprising the first t left-singular vectors of W, Σ_t is a t × t diagonal matrix containing the top t singular values of W, and V is a v × t matrix comprising the first t right-singular vectors. Truncated SVD reduces the parameter count from uv to t(u + v), which is significant when t is much smaller than min(u, v); the single FC layer is replaced by two layers, the first using Σ_t V^T (no biases) and the second using U (with the original biases).
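A minimal NumPy sketch of this compression (the function name and example sizes are illustrative assumptions):

```python
import numpy as np

def truncated_svd_fc(W, t):
    """Factor a u x v FC weight matrix into two thinner layers via SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    W1 = np.diag(s[:t]) @ Vt[:t, :]   # first layer: t x v weights, no bias
    W2 = U[:, :t]                     # second layer: u x t weights, keeps original bias
    return W1, W2

# Example: a 1024 x 4096 layer (~4.2M params) compressed with t = 256
# (256 * (1024 + 4096) ~= 1.3M params); compare the two forward passes.
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 4096))
b = rng.standard_normal(1024)
x = rng.standard_normal(4096)
W1, W2 = truncated_svd_fc(W, t=256)
y_full = W @ x + b           # original single-layer forward pass
y_fast = W2 @ (W1 @ x) + b   # factorized two-layer forward pass
```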
Main results
- State-of-the-art mAP on VOC07, 2010, and 2012
- Fast training and testing compared to R-CNN, SPPnet
- Fine-tuning conv layers in VGG16 improves mAP
Accuracy
Timing
Design evaluation
Does multi-task training help?
Yes: training with the multi-task loss improves mAP over stage-wise training.
Scale invariance: to brute force or finesse?
Single-scale (brute-force) processing performs almost as well as the multi-scale image pyramid while being much faster, so deep ConvNets learn scale invariance directly.
Do SVMs outperform softmax?
No: softmax slightly outperforms post-hoc SVMs in Fast R-CNN.
Do we need more training data?
Yes
Are more proposals always better?
No: mAP rises and then falls slightly as the proposal count increases.