Introduction

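The authors describe four ways of building a feature pyramid:
(a) Downsample the image to several scales and predict on each scale independently
(b) Predict only from the last layer of the network
(c) Predict independently from each layer of the network
(d) The FPN approach: upsample, fuse with the original feature maps, and predict independently at each level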

Related Work

Hand-engineered features and early neural networks
Hand-crafted features
Deep ConvNet object detectors
Features from deep convolutional networks
Methods using multiple layers
Features from multiple convolutional layers

Feature Pyramid Networks

The FPN structure in detail: lateral connections use a 1x1 convolution, and the top-down pathway uses 2x upsampling.

Bottom-up pathway
for ResNets [16] we use the feature activations output by each stage’s last residual block. We denote the output of these last residual blocks as {C2, C3, C4, C5} for conv2, conv3, conv4, and conv5 outputs, and note that they have strides of {4, 8, 16, 32} pixels with respect to the input image. We do not include conv1 into the pyramid due to its large memory footprint.
The authors note that conv1 (C1) is left out of the pyramid because of its large memory footprint.
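To make the strides concrete, here is a minimal sketch that turns an input size into the spatial size of each bottom-up level. The helper name `feature_map_sizes` is hypothetical, and rounding up is an assumption (the exact size depends on the padding a given backbone uses).

```python
import math

# Strides of C2..C5 relative to the input image, as quoted above.
STRIDES = {"C2": 4, "C3": 8, "C4": 16, "C5": 32}

def feature_map_sizes(height, width):
    """Return {level: (rows, cols)}, assuming sizes round up (ceil division)."""
    return {
        name: (math.ceil(height / s), math.ceil(width / s))
        for name, s in STRIDES.items()
    }

sizes = feature_map_sizes(224, 224)
for name, (rows, cols) in sizes.items():
    print(name, rows, cols)
```

Each level has a quarter of the positions of the one below it, so a stride-2 conv1 map would hold four times as many positions as C2, which fits the memory argument above.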

Top-down pathway and lateral connections
The authors apply a 3x3 convolution to each merged map, "which is to reduce the aliasing effect of upsampling".
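One top-down merge step can be sketched as follows. This is a deliberately simplified, single-channel illustration in plain Python: it assumes nearest-neighbour upsampling and element-wise addition, treats the lateral map as already passed through its 1x1 convolution, and leaves out the final 3x3 convolution. The function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D list of numbers."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out

def merge(top, lateral):
    """Element-wise sum of the upsampled coarser map and the lateral map."""
    up = upsample2x(top)
    return [[u + l for u, l in zip(up_row, lat_row)]
            for up_row, lat_row in zip(up, lateral)]

coarse = [[1, 2],
          [3, 4]]                        # 2x2 map from the coarser level
lateral = [[10] * 4 for _ in range(4)]   # 4x4 lateral map (assumed post-1x1 conv)
merged = merge(coarse, lateral)
for row in merged:
    print(row)
```

In a real network each map has many channels, and the 1x1 lateral convolution is what brings both inputs to the same channel count before the addition.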

Applications

RPN

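The authors note that with FPN there is no need for multi-scale anchors on any single level; each level only needs anchors of different aspect ratios.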

Formally, we define the anchors to have areas of {32², 64², 128², 256², 512²} pixels on {P2, P3, P4, P5, P6} respectively.
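A small sketch of the resulting anchor shapes, assuming the three aspect ratios {1:2, 1:1, 2:1} at every level. Reading the ratio as height / width is a convention chosen here, and `anchor_shapes` is a hypothetical helper.

```python
import math

# Side length whose square is the anchor area at each level (from the quote above).
ANCHOR_SIDE = {"P2": 32, "P3": 64, "P4": 128, "P5": 256, "P6": 512}
ASPECT_RATIOS = (0.5, 1.0, 2.0)  # height / width, an assumed convention

def anchor_shapes(side, ratios=ASPECT_RATIOS):
    """Return (width, height) pairs with area side**2 for each h/w ratio."""
    area = side * side
    shapes = []
    for r in ratios:
        w = math.sqrt(area / r)  # from w * h = area and h = r * w
        h = w * r
        shapes.append((w, h))
    return shapes

anchors = {level: anchor_shapes(side) for level, side in ANCHOR_SIDE.items()}
total = sum(len(shapes) for shapes in anchors.values())
print(total)
```

With one scale and three ratios per level, the five levels give 15 anchor shapes in total.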

Positive and negative samples are generated as in Faster R-CNN: an anchor with IoU > 0.7 is positive, and one with IoU < 0.3 is negative.
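The thresholding rule can be sketched as below. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and anchors that fall between the two thresholds are marked as ignored. The additional Faster R-CNN rule that the best-matching anchor for each ground-truth box is also positive is left out to keep the sketch short.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_anchor(anchor, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """Return 1 (positive), 0 (negative) or -1 (ignored) for one anchor."""
    best = max((iou(anchor, g) for g in gt_boxes), default=0.0)
    if best > pos_thr:
        return 1
    if best < neg_thr:
        return 0
    return -1

gt = [(0, 0, 100, 100)]
print(label_anchor((0, 0, 100, 100), gt))      # identical box
print(label_anchor((200, 200, 300, 300), gt))  # no overlap
print(label_anchor((0, 0, 100, 50), gt))       # half overlap
```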

Fast R-CNN

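In Fast R-CNN, FPN mainly comes into play when choosing which level's feature map to use for RoI pooling. The feature pyramid is treated as if it had been produced from an image pyramid.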

Formally, we assign an RoI of width w and height h (on the input image to the network) to the level P_k of our feature pyramid by

$$k = \lfloor k_0 + \log_2(\sqrt{wh} / 224) \rfloor$$

Here 224 is the canonical ImageNet pre-training size, and k0 is the target level on which an RoI with w × h = 224² should be mapped into.
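A minimal sketch of this assignment, assuming the rule k = floor(k0 + log2(sqrt(w·h) / 224)) with k0 = 4 as in the paper. Clamping the result to the range P2..P5 is an assumption added here so that very small or very large RoIs still land on an existing level; `roi_level` is a hypothetical name.

```python
import math

def roi_level(w, h, k0=4, k_min=2, k_max=5, canonical=224):
    """Pick the pyramid level for an RoI of size w x h (in input-image pixels)."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))  # clamp to available levels (assumption)

for size in (32, 112, 224, 448, 1000):
    print(size, roi_level(size, size))
```

Halving the RoI side length moves it one level down the pyramid, to a finer feature map, which mirrors how a smaller object would be handled at a higher-resolution scale of an image pyramid.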

Experiments on Object Detection

RPN
AR = Average Recall
s = small
m = medium
l = large
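For reference, the small / medium / large split follows the COCO convention of bucketing objects by pixel area, with thresholds at 32² and 96². A minimal sketch (the function name is hypothetical, and how exact boundary values are treated is an assumption):

```python
def coco_size_bucket(area):
    """Classify an object by pixel area using the COCO thresholds 32^2 and 96^2."""
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"

for side in (20, 50, 200):
    print(side, coco_size_bucket(side * side))
```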

Comparisons with baselines
Comparing (a), (b), and (c) shows that FPN in the RPN improves on both single-scale baselines.

How important is top-down enrichment?
(d) Without the top-down pathway, results get worse.
We conjecture that this is because there are large semantic gaps between different levels on the bottom-up pyramid

How important are lateral connections?
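(e) Without lateral connections, results get worse.
This is because the fine detail from the bottom-up pathway is lost after repeated downsampling and upsampling.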

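How important are pyramid representations?
(f) Without the pyramid structure, predicting from a single level only.
Results are acceptable, but the large number of anchors makes this inefficient.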

Fast R-CNN

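Here the region proposals come from an RPN built on FPN and are kept as a fixed set of proposals,
but the two networks do not share features.
The results show that FPN also works well for the Fast R-CNN detection stage.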

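Comparing (a), (b), and (c) shows that for region-based object detection, a feature pyramid is more effective than single-scale features. The gap between (c) and (f) is small; the authors attribute this to RoI pooling being insensitive to the scale of the region. So approach (f) should not be dismissed outright. In my own view it depends on the specific problem: in the RPN experiments above, (f) may not work well, but in Fast R-CNN the difference is much less pronounced.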

Faster R-CNN
Finally, FPN is used in Faster R-CNN, with the network parameters shared.
Detection of small objects improves noticeably.