1. Overall pipeline
- Resize the image by successive scale factors into an image pyramid of multiple scales.
- P-Net (Proposal Net)
  [P-Net architecture figure: upload failed]
- Every pyramid scale from step 1 is fed through P-Net, which outputs a grid downsampled by a factor of 2 relative to its input. Each grid cell carries a candidate bounding-box proposal: a face/no-face classification and box-regression offsets. The original paper also outputs landmark proposals at this stage, but later implementations move that part entirely into the last network.
- Take a 200x400 original image as an example. With a scale factor of 0.5 it is first resized to 100x200; after P-Net the output grid is roughly 50x100 (ignoring the border effect of the convolutions). Each cell outputs whether a face is present at that position (a 2-way one-hot score) plus the offsets of the regression box attached to that cell.
- The offsets are expressed relative to the cell's coordinates mapped back onto the original image (200x400). Each cell carries a built-in box size tied to the scale of its pyramid level, e.g. set to 12 / scale factor.
- The code that maps the offsets to box coordinates on the original image:

```cpp
void MTCNN::generateBbox(cv::Mat score, cv::Mat location, std::vector<Bbox>& boundingBox_, float scale)
{
    const int stride = 2;    // P-Net's total downsampling of its input
    const int cellsize = 12; // preset anchor size (P-Net's receptive field)
    int sc_rows = 0, sc_cols = 0;
    if (4 == score.dims)
    {
        sc_rows = score.size[2]; // grid rows
        sc_cols = score.size[3]; // grid columns
    }
    // channel 0 is the no-face score; skip ahead to channel 1 (face score)
    float* p = (float*)score.data + sc_rows * sc_cols;
    float inv_scale = 1.0f / scale;
    for (int row = 0; row < sc_rows; row++)
    {
        for (int col = 0; col < sc_cols; col++)
        {
            if (*p > threshold[0])
            {
                Bbox bbox;
                bbox.score = *p;
                // the next four lines act as the anchor box: map the cell
                // back onto original-image coordinates
                bbox.x1 = round((stride * col + 1) * inv_scale);
                bbox.y1 = round((stride * row + 1) * inv_scale);
                bbox.x2 = round((stride * col + 1 + cellsize) * inv_scale);
                bbox.y2 = round((stride * row + 1 + cellsize) * inv_scale);
                const int index = row * sc_cols + col;
                for (int channel = 0; channel < 4; channel++)
                {
                    float* tmp = (float*)location.data + channel * sc_rows * sc_cols;
                    bbox.regreCoord[channel] = tmp[index]; // raw offset, applied later in refine()
                }
                boundingBox_.push_back(bbox);
            }
            p++;
        }
    }
}
```
- Refine all bbox coordinates via bounding-box regression; collecting them gives the first-stage proposals. The refine code:

```cpp
void MTCNN::refine(std::vector<Bbox>& vecBbox, const int& height, const int& width, bool square)
{
    if (vecBbox.empty()) return;
    float bbw = 0, bbh = 0;
    float h = 0, w = 0;
    float x1 = 0, x2 = 0, y1 = 0, y2 = 0;
    for (auto it = vecBbox.begin(); it != vecBbox.end(); it++)
    {
        bbw = it->x2 - it->x1 + 1;
        bbh = it->y2 - it->y1 + 1;
        // apply the regression offsets, scaled by the box size
        // (channel order here: [0]=dy1, [1]=dx1, [2]=dy2, [3]=dx2)
        x1 = it->x1 + bbw * it->regreCoord[1];
        y1 = it->y1 + bbh * it->regreCoord[0];
        x2 = it->x2 + bbw * it->regreCoord[3];
        y2 = it->y2 + bbh * it->regreCoord[2];
        if (square)
        {
            // expand the shorter side so the box becomes square
            w = x2 - x1 + 1;
            h = y2 - y1 + 1;
            float maxSide = (h > w) ? h : w;
            x1 = x1 + w * 0.5f - maxSide * 0.5f;
            y1 = y1 + h * 0.5f - maxSide * 0.5f;
            x2 = round(x1 + maxSide - 1);
            y2 = round(y1 + maxSide - 1);
            x1 = round(x1);
            y1 = round(y1);
        }
        // clamp to the image bounds
        it->x1 = x1 < 0 ? 0 : x1;
        it->y1 = y1 < 0 ? 0 : y1;
        it->x2 = x2 >= width ? width - 1 : x2;
        it->y2 = y2 >= height ? height - 1 : y2;
    }
}
```
- Putting the stages together:

```cpp
void MTCNN::detectInternal(cv::Mat& img_, std::vector<Bbox>& finalBbox_)
{
    const float nms_threshold[3] = {0.7f, 0.7f, 0.7f};
    img = img_;
    PNet();
    if (!firstBbox_.empty())
    {
        nms(firstBbox_, nms_threshold[0]);
        refine(firstBbox_, img_.rows, img_.cols, true);
        RNet();
        if (!secondBbox_.empty())
        {
            nms(secondBbox_, nms_threshold[1]);
            refine(secondBbox_, img_.rows, img_.cols, true);
            ONet();
            if (!thirdBbox_.empty())
            {
                refine(thirdBbox_, img_.rows, img_.cols, false);
                std::string ts = "Min"; // final NMS uses intersection-over-minimum
                nms(thirdBbox_, nms_threshold[2], ts);
            }
        }
    }
    finalBbox_ = thirdBbox_;
    thirdBbox_.clear();
}
```
- R-Net (Refine Net)
  [R-Net architecture figure: upload failed]
- For each proposal from step 2, crop the corresponding patch out of the original image according to its bbox, resize it to a fixed size, and feed it to R-Net, which outputs a face classification score, bounding-box regression offsets, and five landmark coordinates.
- One open question here: the input is a single patch, while the bounding-box coordinates are global information about the whole image, so how can the network regress them?
- After NMS, the bboxes are refined using the R-Net regression output.
- O-Net (Output Net)
  [O-Net architecture figure: upload failed]
- For the coarse bboxes from step 3, crop and feed the patches to O-Net again to get a refined face classification, bounding-box regression, and landmark coordinates. The bboxes are refined with this output, and a final NMS produces the end result.
- A note on the P-Net stage: the bounding boxes from each scale are first NMS-ed within that scale, and then one more NMS is applied across the merged results of all scales.