Faster R-CNN: A Source-Code Walkthrough


Author: yanghedada | Published 2018-10-11 11:09

    Faster R-CNN: the idea in brief

    The schematic above (image omitted in this copy) lays out Faster R-CNN. In outline:

      1. First, build the Faster R-CNN base model: a fully convolutional network.
      2. The fully convolutional network max-pools the original image; VGG16 applies four 2x2 max-pooling steps, so the image ends up scaled down to 1/16 of its size.
      3. The last layer of the fully convolutional network (the code calls this feature map net, and so will we) feeds two places: one copy of net goes into the RPN for region proposals, which outputs box coordinates and per-box scores (a binary classification: object or no object). The boxes are then used to crop features out of the other copy of net, and all cropped feature maps are resized to one fixed size; these are the so-called ROIs.
      4. Once all the ROIs are obtained, they are flattened and sent to two outputs: one classifies the object, the other regresses the box coordinates (in effect fine-tuning the boxes from before).
      5. The result is precise box coordinates plus object class scores.
      6. Remember: there is a pitfall here. The parameter settings differ between training and inference. During training, positive and negative samples are matched against the ground truth (the true box locations); at inference there is no ground truth, so no ground-truth operations happen and top-N selection plus NMS are used directly.
      7. Note that there are two regressions and two classifications. The first pair sits in the RPN: coarse box selection plus the binary object/background classification. The second pair sits in the prediction network after the ROIs: precise box regression plus object classification (e.g., 20 classes).

    (Note: the idea behind Faster R-CNN is simple; by far the most complex part is the data processing, which has no trainable parameters yet accounts for 90% of the code. A minimal sketch of the forward pass follows.)
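
    As a minimal sketch (the helper names here are illustrative, not the repo's):

    # hypothetical names; the real stages are build_head / build_rpn / build_proposals / build_predictions
    net = backbone(image)                  # VGG16 conv layers, stride 16
    rpn_prob, rpn_deltas = rpn(net)        # objectness scores + coarse box regression
    rois = propose(rpn_prob, rpn_deltas)   # top-N + NMS (plus ground-truth matching when training)
    pooled = roi_pool(net, rois)           # crop and resize to a fixed size
    cls_prob, bbox_refine = heads(pooled)  # final classification + box refinement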

    Code organization

    This is my project layout, downloaded from GitHub. There are too many versions floating around and I did not want to read them all. I had originally set out to get into Google's object detection API, but reading a somewhat simpler codebase like this one first helps straighten out the ideas.


    Code preamble

    The whole Faster R-CNN network is driven by the base class Network in a file called network.py; the entire pipeline is implemented by subclasses of Network, so you can build several object detectors just by writing several Network subclasses. The source implements two: vgg16 and resnetv1.

    # vgg16.py
    class vgg16(Network):
        def __init__(self, batch_size=1):
            Network.__init__(self, batch_size=batch_size)
    
    # resnetv1.py
    class resnetv1(Network):
        def __init__(self, batch_size=1, num_layers=50):
            Network.__init__(self, batch_size=batch_size)
            self._num_layers = num_layers
            self._resnet_scope = 'resnet_v1_%d' % num_layers
    

    Starting points: demo.py and train.py

    • demo.py
      Builds a VGG16-based Faster R-CNN model, restores the trained parameters from a ckpt file, and feeds in an image for detection.

    • train.py
      Builds a VGG16-based Faster R-CNN model, restores the pretrained parameters from a ckpt file, and feeds in images for training.
      Both demo.py and train.py contain a line like this:
    # demo.py
    net.create_architecture(sess, "TEST", 21,
                                tag='default', anchor_scales=[8, 16, 32])
    
    # train.py
    layers = self.net.create_architecture(sess, "TRAIN", 
                                self.imdb.num_classes, tag='default')
    

    This is the call that builds the Faster R-CNN computation graph.
    Keep in mind that parameter restoring differs between train.py and demo.py: demo.py restores and assigns all parameters, while train.py restores only up to fc7. fc7 outputs 4096 units, and two heads hang off it: the box coordinates and the classes. If our own training data does not have the stock number of classes (20, or whatever it may be), we only need to change num_classes in train.py, and the layers after fc7 will then fit the new classification task.
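    The filtering lives in get_variables_to_restore(); as a rough sketch (variable names illustrative, not the repo's exact logic):

    # Sketch: restore everything except the two heads that hang off fc7
    variables = tf.global_variables()
    variables_to_restore = [v for v in variables
                            if 'cls_score' not in v.name and 'bbox_pred' not in v.name]
    restorer = tf.train.Saver(variables_to_restore)
    restorer.restore(sess, pretrained_ckpt)  # pretrained_ckpt: path to the ckpt file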

    Analyzing vgg16():

    All that groundwork was so we have an index to refer back to in the analysis below.
    The vgg16 class appears to expose only a handful of methods to callers. The vgg16 skeleton:

    import tensorflow as tf
    import tensorflow.contrib.slim as slim
    
    import lib.config.config as cfg
    from lib.nets.network import Network
    
    class vgg16(Network):
        def __init__(self, batch_size=1):
            Network.__init__(self, batch_size=batch_size)
    
        def build_network(self, sess, is_training=True):
             ....

            # rois: coordinates of all the selected ROI boxes
            # cls_prob: classification scores over _num_classes, after softmax
            # bbox_pred: the box regression outputs
                return rois, cls_prob, bbox_pred
    
        def get_variables_to_restore(self, variables, var_keep_dic):
            
    
            return variables_to_restore
    
        def fix_variables(self, sess, pretrained_model):
            ....
    
        def build_head(self, is_training):
            # The fully convolutional backbone has five blocks; each block is convolutions
            # followed by one pooling op, except the last block, which has no pooling.
            .....
            # the output feature map is scaled down to 1/16 of the input
            return net
    
        def build_rpn(self, net, is_training, initializer):
    
            # Build anchor component
            # generates the nine anchor boxes per position
            ....

            # The binary classification and the regression run in parallel: the same 1x1
            # convolution is applied to the feature map, producing 4*k channels, i.e., _num_anchors * 4
            rpn_bbox_pred = slim.conv2d(rpn, self._num_anchors * 4, [1, 1], trainable=is_training, weights_initializer=initializer, padding='VALID', activation_fn=None, scope='rpn_bbox_pred')
            # rpn_cls_prob: shape [None, h, w, _num_anchors * 2], object/background scores after softmax
            # rpn_bbox_pred: shape [None, h, w, _num_anchors * 4], the box coordinate regression
            # rpn_cls_score: shape [None, h, w, _num_anchors * 2], raw object/background scores, before softmax
            # rpn_cls_score_reshape: the scores reshaped to [None, 2]
            return rpn_cls_prob, rpn_bbox_pred, rpn_cls_score, rpn_cls_score_reshape
    
        def build_proposals(self, is_training, rpn_cls_prob, rpn_bbox_pred, rpn_cls_score):
            # rpn_cls_prob: shape [None, h, w, _num_anchors * 2], object/background scores after softmax
            # rpn_bbox_pred: shape [None, h, w, _num_anchors * 4], the box coordinate regression
            # rpn_cls_score: shape [None, h, w, _num_anchors * 2], raw object/background scores, before softmax
            # obtain the suitable ROIs
            # coordinate operations: rois are the boxes that survive filtering, roi_scores their scores
            ......
            return rois
    
        def build_predictions(self, net, rois, is_training, initializer, initializer_bbox):
    
            # Crop image ROIs
            # build fixed-size ROI windows
            .......

            # box regression through fc7
            bbox_prediction = slim.fully_connected(fc7, self._num_classes * 4, weights_initializer=initializer_bbox, trainable=is_training, activation_fn=None, scope='bbox_pred')
            # cls_score: classification scores over _num_classes
            # cls_prob: the same scores after softmax
            # bbox_prediction: the box regression outputs
            return cls_score, cls_prob, bbox_prediction
    
    

    From the above, vgg16 seems to have only seven usable methods; but remember that vgg16 inherits everything from Network, so every Network method is available on vgg16 too. Let's start unravelling, beginning with create_architecture():

    create_architecture()

    def create_architecture(self, sess, mode, num_classes, tag=None, anchor_scales=(8, 16, 32), anchor_ratios=(0.5, 1, 2)):
            self._image = tf.placeholder(tf.float32, shape=[self._batch_size, None, None, 3])
            self._im_info = tf.placeholder(tf.float32, shape=[self._batch_size, 3])
            self._gt_boxes = tf.placeholder(tf.float32, shape=[None, 5])
            self._tag = tag
    
            self._num_classes = num_classes
            self._mode = mode
            self._anchor_scales = anchor_scales
            self._num_scales = len(anchor_scales)
    
            self._anchor_ratios = anchor_ratios
            self._num_ratios = len(anchor_ratios)
            
            # K anchor boxes per position
            self._num_anchors = self._num_scales * self._num_ratios
    
            training = mode == 'TRAIN'
            testing = mode == 'TEST'
    
            assert tag != None
    
            # handle most of the regularizer here
            weights_regularizer = tf.contrib.layers.l2_regularizer(cfg.FLAGS.weight_decay)
            if cfg.FLAGS.bias_decay:
                biases_regularizer = weights_regularizer
            else:
                biases_regularizer = tf.no_regularizer
    
            # list as many types of layers as possible, even if they are not used now
            with arg_scope([slim.conv2d, slim.conv2d_in_plane,
                            slim.conv2d_transpose, slim.separable_conv2d, slim.fully_connected],
                           weights_regularizer=weights_regularizer,
                           biases_regularizer=biases_regularizer,
                           biases_initializer=tf.constant_initializer(0.0)):
                # The arg_scope above sets defaults for a series of conv/deconv layers;
                # the core call is build_network below.
                # rois: the boxes from the ROI pooling layer
                # cls_prob: classification scores from the final fully connected layer
                # bbox_pred: the per-class box regression outputs (one set per class, 21 here)
                rois, cls_prob, bbox_pred = self.build_network(sess, training)
    
            layers_to_output = {'rois': rois}
            layers_to_output.update(self._predictions)
    
            for var in tf.trainable_variables():
                self._train_summaries.append(var)
    
            if mode == 'TEST':
                stds = np.tile(np.array(cfg.FLAGS2["bbox_normalize_stds"]), (self._num_classes))
                means = np.tile(np.array(cfg.FLAGS2["bbox_normalize_means"]), (self._num_classes))
                self._predictions["bbox_pred"] *= stds
                self._predictions["bbox_pred"] += means
            else:
                self._add_losses()
                layers_to_output.update(self._losses)
    
            val_summaries = []
            with tf.device("/cpu:0"):
                val_summaries.append(self._add_image_summary(self._image, self._gt_boxes))
                for key, var in self._event_summaries.items():
                    val_summaries.append(tf.summary.scalar(key, var))
                for key, var in self._score_summaries.items():
                    self._add_score_summary(key, var)
                for var in self._act_summaries:
                    self._add_act_summary(var)
                for var in self._train_summaries:
                    self._add_train_summary(var)
    
            self._summary_op = tf.summary.merge_all()
            if not testing:
                self._summary_op_val = tf.summary.merge(val_summaries)
    
            return layers_to_output
    
    

    create_architecture first defines the inputs, as follows:
    _image (the image), _im_info (the image size), _gt_boxes (the ground-truth box coordinates plus a class label), and _tag (a tag string).

    self._image = tf.placeholder(tf.float32, shape=[self._batch_size, None, None, 3])
    self._im_info = tf.placeholder(tf.float32, shape=[self._batch_size, 3])
    self._gt_boxes = tf.placeholder(tf.float32, shape=[None, 5])
    self._tag = tag
    

    The rest are network parameters that are set when the network is built:

    self._num_classes = num_classes        # number of classes
    self._mode = mode                      # training or not
    self._anchor_scales = anchor_scales    # anchor box scales
    self._num_scales = len(anchor_scales)

    self._anchor_ratios = anchor_ratios
    self._num_ratios = len(anchor_ratios)

    # K anchor boxes per position
    self._num_anchors = self._num_scales * self._num_ratios  # 3 scales x 3 ratios = 9

    training = mode == 'TRAIN'
    testing = mode == 'TEST'
    

    Next comes the actual wiring, build_network():

    # The arg_scope sets the conv/deconv defaults; this is the core call.
    # rois: the boxes from the ROI pooling layer
    # cls_prob: classification scores from the final fully connected layer
    # bbox_pred: the per-class box regression outputs
    rois, cls_prob, bbox_pred = self.build_network(sess, training)
    

    build_network produces the outputs of pushing an image through the network:
    rois are the boxes from the ROI pooling layer,
    cls_prob holds the final fully connected layer's classification scores,
    bbox_pred holds the per-class (21-way) box regression outputs.
    What?! That's it?! Where did the process go?!

    Let's see what actually happens inside build_network.

    build_network()

    build_network() is implemented in vgg16:

    def build_network(self, sess, is_training=True):
            with tf.variable_scope('vgg_16', 'vgg_16'):
                """
                分爲了幾段,build head,buildrpn,build proposals,build predictions
                對應的剛好是我們所剛剛敘述的全卷積層,RPN層,Proposal Layer,和最後經過的全連接層。
                """
                # select initializer
                if cfg.FLAGS.initializer == "truncated":
                    initializer = tf.truncated_normal_initializer(mean=0.0, stddev=0.01)
                    initializer_bbox = tf.truncated_normal_initializer(mean=0.0, stddev=0.001)
                else:
                    initializer = tf.random_normal_initializer(mean=0.0, stddev=0.01)
                    initializer_bbox = tf.random_normal_initializer(mean=0.0, stddev=0.001)
    
                # Build head
                # build the fully convolutional backbone
                # the output feature map is scaled down to 1/16 of the input
                net = self.build_head(is_training)
    
                # Build rpn
                # rpn_cls_prob: shape [None, h, w, _num_anchors * 2], object/background scores after softmax
                # rpn_bbox_pred: shape [None, h, w, _num_anchors * 4], the box coordinate regression
                # rpn_cls_score: shape [None, h, w, _num_anchors * 2], raw object/background scores, before softmax
                # rpn_cls_score_reshape: the scores reshaped to [None, 2]
                rpn_cls_prob, rpn_bbox_pred, rpn_cls_score, rpn_cls_score_reshape = self.build_rpn(net, is_training, initializer)
    
                # Build proposals
                # filter the ROIs again, keeping the suitable boxes
                rois = self.build_proposals(is_training, rpn_cls_prob, rpn_bbox_pred, rpn_cls_score)
    
                # Build predictions
                # cls_score: classification scores over _num_classes
                # cls_prob: the same scores after softmax
                # bbox_pred: the box regression outputs
                cls_score, cls_prob, bbox_pred = self.build_predictions(net, rois, is_training, initializer, initializer_bbox)
    
                self._predictions["rpn_cls_score"] = rpn_cls_score
                self._predictions["rpn_cls_score_reshape"] = rpn_cls_score_reshape
                self._predictions["rpn_cls_prob"] = rpn_cls_prob
                self._predictions["rpn_bbox_pred"] = rpn_bbox_pred
                self._predictions["cls_score"] = cls_score
                self._predictions["cls_prob"] = cls_prob
                self._predictions["bbox_pred"] = bbox_pred
                self._predictions["rois"] = rois
    
                self._score_summaries.update(self._predictions)
                
                # rois: coordinates of all the selected ROI boxes
                # cls_prob: classification scores over _num_classes, after softmax
                # bbox_pred: the box regression outputs
                return rois, cls_prob, bbox_pred
    

    So the whole thing boils down to this call chain:

    build_head() ---> build_rpn() ---> build_proposals() ---> build_predictions()

    • 1. build_head(): builds the CNN backbone.
    • 2. build_rpn(): generates box coordinates on the feature map and judges whether each box contains an object.
    • 3. build_proposals(): evaluates the boxes and picks the suitable ones, using IoU and NMS; no trainable parameters are created here.
    • 4. build_predictions(): before the final classification and box regression there is an ROI layer, which resizes every cropped feature map to a fixed size and flattens it; then come two outputs, the box coordinates and the class scores.

    Now we can dig into the code in depth.
    Starting with build_head():

    build_head()

    def build_head(self, is_training):
            # The fully convolutional backbone has five blocks; each block is convolutions
            # followed by one pooling op, except the last block, which has no pooling.
            # Main network
            # Layer  1
            net = slim.repeat(self._image, 2, slim.conv2d, 64, [3, 3], trainable=False, scope='conv1')
            net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool1')
    
            # Layer 2
            net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], trainable=False, scope='conv2')
            net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool2')
    
            # Layer 3
            net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], trainable=is_training, scope='conv3')
            net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool3')
    
            # Layer 4
            net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], trainable=is_training, scope='conv4')
            net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool4')
    
            # Layer 5
            net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], trainable=is_training, scope='conv5')
    
            # Append network to summaries
            self._act_summaries.append(net)
    
            # Append network as head layer
            self._layers['head'] = net
            # the output feature map is scaled down to 1/16 of the input
            return net
    

    Nothing tricky here: an image is fed through the network for feature extraction, and net, the output of the last layer, is returned.
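
    Why 1/16? Four of the five blocks end in a stride-2 max pool, so the spatial size halves four times. A quick hand-written sanity check (assuming a 600 x 800 input and 'SAME' padding):

    import math

    h, w = 600, 800
    for _ in range(4):                 # pool1 .. pool4, each 2x2 with stride 2
        h, w = math.ceil(h / 2), math.ceil(w / 2)
    print(h, w)                        # 38 50, i.e. roughly 600/16 x 800/16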

    build_rpn()

    def build_rpn(self, net, is_training, initializer):
    
            # Build anchor component
            # generates the nine anchor boxes per position
            self._anchor_component()
    
            # Create RPN Layer
            # First a 3x3 convolution; 1x1 convolutions then perform the regression and separate foreground from background, producing scores
            rpn = slim.conv2d(net, 512, [3, 3], trainable=is_training, weights_initializer=initializer, scope="rpn_conv/3x3")
    
            self._act_summaries.append(rpn)
            # separate foreground from background, producing scores
            rpn_cls_score = slim.conv2d(rpn, self._num_anchors * 2, [1, 1], trainable=is_training, weights_initializer=initializer, padding='VALID', activation_fn=None, scope='rpn_cls_score')
    
            # Change it so that the score has 2 as its channel size
            # foreground/background scores, reshaped, not yet through softmax
            rpn_cls_score_reshape = self._reshape_layer(rpn_cls_score, 2, 'rpn_cls_score_reshape')
            
            # apply softmax
            rpn_cls_prob_reshape = self._softmax_layer(rpn_cls_score_reshape, "rpn_cls_prob_reshape")
            
            # reshape back to _num_anchors * 2 channels
            rpn_cls_prob = self._reshape_layer(rpn_cls_prob_reshape, self._num_anchors * 2, "rpn_cls_prob")
            
            # The binary classification and the regression run in parallel: the same 1x1
            # convolution is applied to the feature map, producing 4*k channels, i.e., _num_anchors * 4
            rpn_bbox_pred = slim.conv2d(rpn, self._num_anchors * 4, [1, 1], trainable=is_training, weights_initializer=initializer, padding='VALID', activation_fn=None, scope='rpn_bbox_pred')
            # rpn_cls_prob: shape [None, h, w, _num_anchors * 2], object/background scores after softmax
            # rpn_bbox_pred: shape [None, h, w, _num_anchors * 4], the box coordinate regression
            # rpn_cls_score: shape [None, h, w, _num_anchors * 2], raw object/background scores, before softmax
            # rpn_cls_score_reshape: the scores reshaped to [None, 2]
            return rpn_cls_prob, rpn_bbox_pred, rpn_cls_score, rpn_cls_score_reshape
    

    The build_rpn function performs the box extraction on the feature map. Its outputs:

    1. rpn_cls_prob: shape [None, h, w, _num_anchors * 2]; per-anchor object/background scores, after softmax.
    2. rpn_bbox_pred: shape [None, h, w, _num_anchors * 4]; the box coordinates, for coordinate regression.
    3. rpn_cls_score: shape [None, h, w, _num_anchors * 2]; the raw object/background scores, before softmax.
    4. rpn_cls_score_reshape: the scores reshaped to [None, 2].

    Note that build_rpn internally calls _anchor_component(), which generates nine anchors at every position, W x H x 9 anchors in total.

    _anchor_component()

    def _anchor_component(self):
            with tf.variable_scope('ANCHOR_' + 'default'):
                # generate_anchors() produces the anchor positions
                # just to get the shape right
                # _feat_stride is the scale factor between the original image and this
                # feature map; here it is 16
                # _im_info[0, 0] is the original image height, _im_info[0, 1] the width
                height = tf.to_int32(tf.ceil(self._im_info[0, 0] / np.float32(self._feat_stride[0])))
                width = tf.to_int32(tf.ceil(self._im_info[0, 1] / np.float32(self._feat_stride[0])))
                
                # see the snippet below
                # this produces all the anchors for the image: a W x H feature map yields
                # W x H x 9 anchors, each already mapped back onto the original image,
                # i.e., multiplied by 16
                anchors, anchor_length = tf.py_func(generate_anchors_pre,
                                                    [height, width,
                                                     self._feat_stride, self._anchor_scales, self._anchor_ratios],
                                                    [tf.float32, tf.int32], name="generate_anchors")
                anchors.set_shape([None, 4])
                anchor_length.set_shape([])
                self._anchors = anchors
                self._anchor_length = anchor_length
    

    Inside _anchor_component(), the call to generate_anchors_pre() is what actually generates all the anchors.

    generate_anchors_pre()

    def generate_anchors_pre(height, width, feat_stride, anchor_scales=(8,16,32), anchor_ratios=(0.5,1,2)):
      """ A wrapper function to generate anchors given different scales
        Also return the number of anchors in variable 'length'
      """
      """生成anchor的预处理方法,generate_anchors方法就是直接产生各种大小的anchor box,generate_anchors_pre方法
         是把每一个anchor box对应到原图上
          height = tf.to_int32(tf.ceil(self._im_info[0] / np.float32(self._feat_stride[0])))
          width = tf.to_int32(tf.ceil(self._im_info[1] / np.float32(self._feat_stride[0])))
          feat_stride: 经过VGG或者ZF后特征图相对于原图的在长或者宽上的缩放倍数,也就是说height和width对应于特征图长宽
          anchor_scales:anchor尺寸
          anchor_ratios: anchor长宽比
      """
      # only 9 boxes at this point
      anchors = generate_anchors(ratios=np.array(anchor_ratios), scales=np.array(anchor_scales)) # anchor boxes of various sizes
      A = anchors.shape[0] # number of anchor types
      shift_x = np.arange(0, width) * feat_stride # offsets relative to the original image
      shift_y = np.arange(0, height) * feat_stride # offsets relative to the original image
      shift_x, shift_y = np.meshgrid(shift_x, shift_y) # coordinate matrices
      shifts = np.vstack((shift_x.ravel(), shift_y.ravel(), shift_x.ravel(), shift_y.ravel())).transpose()
      K = shifts.shape[0]
      # width changes faster, so here it is H, W, C
      anchors = anchors.reshape((1, A, 4)) + shifts.reshape((1, K, 4)).transpose((1, 0, 2))
      # K x A x 4: equivalent to placing the anchor boxes at every feature map position
      # (the shift coordinates plus the anchor box sizes)
      # H x W x 9 boxes in total
      anchors = anchors.reshape((K * A, 4)).astype(np.float32, copy=False)
      length = np.int32(anchors.shape[0]) 
      return anchors, length
    

    This in turn calls generate_anchors(), which produces the fixed-size boxes for a single point; with the given parameters, that makes 9 boxes on the original image.
    generate_anchors():

    def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                         scales=2 ** np.arange(3, 6)):
        """
        Generate anchor (reference) windows by enumerating aspect ratios X
        scales wrt a reference (0, 0, 15, 15) window.
        """
    
        base_anchor = np.array([1, 1, base_size, base_size]) - 1
        ratio_anchors = _ratio_enum(base_anchor, ratios)
        anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                             for i in range(ratio_anchors.shape[0])])
        return anchors
    

    This function only manipulates coordinates; no parameters are trained, so it can be tested directly.
    Below, on a 1024 x 1024 image, the 9 boxes are drawn centered at (365, 365).

    import time
    import numpy as np
    import cv2

    # assumes generate_anchors() from above is in scope

    t = time.time()
    a = generate_anchors()
    print(time.time() - t)
    print(a)

    # Create a black image and draw the anchors on it
    img = np.zeros((1024, 1024, 3), np.uint8)
    for i in a:
        i = np.array(i) + 365
        cv2.rectangle(img, (int(i[0]), int(i[1])), (int(i[2]), int(i[3])), (0, 255, 0), 3)

    cv2.imshow('line', img)
    cv2.waitKey()
    

    The resulting boxes are shown in the (omitted) figure.

    The printed coordinates are offsets relative to the center (365, 365):

    [[ -84.  -40.   99.   55.]
     [-176.  -88.  191.  103.]
     [-360. -184.  375.  199.]
     [ -56.  -56.   71.   71.]
     [-120. -120.  135.  135.]
     [-248. -248.  263.  263.]
     [ -36.  -80.   51.   95.]
     [ -80. -168.   95.  183.]
     [-168. -344.  183.  359.]]
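
    As a hand-done sanity check (not in the repo): the first anchor [-84, -40, 99, 55] is 99 - (-84) + 1 = 184 wide and 55 - (-40) + 1 = 96 tall, so its area is close to (16 * 8)^2 = 128^2 and its aspect ratio is about 2:1, matching scale 8 with ratio 0.5.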
    

    Now we can walk back up the call chain.
    Back to generate_anchors_pre()!

      shift_x, shift_y = np.meshgrid(shift_x, shift_y) # coordinate matrices
      shifts = np.vstack((shift_x.ravel(), shift_y.ravel(), shift_x.ravel(), shift_y.ravel())).transpose()
      K = shifts.shape[0]
      # width changes faster, so here it is H, W, C
      anchors = anchors.reshape((1, A, 4)) + shifts.reshape((1, K, 4)).transpose((1, 0, 2))
      # K x A x 4: places the anchor boxes at every feature map position
      # (the shift coordinates plus the anchor box sizes)
      # H x W x 9 boxes in total
      anchors = anchors.reshape((K * A, 4)).astype(np.float32, copy=False)
      length = np.int32(anchors.shape[0])
    

    These few steps expand the single-point anchors across the whole feature map. The anchors and length here are the function's final return values; anchors has shape [H*W*9, 4]. Each feature map position corresponds to a 16-pixel field of view on the original image, so a [2, 2] patch here covers a [32, 32] region there. Note there is no batch dimension yet: anchors are generated for a single feature map.
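    The broadcast in that addition is easy to misread; here it is in miniature (2 positions x 2 anchor shapes, values invented):

    import numpy as np

    anchors = np.array([[0, 0, 15, 15], [-8, -8, 23, 23]])  # A = 2 anchor shapes
    shifts = np.array([[0, 0, 0, 0], [16, 16, 16, 16]])     # K = 2 positions
    # (1, A, 4) + (K, 1, 4) broadcasts to (K, A, 4): every shape at every position
    out = anchors.reshape((1, 2, 4)) + shifts.reshape((1, 2, 4)).transpose((1, 0, 2))
    print(out.reshape((-1, 4)))  # 4 boxes = K * A
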
    Back to _anchor_component():

    anchors.set_shape([None, 4])
    anchor_length.set_shape([])
    self._anchors = anchors
    self._anchor_length = anchor_length
    

    Here anchors has its shape set to [None, 4], and anchor_length, the anchor count, is W x H x 9.
    Back to build_rpn():
    With the anchors built, net first goes through a [3, 3] convolution:

    rpn = slim.conv2d(net, 512, [3, 3], trainable=is_training, weights_initializer=initializer, scope="rpn_conv/3x3")
    
    

    Then a [1, 1] convolution judges, per position and per anchor, whether there is an object; a binary classification.

    # separate foreground from background, producing scores
    rpn_cls_score = slim.conv2d(rpn, self._num_anchors * 2, [1, 1], trainable=is_training, weights_initializer=initializer, padding='VALID', activation_fn=None, scope='rpn_cls_score')
    
    

    A second [1, 1] convolution predicts, per anchor, the box coordinates: 4 values per box (regression deltas that bbox_transform_inv later converts into corner coordinates).

    rpn_bbox_pred = slim.conv2d(rpn, self._num_anchors * 4, [1, 1], trainable=is_training, weights_initializer=initializer, padding='VALID', activation_fn=None, scope='rpn_bbox_pred')
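
    As a rough shape walk-through (assuming a 38 x 50 feature map and 9 anchors; the exact reshape layout follows the repo's _reshape_layer):

    # rpn (3x3 conv):           [1, 38, 50, 512]
    # rpn_cls_score:            [1, 38, 50, 18]     # 9 anchors x 2 classes
    # rpn_cls_score_reshape:    [1, 38 * 9, 50, 2]  # channel dim squeezed to 2 for softmax
    # rpn_cls_prob:             [1, 38, 50, 18]     # reshaped back after softmax
    # rpn_bbox_pred (1x1 conv): [1, 38, 50, 36]     # 9 anchors x 4 deltas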
    
    

    Back to build_network() once more.
    At this point the 9 anchors per feature map position are settled, and the network has predicted, per position, a score for each of the 9 boxes (object or background). That is around 20,000 boxes per image, which is rather a lot; for training and prediction, suitable boxes have to be picked out.
    build_proposals builds (i.e., selects) the suitable boxes for the next stage.

    # filter the ROIs, keeping the suitable boxes
    rois = self.build_proposals(is_training, rpn_cls_prob, rpn_bbox_pred, rpn_cls_score)
    

    build_proposals()

    def build_proposals(self, is_training, rpn_cls_prob, rpn_bbox_pred, rpn_cls_score):
            # rpn_cls_prob: shape [None, h, w, _num_anchors * 2], object/background scores after softmax
            # rpn_bbox_pred: shape [None, h, w, _num_anchors * 4], the box coordinate regression
            # rpn_cls_score: shape [None, h, w, _num_anchors * 2], raw object/background scores, before softmax
            # obtain the suitable ROIs
            if is_training:
                # coordinate operations: rois are the boxes that survive filtering, roi_scores their scores
                rois, roi_scores = self._proposal_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
                # keep the boxes with IoU above 70% against the ground truth
                rpn_labels = self._anchor_target_layer(rpn_cls_score, "anchor")
    
                # Try to have a deterministic order for the computing graph, for reproducibility
                with tf.control_dependencies([rpn_labels]):
                    rois, _ = self._proposal_target_layer(rois, roi_scores, "rpn_rois")
            else:
                if cfg.FLAGS.test_mode == 'nms':
                    rois, _ = self._proposal_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
                elif cfg.FLAGS.test_mode == 'top':
                    rois, _ = self._proposal_top_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
                else:
                    raise NotImplementedError
            return rois
    

    The code shows two cases: training and not training. Why? As said earlier, ground truth is available during training but not at inference, so the two paths have to be kept apart.
    Here is _proposal_layer():

    rois, roi_scores = self._proposal_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
    

    _proposal_layer()

    _proposal_layer calls proposal_layer(), so let's look at proposal_layer() directly.

    def proposal_layer(rpn_cls_prob, rpn_bbox_pred, im_info, cfg_key, _feat_stride, anchors, num_anchors):
        # rpn_cls_prob: shape [None, h, w, _num_anchors * 2], object/background scores after softmax
        # rpn_bbox_pred: shape [None, h, w, _num_anchors * 4], the box coordinate regression
        """A simplified version compared to fast/er RCNN
           For details please see the technical report
        """
        
        """
            proposal_layer中做的事情:实际上上,在proposal_layer中的任務主要就是篩選合適的框,
            縮小檢測範圍,那麼,在前文回憶部分的步驟⑤中我們已經說到:第一,篩選與ground truth中,
            重疊率大於70%的候選框,篩掉其他的候選框,縮小範圍;第二,用NMS非極大值抑制,
            篩選二分類中前n個score值的候選框;第三,篩掉越界框後,
            再來從前n個從大到小排序的值中篩選一次
        """
        
        if type(cfg_key) == bytes:
            cfg_key = cfg_key.decode('utf-8')
    
        if cfg_key == "TRAIN":
            pre_nms_topN = cfg.FLAGS.rpn_train_pre_nms_top_n
            post_nms_topN = cfg.FLAGS.rpn_train_post_nms_top_n
            nms_thresh = cfg.FLAGS.rpn_train_nms_thresh
        else:
            pre_nms_topN = cfg.FLAGS.rpn_test_pre_nms_top_n
            post_nms_topN = cfg.FLAGS.rpn_test_post_nms_top_n
            nms_thresh = cfg.FLAGS.rpn_test_nms_thresh
    
        im_info = im_info[0]
        # Get the scores and bounding boxes
        scores = rpn_cls_prob[:, :, :, num_anchors:]
        rpn_bbox_pred = rpn_bbox_pred.reshape((-1, 4))
        scores = scores.reshape((-1, 1))
        
        # a global translation followed by a global scaling: from the regression
        # factors we recover pred_ctr_x, pred_ctr_y, pred_w and pred_h
        proposals = bbox_transform_inv(anchors, rpn_bbox_pred)
        proposals = clip_boxes(proposals, im_info[:2])
    
        # Pick the top region proposals
        order = scores.ravel().argsort()[::-1]
        if pre_nms_topN > 0:
            order = order[:pre_nms_topN]
        proposals = proposals[order, :]
        scores = scores[order]
    
        # Non-maximal suppression
        keep = nms(np.hstack((proposals, scores)), nms_thresh)
    
    # Pick the top region proposals after NMS
        if post_nms_topN > 0:
            keep = keep[:post_nms_topN]
        proposals = proposals[keep, :]
        scores = scores[keep]
    
        # Only support single image as input
        batch_inds = np.zeros((proposals.shape[0], 1), dtype=np.float32)
        blob = np.hstack((batch_inds, proposals.astype(np.float32, copy=False)))
    
        return blob, scores
    

    bbox_transform_inv(): the coordinate transform

    bbox_transform_inv combines the RPN outputs with the initial boxes to transform all of their coordinates.

    def bbox_transform_inv(boxes, deltas):
        '''
        Applies deltas to box coordinates to obtain new boxes, as described by 
        deltas
        '''   
        if boxes.shape[0] == 0:
            return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype)
     
        boxes = boxes.astype(deltas.dtype, copy=False)
        
        # get each initial proposal's center, width and height
        widths = boxes[:, 2] - boxes[:, 0] + 1.0
        heights = boxes[:, 3] - boxes[:, 1] + 1.0
        ctr_x = boxes[:, 0] + 0.5 * widths
        ctr_y = boxes[:, 1] + 0.5 * heights
     
        # get the coordinate transform (the deltas)
        dx = deltas[:, 0::4]
        dy = deltas[:, 1::4]
        dw = deltas[:, 2::4]
        dh = deltas[:, 3::4]
     
        # compute the transformed proposal's center, width and height
        pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
        pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
        pred_w = np.exp(dw) * widths[:, np.newaxis]
        pred_h = np.exp(dh) * heights[:, np.newaxis]
     
        # convert the center/size representation back to top-left and bottom-right corners
        pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
        # x1
        pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
        # y1
        pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
        # x2
        pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w
        # y2
        pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h
     
        return pred_boxes
    
    

    The transform implements the formulas:

    pred_ctr_x = dx * w + ctr_x
    pred_ctr_y = dy * h + ctr_y
    pred_w = exp(dw) * w
    pred_h = exp(dh) * h
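
    A tiny hand-computed example of applying one delta to one box (values invented):

    import numpy as np

    box = np.array([[0., 0., 15., 15.]])            # w = h = 16, center (7.5, 7.5)
    delta = np.array([[0.1, 0.2, np.log(2.), 0.]])  # dx, dy, dw, dh
    print(bbox_transform_inv(box, delta))
    # center moves to (9.1, 10.7), width doubles to 32, height stays 16:
    # [[-6.9  2.7 25.1 18.7]]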

    The clip_boxes function then clips any transformed box that crosses the image boundary, so every box lies within the image. clip_boxes is shown below.

    clip_boxes()

    def clip_boxes(boxes, im_shape):
        """
        Clip boxes to image boundaries.
        """
     
        # strictly constrain each proposal's four corners to the image boundaries
        # x1 >= 0
        boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0)
        # y1 >= 0
        boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0)
        # x2 < im_shape[1]
        boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0)
        # y2 < im_shape[0]
        boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0)
        return boxes
    
    

    All boxes are sorted by foreground score, and the top pre_nms_topN boxes are kept.

    order = scores.ravel().argsort()[::-1]
    if pre_nms_topN > 0:
        order = order[:pre_nms_topN]
    proposals = proposals[order, :]
    scores = scores[order]
    

    On the boxes selected in the previous step, NMS then removes overlapping boxes above the threshold.

    keep = nms(np.hstack((proposals, scores)), nms_thresh)
    
    

    nms()

    def py_cpu_nms(dets, thresh):
        """Pure Python NMS baseline."""
        x1 = dets[:, 0]
        y1 = dets[:, 1]
        x2 = dets[:, 2]
        y2 = dets[:, 3]
        scores = dets[:, 4]
    
        areas = (x2 - x1 + 1) * (y2 - y1 + 1)
        order = scores.argsort()[::-1]
    
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])
    
            w = np.maximum(0.0, xx2 - xx1 + 1)
            h = np.maximum(0.0, yy2 - yy1 + 1)
            inter = w * h
            ovr = inter / (areas[i] + areas[order[1:]] - inter)
    
            inds = np.where(ovr <= thresh)[0]
            order = order[inds + 1]
    
        return keep
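
    To see the effect, a tiny hand-made example (boxes and threshold invented):

    import numpy as np

    dets = np.array([[ 0.,  0., 10., 10., 0.9],
                     [ 1.,  1., 11., 11., 0.8],   # overlaps the first box heavily
                     [20., 20., 30., 30., 0.7]])
    print(py_cpu_nms(dets, 0.5))  # [0, 2]: the second box is suppressed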
    

    From what remains, the final post_nms_topN boxes are kept.

    # Pick the top region proposals after NMS
    if post_nms_topN > 0:
        keep = keep[:post_nms_topN]
    proposals = proposals[keep, :]
    scores = scores[keep]
    

    Once all boxes are selected, a batch index is prepended so they can be looked up on the feature map; since the batch size is 1, every index is 0.

    batch_inds = np.zeros((proposals.shape[0], 1), dtype=np.float32)
    blob = np.hstack((batch_inds, proposals.astype(np.float32, copy=False)))
    

    Back in build_proposals(): after _proposal_layer, training also needs positive/negative sample handling, which matches anchors against the ground truth by IoU (keeping those above 70% as positives):

        def _anchor_target_layer(self, rpn_cls_score, name):
            with tf.variable_scope(name):
                rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights = tf.py_func(
                    anchor_target_layer,
                    [rpn_cls_score, self._gt_boxes, self._im_info, self._feat_stride, self._anchors, self._num_anchors],
                    [tf.float32, tf.float32, tf.float32, tf.float32])
    
    

    Finally, all the box coordinates are returned. Note: the boxes still vary in size; nothing has been unified yet, and the scale change happens in the ROI layer.
    Back in build_network() one last time, what remains is build_predictions():

    cls_score, cls_prob, bbox_pred = self.build_predictions(net, rois, is_training, initializer, initializer_bbox)
    

    build_predictions()

    def build_predictions(self, net, rois, is_training, initializer, initializer_bbox):
    
            # Crop image ROIs
            # build fixed-size ROI windows
            pool5 = self._crop_pool_layer(net, rois, "pool5")
            pool5_flat = slim.flatten(pool5, scope='flatten')
    
            # Fully connected layers
            fc6 = slim.fully_connected(pool5_flat, 4096, scope='fc6')
            if is_training:
                fc6 = slim.dropout(fc6, keep_prob=0.5, is_training=True, scope='dropout6')
    
            fc7 = slim.fully_connected(fc6, 4096, scope='fc7')
            if is_training:
                fc7 = slim.dropout(fc7, keep_prob=0.5, is_training=True, scope='dropout7')
    
            # Scores and predictions
            # classification over _num_classes, via fc7
            cls_score = slim.fully_connected(fc7, self._num_classes, weights_initializer=initializer, trainable=is_training, activation_fn=None, scope='cls_score')
            cls_prob = self._softmax_layer(cls_score, "cls_prob")
            
            # box regression, via fc7
            bbox_prediction = slim.fully_connected(fc7, self._num_classes * 4, weights_initializer=initializer_bbox, trainable=is_training, activation_fn=None, scope='bbox_pred')
            # cls_score: classification scores over _num_classes
            # cls_prob: the same scores after softmax
            # bbox_prediction: the box regression outputs
            return cls_score, cls_prob, bbox_prediction
    

    The rois (box coordinates; not yet size-normalized, pool5 is the fixed-size feature map) are fed into the network for the final classification and localization.
    The _crop_pool_layer() function here is the crop-pool layer: using the box coordinates, it locates the corresponding feature map regions on net.
    It returns fixed-size feature maps, pool5.

    def _crop_pool_layer(self, bottom, rois, name):
            # fixed-size windows
            with tf.variable_scope(name):
                batch_ids = tf.squeeze(tf.slice(rois, [0, 0], [-1, 1], name="batch_id"), [1])
                # Get the normalized coordinates of bboxes
                bottom_shape = tf.shape(bottom)
                height = (tf.to_float(bottom_shape[1]) - 1.) * np.float32(self._feat_stride[0])
                width = (tf.to_float(bottom_shape[2]) - 1.) * np.float32(self._feat_stride[0])
                x1 = tf.slice(rois, [0, 1], [-1, 1], name="x1") / width
                y1 = tf.slice(rois, [0, 2], [-1, 1], name="y1") / height
                x2 = tf.slice(rois, [0, 3], [-1, 1], name="x2") / width
                y2 = tf.slice(rois, [0, 4], [-1, 1], name="y2") / height
                # Won't be backpropagated to rois anyway, but to save time
                bboxes = tf.stop_gradient(tf.concat([y1, x1, y2, x2], axis=1))
                pre_pool_size = cfg.FLAGS.roi_pooling_size * 2
                crops = tf.image.crop_and_resize(bottom, bboxes, tf.to_int32(batch_ids), [pre_pool_size, pre_pool_size], name="crops")
    
            return slim.max_pool2d(crops, [2, 2], padding='SAME')
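
    The mechanics in miniature (a sketch, assuming the config's roi_pooling_size is 7): each ROI is bilinearly cropped to 14 x 14 via tf.image.crop_and_resize, then 2 x 2 max-pooled down to 7 x 7.

    import tensorflow as tf

    feat = tf.placeholder(tf.float32, [1, 38, 50, 512])  # the conv5 feature map
    boxes = tf.constant([[0.1, 0.2, 0.5, 0.6]])          # normalized y1, x1, y2, x2
    crops = tf.image.crop_and_resize(feat, boxes, tf.zeros([1], tf.int32), [14, 14])
    pooled = tf.nn.max_pool(crops, [1, 2, 2, 1], [1, 2, 2, 1], 'SAME')  # -> [1, 7, 7, 512]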
    

    Here pool5 is flattened and sent through fc6 ---> fc7; fc7 is a shared layer with two outputs, one into classification and the other into box coordinate regression.
    Classification:

    cls_score = slim.fully_connected(fc7, self._num_classes, weights_initializer=initializer, trainable=is_training, activation_fn=None, scope='cls_score')
    cls_prob = self._softmax_layer(cls_score, "cls_prob")
    

    Box regression:

    # box regression via fc7
    bbox_prediction = slim.fully_connected(fc7, self._num_classes * 4, weights_initializer=initializer_bbox, trainable=is_training, activation_fn=None, scope='bbox_pred')
    

    At this point, everything the object detection network does is done.
    train.py then distinguishes between TEST and TRAIN:
    TEST ends the computation here, while TRAIN still has to compute the loss.

    def _add_losses(self, sigma_rpn=3.0):
            with tf.variable_scope('loss_' + self._tag):
                # RPN, class loss
                rpn_cls_score = tf.reshape(self._predictions['rpn_cls_score_reshape'], [-1, 2])
                rpn_label = tf.reshape(self._anchor_targets['rpn_labels'], [-1])
                rpn_select = tf.where(tf.not_equal(rpn_label, -1))
                rpn_cls_score = tf.reshape(tf.gather(rpn_cls_score, rpn_select), [-1, 2])
                rpn_label = tf.reshape(tf.gather(rpn_label, rpn_select), [-1])
                rpn_cross_entropy = tf.reduce_mean(
                    tf.nn.sparse_softmax_cross_entropy_with_logits(logits=rpn_cls_score, labels=rpn_label))
    
                # RPN, bbox loss
                rpn_bbox_pred = self._predictions['rpn_bbox_pred']
                rpn_bbox_targets = self._anchor_targets['rpn_bbox_targets']
                rpn_bbox_inside_weights = self._anchor_targets['rpn_bbox_inside_weights']
                rpn_bbox_outside_weights = self._anchor_targets['rpn_bbox_outside_weights']
    
                rpn_loss_box = self._smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets, rpn_bbox_inside_weights,
                                                    rpn_bbox_outside_weights, sigma=sigma_rpn, dim=[1, 2, 3])
    
                # RCNN, class loss
                cls_score = self._predictions["cls_score"]
                label = tf.reshape(self._proposal_targets["labels"], [-1])
    
                cross_entropy = tf.reduce_mean(
                    tf.nn.sparse_softmax_cross_entropy_with_logits(
                        logits=tf.reshape(cls_score, [-1, self._num_classes]), labels=label))
    
                # RCNN, bbox loss
                bbox_pred = self._predictions['bbox_pred']
                bbox_targets = self._proposal_targets['bbox_targets']
                bbox_inside_weights = self._proposal_targets['bbox_inside_weights']
                bbox_outside_weights = self._proposal_targets['bbox_outside_weights']
    
                loss_box = self._smooth_l1_loss(bbox_pred, bbox_targets, bbox_inside_weights, bbox_outside_weights)
    
                self._losses['cross_entropy'] = cross_entropy
                self._losses['loss_box'] = loss_box
                self._losses['rpn_cross_entropy'] = rpn_cross_entropy
                self._losses['rpn_loss_box'] = rpn_loss_box
    
                loss = cross_entropy + loss_box + rpn_cross_entropy + rpn_loss_box
                self._losses['total_loss'] = loss
    
                self._event_summaries.update(self._losses)
    
            return loss
    

    Looking at the whole network, there are four outputs: RPN box and RPN class, RCNN box and RCNN class. The boxes use a regression loss and the classes use a cross-entropy loss; summing all the losses allows joint training.
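
    For reference, the per-coordinate smooth L1 term that _smooth_l1_loss computes looks roughly like this (a sketch of the standard Faster R-CNN form with the sigma scaling, not the repo's exact code):

    import tensorflow as tf

    def smooth_l1(x, sigma=3.0):
        # quadratic near zero, linear beyond |x| = 1 / sigma^2
        s2 = sigma ** 2
        return tf.where(tf.abs(x) < 1.0 / s2,
                        0.5 * s2 * tf.square(x),
                        tf.abs(x) - 0.5 / s2)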

    Update, 2018-10-25

    The code in train.py does not quite match the paper; after all, it was not written by the original authors. The model here trains the RPN and Fast R-CNN jointly, as below: train_op gathers all the losses, with no stage-wise training.

    layers = self.net.create_architecture(sess, "TRAIN", self.imdb.num_classes, tag='default')
                loss = layers['total_loss']
                lr = tf.Variable(cfg.FLAGS.learning_rate, trainable=False)
                momentum = cfg.FLAGS.momentum
                optimizer = tf.train.MomentumOptimizer(lr, momentum)
    
                gvs = optimizer.compute_gradients(loss)
    
                # Double bias
                # Double the gradient of the bias if set
                if cfg.FLAGS.double_bias:
                    final_gvs = []
                    with tf.variable_scope('Gradient_Mult'):
                        for grad, var in gvs:
                            scale = 1.
                            if cfg.FLAGS.double_bias and '/biases:' in var.name:
                                scale *= 2.
                            if not np.allclose(scale, 1.0):
                                grad = tf.multiply(grad, scale)
                            final_gvs.append((grad, var))
                    train_op = optimizer.apply_gradients(final_gvs)
                else:
                    train_op = optimizer.apply_gradients(gvs)
    
                    ....................................................
    
       
                rpn_loss_cls, rpn_loss_box, loss_cls, loss_box, total_loss = self.net.train_step(sess, blobs, train_op)
    

    And that is as far as this read-through goes.

    References:
    详细的Faster R-CNN源码解析之proposal_layer和proposal_target_layer源码解析
    基于Tensorflow的目标检测(Detection)的代码案例详解
