Debugging the FPN source code

Author: yanghedada | Published 2018-11-26 16:09 · 205 reads

    For particular reasons, I needed to debug this code.

    Code repository
    Original code
    Discussion with the author
    The code was debugged on Ubuntu.

    Contents:


    Preparing the VOC data

    Use convert_data_to_tfrecord.py to build the tfrecord files.

    tf.app.flags.DEFINE_string('VOC_dir', 'data/{}/'.format(cfgs.DATASET_NAME), 'Voc dir')
    tf.app.flags.DEFINE_string('xml_dir', 'Annotations', 'xml dir')
    tf.app.flags.DEFINE_string('image_dir', 'JPEGImages', 'image dir')
    tf.app.flags.DEFINE_string('save_name', 'train', 'save name')
    tf.app.flags.DEFINE_string('save_dir', cfgs.ROOT_PATH + '/data/tfrecords/', 'save name')
    tf.app.flags.DEFINE_string('img_format', '.jpg', 'format of image')
    

    These flags are just the data directories;
    pascal is the folder where the data is stored.

    • Modify line 141:
     'img_name': _bytes_feature(img_name.encode('utf-8')),
    

    Run the following; I changed nothing else:

    python convert_data_to_tfrecord.py
    

    This produces train.tfrecord.

    Split the data into two parts to get the test and train sets.
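A minimal way to script that split (the helper name, the 80/20 ratio, and the fixed seed are my own choices, not from the repo):

```python
import random

def split_train_test(xml_names, test_ratio=0.2, seed=0):
    """Deterministically split annotation file names into (train, test) lists."""
    names = sorted(xml_names)          # sort first so the split is reproducible
    random.Random(seed).shuffle(names)
    n_test = int(len(names) * test_ratio)
    return names[n_test:], names[:n_test]
```

Running convert_data_to_tfrecord.py once per list (with save_name set to train or test) then yields the two tfrecord files.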

    Start training

    The VOC data has only 20 classes.
    Modify config.py as follows; the defaults do not match VOC.

    NET_NAME = 'resnet_v1_101'
    DATASET_NAME = 'pascal'
    VERSION = 'v1_{}'.format(DATASET_NAME)
    CLASS_NUM = 20  # excludes background
    
    • 无预训练模型训练

    Modify train.py, commenting out the lines below, then start training:

        #restorer, restore_ckpt = restore_model.get_restorer(test=False)
        saver = tf.train.Saver(max_to_keep=3)
    
        config = tf.ConfigProto()
        # config.gpu_options.per_process_gpu_memory_fraction = 0.5
        config.gpu_options.allow_growth = True
        with tf.Session(config=config) as sess:
          sess.run(init_op)
          # if not restorer is None:
          #   restorer.restore(sess, restore_ckpt)
          #   print('restore model')
    

    Modify the save paths in train.py:

          summary_path = os.path.join('../output/{}'.format(cfgs.DATASET_NAME),
                                      FLAGS.summary_path, cfgs.VERSION)
    
    ........................
    
    
              save_dir = os.path.join('../output/{}'.format(cfgs.DATASET_NAME),
                                      FLAGS.trained_checkpoint, cfgs.VERSION)
    

    This creates an output directory under the project root.

    Training:

    python train.py
    

    (Screenshots: training log output, the saved parameters, and the loss curve.)

    • Training from pretrained weights

    Modify the flags in config_res101.py:

    tf.app.flags.DEFINE_string(
        'pretrained_model_path',
        '../output/pascal/res101_trained_weights/v1_pascal/pascal_500model.ckpt',
        #YangHe_MyCode/FPN_TensorFlow-master/output/pascal/res101_trained_weights
        # 'output-1/res101_trained_weights/v1_layer/voc_50000model.ckpt',
        'the path of pretrained weights'
    )
    

    At the same time, restore the code that was commented out earlier:

        restorer, restore_ckpt = restore_model.get_restorer(test=False)
        saver = tf.train.Saver(max_to_keep=3)
    
        config = tf.ConfigProto()
        # config.gpu_options.per_process_gpu_memory_fraction = 0.5
        config.gpu_options.allow_growth = True
        with tf.Session(config=config) as sess:
          sess.run(init_op)
          if not restorer is None:
            restorer.restore(sess, restore_ckpt)
            print('restore model')
    

    Now training can start:

    python train.py
    

    eval.py

    This evaluates the model. I couldn't pass the arguments on the command line (my shortcoming), so I had to set default values instead.

    As follows:

    Note: the data used here is the pascal_test.tfrecord produced earlier.
    That's it.

    Two of the parameters, src_folder and des_folder, were unused, so I deleted them.

      parser = argparse.ArgumentParser(description='Evaluate a trained FPN model')
      parser.add_argument('--weights', dest='weights',
                          help='model path',
                          default='../output/pascal/res101_trained_weights/v1_pascal/pascal_500model.ckpt',
                          type=str)
      parser.add_argument('--img_num', dest='img_num',
                          help='image numbers',
                          default=20, type=int)
    

    The pretrained weights path needs to be set here.
    Also fix the pickle reads and writes, which have a bug:

    • the pickle writes in eval.py
    • the pickle reads in eval.py
    
      fr1 = open('predict_dict.pkl', 'rb')
      fr2 = open('gtboxes_dict.pkl', 'rb')
    
      predict_dict = pickle.load(fr1,encoding='iso-8859-1')
      gtboxes_dict = pickle.load(fr2,encoding='iso-8859-1')
    
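The encoding='iso-8859-1' argument is the standard trick for reading, under Python 3, a pickle that was written under Python 2: byte strings get decoded as latin-1 instead of raising. A self-contained round trip showing the pattern (the file names follow eval.py; the helper names are mine):

```python
import pickle

def dump_dicts(predict_dict, gtboxes_dict, predict_path, gtboxes_path):
    # protocol=2 keeps the files readable from Python 2 as well
    with open(predict_path, 'wb') as fw1, open(gtboxes_path, 'wb') as fw2:
        pickle.dump(predict_dict, fw1, protocol=2)
        pickle.dump(gtboxes_dict, fw2, protocol=2)

def load_dicts(predict_path, gtboxes_path):
    # encoding='iso-8859-1' decodes Python-2 byte strings without errors
    with open(predict_path, 'rb') as fr1, open(gtboxes_path, 'rb') as fr2:
        predict_dict = pickle.load(fr1, encoding='iso-8859-1')
        gtboxes_dict = pickle.load(fr2, encoding='iso-8859-1')
    return predict_dict, gtboxes_dict
```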

    One more change:

    Add an if len(rboxes) != 0 check here, so that rec[-1] does not raise an IndexError.
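The guard is simply a length check before indexing; a sketch with hypothetical surrounding names (the post only specifies the if len(rboxes) != 0 condition itself):

```python
def final_recall(rec, rboxes):
    # For images where nothing was detected, rboxes (and hence rec) is
    # empty, and rec[-1] would raise IndexError without this check.
    if len(rboxes) != 0:
        return rec[-1]
    return 0.0
```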

    Run eval.py:

    python eval.py
    

    This produces two files, gtboxes_dict.pkl and predict_dict.pkl.

    This yields the mAP (screenshot):

    You read that right: mAP = 0.
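An mAP of exactly 0 means no detection was counted as a true positive for any class. The AP computation itself is small enough to sanity-check by hand; a sketch of the all-points interpolated VOC-style AP (my own minimal version, not the repo's code):

```python
import numpy as np

def voc_ap(rec, prec):
    # Pad the recall/precision curve, build the monotone precision
    # envelope, then integrate precision over recall.
    mrec = np.concatenate(([0.0], rec, [1.0]))
    mpre = np.concatenate(([0.0], prec, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))
```

With every precision value at zero this returns 0.0, which is exactly what mislabeled training data produces.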

    Errors

    2018-11-27 09:33:31: step2050 image_name:b'2008_008546.jpg'
                         rpn_loc_loss:0.2765 | rpn_cla_loss:0.2248 | rpn_total_loss:0.5013
                         fast_rcnn_loc_loss:0.0773 | fast_rcnn_cla_loss:0.2025 | fast_rcnn_total_loss:0.2797
                         added_loss:0.7811 | total_loss:6.3684 | pre_cost_time:0.3456s
    2018-11-27 09:33:48: step2100 image_name:b'2008_008745.jpg'
                         rpn_loc_loss:0.2128 | rpn_cla_loss:0.1975 | rpn_total_loss:0.4103
                         fast_rcnn_loc_loss:0.1009 | fast_rcnn_cla_loss:0.0907 | fast_rcnn_total_loss:0.1916
                         added_loss:0.6019 | total_loss:6.1892 | pre_cost_time:0.3453s
    2018-11-27 09:34:10: step2150 image_name:b'2009_000161.jpg'
                         rpn_loc_loss:nan | rpn_cla_loss:0.6903 | rpn_total_loss:nan
                         fast_rcnn_loc_loss:nan | fast_rcnn_cla_loss:3.0358 | fast_rcnn_total_loss:nan
                         added_loss:nan | total_loss:10000000000.0000 | pre_cost_time:0.3478s
    2018-11-27 09:34:29: step2200 image_name:b'2009_000393.jpg'
                         rpn_loc_loss:nan | rpn_cla_loss:0.6902 | rpn_total_loss:nan
                         fast_rcnn_loc_loss:nan | fast_rcnn_cla_loss:3.0358 | fast_rcnn_total_loss:nan
                         added_loss:nan | total_loss:10000000000.0000 | pre_cost_time:0.3507s
    
    

    The loss becomes NaN, presumably because of a problem in loc_loss.

    Fixing the loss

    def l1_smooth_losses(predict_boxes, gtboxes, object_weights, classes_weights=None):
      '''
      :param predict_boxes: [minibatch_size, -1]
      :param gtboxes: [minibatch_size, -1]
      :param object_weights: [minibatch_size, ]. 1.0 represent object, 0.0 represent others(ignored or background)
      :return:
      '''
      diff = predict_boxes - gtboxes
      abs_diff = tf.cast(tf.abs(diff), tf.float32)
    
      if classes_weights is None:
        '''
        first_stage:
        predict_boxes :[minibatch_size, 4]
        gtboxes: [minibatchs_size, 4]
        '''
        anchorwise_smooth_l1norm = tf.reduce_sum(
            tf.where(tf.less(abs_diff, 1), 0.5 * tf.square(abs_diff), abs_diff - 0.5), axis=1) * object_weights
      else:
        '''
        fast_rcnn:
        predict_boxes: [minibatch_size, 4*num_classes]
        gtboxes: [minibatch_size, 4*num_classes]
        classes_weights : [minibatch_size, 4*num_classes]
        '''
        anchorwise_smooth_l1norm = tf.reduce_sum(tf.where(tf.less(abs_diff, 1), 0.5*tf.square(
            abs_diff)*classes_weights, (abs_diff - 0.5)*classes_weights), axis=1)*object_weights
      anchorwise_smooth_l1norm = tf.clip_by_value(anchorwise_smooth_l1norm, 1e-10, 1e10)
      return tf.reduce_mean(anchorwise_smooth_l1norm, axis=0)  # reduce mean
    

    The essential addition is this:

    tf.clip_by_value(anchorwise_smooth_l1norm, 1e-10, 1e10)
      return tf.reduce_mean(anchorwise_smooth_l1norm, axis=0)
    

    Although the message "signed integer is less than minimum" still appears in the log:

    2018-11-27 11:00:27: step1150 image_name:b'2008_004903.jpg'
                         rpn_loc_loss:0.1078 | rpn_cla_loss:0.1171 | rpn_total_loss:0.2249
                         fast_rcnn_loc_loss:0.0625 | fast_rcnn_cla_loss:0.0605 | fast_rcnn_total_loss:0.1230
                         added_loss:0.3479 | total_loss:5.9376 | pre_cost_time:0.4136s
    signed integer is less than minimum
    signed integer is less than minimum
    2018-11-27 11:00:49: step1200 image_name:b'2008_005101.jpg'
                         rpn_loc_loss:0.3705 | rpn_cla_loss:0.2455 | rpn_total_loss:0.6160
                         fast_rcnn_loc_loss:0.0000 | fast_rcnn_cla_loss:0.0069 | fast_rcnn_total_loss:0.0069
                         added_loss:0.6230 | total_loss:6.2127 | pre_cost_time:0.3253s
    2018-11-27 11:01:11: step1250 image_name:b'2008_005321.jpg'
                         rpn_loc_loss:0.0295 | rpn_cla_loss:0.0396 | rpn_total_loss:0.0692
                         fast_rcnn_loc_loss:0.0170 | fast_rcnn_cla_loss:0.0240 | fast_rcnn_total_loss:0.0410
                         added_loss:0.1102 | total_loss:5.6999 | pre_cost_time:0.3905s
    2018-11-27 11:01:33: step1300 image_name:b'2008_005514.jpg'
                         rpn_loc_loss:0.0564 | rpn_cla_loss:0.0632 | rpn_total_loss:0.1196
                         fast_rcnn_loc_loss:0.0000 | fast_rcnn_cla_loss:0.0057 | fast_rcnn_total_loss:0.0057
                         added_loss:0.1253 | total_los
    

    Change it to this instead:

    def l1_smooth_losses(predict_boxes, gtboxes, object_weights, classes_weights=None):
      '''
      :param predict_boxes: [minibatch_size, -1]
      :param gtboxes: [minibatch_size, -1]
      :param object_weights: [minibatch_size, ]. 1.0 represent object, 0.0 represent others(ignored or background)
      :return:
      '''
      diff = predict_boxes - gtboxes
      abs_diff = tf.cast(tf.abs(diff), tf.float32)
      abs_diff = tf.clip_by_value(abs_diff, 1e-5, 1e2)  # clip
      if classes_weights is None:
        '''
        first_stage:
        predict_boxes :[minibatch_size, 4]
        gtboxes: [minibatchs_size, 4]
        '''
        anchorwise_smooth_l1norm = tf.reduce_sum(
            tf.where(tf.less(abs_diff, 1), 0.5 * tf.square(abs_diff), abs_diff - 0.5), axis=1) * object_weights
      else:
        '''
        fast_rcnn:
        predict_boxes: [minibatch_size, 4*num_classes]
        gtboxes: [minibatch_size, 4*num_classes]
        classes_weights : [minibatch_size, 4*num_classes]
        '''
        anchorwise_smooth_l1norm = tf.reduce_sum(tf.where(tf.less(abs_diff, 1), 0.5*tf.square(
            abs_diff)*classes_weights, (abs_diff - 0.5)*classes_weights), axis=1)*object_weights
    
      return tf.reduce_mean(anchorwise_smooth_l1norm, axis=0)  # reduce mean
    

    At least this way it no longer crashes.
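The effect of the clip is easy to check outside TensorFlow; a NumPy re-implementation of the first-stage branch (a sketch mirroring the fix above, not the repo's code):

```python
import numpy as np

def l1_smooth_losses_np(predict_boxes, gtboxes, object_weights):
    diff = predict_boxes - gtboxes
    abs_diff = np.abs(diff).astype(np.float32)
    abs_diff = np.clip(abs_diff, 1e-5, 1e2)  # same bounds as the fix above
    per_anchor = np.sum(
        np.where(abs_diff < 1, 0.5 * abs_diff ** 2, abs_diff - 0.5),
        axis=1) * object_weights
    return float(per_anchor.mean())
```

Even for an absurdly large regression error the clipped loss stays finite, so a single bad box no longer blows up the batch loss; note that a coordinate that is already NaN would still propagate through the clip, which is why fixing the data mattered more.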

    While inspecting the data, I found that the data preparation itself had gone wrong:

    the training data was completely mislabeled.

    The fix:

    # -*- coding: utf-8 -*-
    from __future__ import division, print_function, absolute_import
    import sys
    sys.path.append('../../')
    import xml.etree.cElementTree as ET
    import numpy as np
    import tensorflow as tf
    import glob
    import cv2
    from help_utils.tools import *
    from libs.label_name_dict.label_dict import *
    from lxml import etree
    '''
    python convert_data_to_tfrecord.py --VOC_dir=VOCdevkit_train/ --save_name=train --dataset=pascal
    '''
    
    tf.app.flags.DEFINE_string('VOC_dir', None, 'Voc dir')
    tf.app.flags.DEFINE_string('xml_dir', 'Annotations', 'xml dir')
    tf.app.flags.DEFINE_string('image_dir', 'JPEGImages', 'image dir')
    tf.app.flags.DEFINE_string('save_name', 'train', 'save name')
    tf.app.flags.DEFINE_string('save_dir', cfgs.ROOT_PATH + '/data/tfrecords/', 'save name')
    tf.app.flags.DEFINE_string('img_format', '.jpg', 'format of image')
    tf.app.flags.DEFINE_string('dataset', 'car', 'dataset')
    FLAGS = tf.app.flags.FLAGS
    
    
    def _int64_feature(value):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
    
    
    def _bytes_feature(value):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
    
    
    
    def recursive_parse_xml_to_dict(xml):
      """Recursively parses XML contents to python dict.
    
      We assume that `object` tags are the only ones that can appear
      multiple times at the same level of a tree.
    
      Args:
        xml: xml tree obtained by parsing XML file contents using lxml.etree
    
      Returns:
        Python dictionary holding XML contents.
      """
      if not xml:
        return {xml.tag: xml.text}
      result = {}
      for child in xml:
        child_result = recursive_parse_xml_to_dict(child)
        if child.tag != 'object':
          result[child.tag] = child_result[child.tag]
        else:
          if child.tag not in result:
            result[child.tag] = []
          result[child.tag].append(child_result[child.tag])
      return {xml.tag: result}
    
    
    
    def read_xml_gtbox_and_label(xml_path):
    
        """
        :param xml_path: the path of voc xml
        :return: a list contains gtboxes and labels, shape is [num_of_gtboxes, 5],
               and has [xmin, ymin, xmax, ymax, label] in a per row
        """
        box_list = []
        with open(xml_path,) as f:
            xml_str = f.read()
            #show_all_image_test()
            xml = etree.fromstring(xml_str)
            data = recursive_parse_xml_to_dict(xml)['annotation']
        img_width = int(data['size']['width'])
        img_height = int(data['size']['height'])
    
        for obj in data['object']:
            xmin = int(obj['bndbox']['xmin'])
            ymin = int(obj['bndbox']['ymin'])
            ymax = int(obj['bndbox']['ymax'])
            xmax = int(obj['bndbox']['xmax'])
            label = NAME_LABEL_MAP[obj['name']]
            box_list.append([ymin, xmin, ymax, xmax, label])
        gtbox_label = np.array(box_list, dtype=np.int32)
        ymin, xmin, ymax, xmax, label = gtbox_label[:, 0], gtbox_label[:, 1], gtbox_label[:, 2], gtbox_label[:, 3], \
                                        gtbox_label[:, 4]
        xmin = np.where(xmin <= 0, 0, xmin)
        ymin = np.where(ymin <= 0, 0, ymin)
        xmax = np.where(xmax >= img_width, img_width , xmax)
        ymax = np.where(ymax >= img_height, img_height, ymax)
        gtbox_label = np.transpose(np.stack([ymin, xmin, ymax, xmax, label], axis=0))  # [ymin, xmin, ymax, xmax, label]
    
        return img_height, img_width, gtbox_label
    
    
    def convert_pascal_to_tfrecord():
        xml_path = FLAGS.VOC_dir + FLAGS.xml_dir
        image_path = FLAGS.VOC_dir + FLAGS.image_dir
        save_path = FLAGS.save_dir + FLAGS.dataset + '_' + FLAGS.save_name + '.tfrecord'
        mkdir(FLAGS.save_dir)
    
        # writer_options = tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.ZLIB)
        # writer = tf.python_io.TFRecordWriter(path=save_path, options=writer_options)
        writer = tf.python_io.TFRecordWriter(path=save_path)
    
        for count, xml in enumerate(glob.glob(xml_path + '/*.xml')):
            # to avoid path error in different development platform
            xml = xml.replace('\\', '/')
    
            img_name = xml.split('/')[-1].split('.')[0] + FLAGS.img_format
            img_path = image_path + '/' + img_name
    
            if not os.path.exists(img_path):
                print('{} is not exist!'.format(img_path))
                continue
    
            img_height, img_width, gtbox_label = read_xml_gtbox_and_label(xml)
            # img = np.array(Image.open(img_path))
            img = cv2.imread(img_path)
    
            feature = tf.train.Features(feature={
                # maybe do not need encode() in linux
                'img_name': _bytes_feature(img_name.encode('utf8')),
                'img_height': _int64_feature(img_height),
                'img_width': _int64_feature(img_width),
                'img': _bytes_feature(img.tostring()),
                'gtboxes_and_label': _bytes_feature(gtbox_label.tostring()),
                'num_objects': _int64_feature(gtbox_label.shape[0])
            })
    
            example = tf.train.Example(features=feature)
    
            writer.write(example.SerializeToString())
    
            view_bar('Conversion progress', count + 1, len(glob.glob(xml_path + '/*.xml')))
    
        print('\nConversion is complete!')
    
    
    def show_all_image_test():
        NAME_LABEL = list(NAME_LABEL_MAP.keys())
        xml_path = 'VOCdevkit_train/' + FLAGS.xml_dir
        image_path = 'VOCdevkit_train/'+ FLAGS.image_dir
        for count, xml in enumerate(glob.glob(xml_path + '/*.xml')):
            # to avoid path error in different development platform
            xml = xml.replace('\\', '/')
    
            img_name = xml.split('/')[-1].split('.')[0] + FLAGS.img_format
            img_path = image_path + '/' + img_name
    
            if not os.path.exists(img_path):
                print('{} is not exist!'.format(img_path))
                continue
    
            img_height, img_width, gtbox_label = read_xml_gtbox_and_label(xml)
            image = cv2.imread(img_path)
            for i in range(len(gtbox_label)):
                object = gtbox_label[i]
                ymin, xmin, ymax, xmax, label = object
                image = cv2.rectangle(image, (object[1], object[0]),
                                       (object[3], object[2]),
                                       color=(0, 255, 0))
                cv2.putText(image,
                            text=str(len(gtbox_label)),
                            org=((image.shape[1]) // 2, (image.shape[0]) // 2),
                            fontFace=3,
                            fontScale=1,
                            color=(255, 0, 0))
                if ymin <= 0 or  xmin <= 0 or ymax >= img_height or xmax>=img_width:
                    cv2.putText(image,
                                text='error',
                                org=((image.shape[1]) // 2, (image.shape[0]) // 2),
                                fontFace=3,
                                fontScale=1,
                                color=(255, 0, 0))
                else:
                    cv2.putText(image,
                                text=str(NAME_LABEL[object[4]]),
                                org=(object[1], object[0] + 10),
                                fontFace=1,
                                fontScale=1,
                                thickness=2,
                                color=(255, 0, 0))
                cv2.imshow("s", image)
                cv2.waitKey(500)
    if __name__ == '__main__':
        # xml_path = 'VOCdevkit_test/Annotations/2008_000082.xml'
        # print(read_xml_gtbox_and_label(xml_path))  # show xml
        show_all_image_test()  # show labels and boxes with cv2

        # convert_pascal_to_tfrecord()  # create the dataset
    
    

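The box-clamping step at the end of read_xml_gtbox_and_label can be exercised on its own; a NumPy-only extraction of that logic (same [ymin, xmin, ymax, xmax, label] row order as the script above, helper name is mine):

```python
import numpy as np

def clamp_gtboxes(gtbox_label, img_height, img_width):
    # Rows are [ymin, xmin, ymax, xmax, label]; clip corners to the image.
    ymin, xmin, ymax, xmax, label = [gtbox_label[:, i] for i in range(5)]
    xmin = np.where(xmin <= 0, 0, xmin)
    ymin = np.where(ymin <= 0, 0, ymin)
    xmax = np.where(xmax >= img_width, img_width, xmax)
    ymax = np.where(ymax >= img_height, img_height, ymax)
    return np.stack([ymin, xmin, ymax, xmax, label], axis=1)
```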
    Update

    The author's mAP code is also fairly complete; it follows facebookresearch's evaluation method (on GitHub), so there is no need to worry that the mAP is not official.
    Here is an explanation of the mAP code.

    The Faster R-CNN debugging guide recommended by the author

    Issues when debugging Faster R-CNN

    Update

    Today the author updated the FPN code; its style follows the Faster R-CNN code directly.
    FPN_Tensorflow

    The data preparation here differs only slightly from before:

        gtbox_label = np.transpose(np.stack([xmin, ymin, xmax, ymax, label], axis=0))
        # old FPN:      [ymin, xmin, ymax, xmax, label]
        # new FPN:      [xmin, ymin, xmax, ymax, label]
        # Faster R-CNN: [xmin, ymin, xmax, ymax, label]
        return img_height, img_width, gtbox_label
    

    In the new version, FPN uses the same data format as Faster R-CNN.
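Converting existing annotations between the two layouts is a single column permutation; a sketch (the helper name is mine):

```python
import numpy as np

def old_fpn_to_new(gtbox_label):
    # old FPN rows:                [ymin, xmin, ymax, xmax, label]
    # new FPN / Faster R-CNN rows: [xmin, ymin, xmax, ymax, label]
    return gtbox_label[:, [1, 0, 3, 2, 4]]
```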

    VOC submission results

    This is the result from the VOC evaluation server after training from the pretrained weights.

    http://host.robots.ox.ac.uk/anonymous/RXNDLK.html

    The results are quite good.

    References:

    Object detection: the SmoothL1Loss loss function

    Article link: https://www.haomeiwen.com/subject/vbvbqqtx.html