TensorFlow: Loading Multiple Pretrained Models at Once for Fine-Tuning

Author: 公输睚信 | Published 2018-11-28 21:19

This article explains how to load several pretrained models at once and train them together.

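An earlier article, "TensorFlow 使用预训练模型 ResNet-50" (Using the pretrained ResNet-50 model in TensorFlow), showed how to load the pretrained parameters of a single model and fine-tune it. Some more complex tasks, however, call for combining several models, for example running models in parallel: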

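[Figure: two models in parallel]

or cascading them:

[Figure: two models in cascade]

In that case, the pretrained parameters of several models have to be loaded at once before training.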
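
The difference between the two arrangements can be sketched with toy functions. This is only an illustration: `branch_a` and `branch_b` are hypothetical stand-ins for two networks, and fusing the parallel outputs by addition is an assumption chosen to match the `tf.add` used later in this article.

```python
def branch_a(x):
    """Toy stand-in for the first model."""
    return [2.0 * v for v in x]


def branch_b(x):
    """Toy stand-in for the second model."""
    return [v + 1.0 for v in x]


def parallel(x):
    """Both models see the same input; their outputs are fused by addition."""
    return [a + b for a, b in zip(branch_a(x), branch_b(x))]


def cascade(x):
    """The second model consumes the output of the first model."""
    return branch_b(branch_a(x))


inputs = [1.0, 2.0]
print(parallel(inputs))  # [4.0, 7.0]
print(cascade(inputs))   # [3.0, 5.0]
```

In both cases each sub-model keeps its own set of parameters, which is why each one needs its own pretrained checkpoint.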

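Consider the parallel case (the cascaded case works the same way), with two models as the example. The code from "TensorFlow 使用预训练模型 ResNet-50" is reused. First define the model structure, which only requires changing the predict function in model.py (here with ResNet-50 and VGG-16 as the two models):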

    def predict(self, preprocessed_inputs):
        """Predict prediction tensors from inputs tensor.
        
        Outputs of this function can be passed to loss or postprocess functions.
        
        Args:
            preprocessed_inputs: A float32 tensor with shape [batch_size,
                height, width, num_channels] representing a batch of images.
            
        Returns:
            prediction_dict: A dictionary holding prediction tensors to be
                passed to the Loss or Postprocess functions.
        """
        # ResNet-50
        with slim.arg_scope(nets.resnet_v1.resnet_arg_scope()):
            net_resnet, _ = nets.resnet_v1.resnet_v1_50(
                preprocessed_inputs, num_classes=self.num_classes,
                is_training=self._is_training)
            net_resnet = tf.squeeze(net_resnet, axis=[1, 2])
            
        # VGG-16
        with slim.arg_scope(nets.vgg.vgg_arg_scope()):
            net_vgg, _ = nets.vgg.vgg_16(
                preprocessed_inputs, num_classes=self.num_classes,
                is_training=self._is_training)
            
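        # Fuse the two branches by summing their logits element-wise; both
        # are expected to have shape [batch_size, num_classes].
        logits = tf.add(net_resnet, net_vgg)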
        prediction_dict = {'logits': logits}
        return prediction_dict

Then add the following file to the project (named model_utils.py):

# -*- coding: utf-8 -*-
"""
Created on Thu Nov 29 11:36:07 2018

@author: shirhe-lyh


Modified from:
    1.https://github.com/tensorflow/models/blob/master/research/maskgan/
        model_utils/model_utils.py
    2.https://github.com/tensorflow/models/blob/master/research/maskgan/
        train_mask_gan.py
"""

import tensorflow as tf

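flags = tf.app.flags

# NOTE: `resnet_ckpt` and `vgg_ckpt` are read in init_fn below but are not
# defined in this file; they must be defined elsewhere (e.g. in train.py).
FLAGS = flags.FLAGS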


def retrieve_init_savers(var_scopes_dict=None, 
                         checkpoint_exclude_scopes_dict=None):
    """Retrieve a dictionary of all the initial savers for the models.
    
    Args:
        var_scopes_dict: A dictionary of variable scopes for the models.
        checkpoint_exclude_scopes_dict: A dictionary of comma-separated list of 
            scopes of variables to exclude when restoring from a checkpoint.
        
    Returns:
        A dictionary of init savers.
    """
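    if var_scopes_dict is None:
        return None
    # Treat a missing exclusion dictionary as "exclude nothing".
    if checkpoint_exclude_scopes_dict is None:
        checkpoint_exclude_scopes_dict = {}

    # Dictionary of init savers, one per model.
    init_savers = {}
    for key, scope in var_scopes_dict.items():
        # Only trainable variables are collected, so non-trainable ones
        # (e.g. batch-norm moving statistics) are not restored here.
        trainable_vars = [
            v for v in tf.trainable_variables() if v.op.name.startswith(scope)]

        exclusions = []
        checkpoint_exclude_scopes = checkpoint_exclude_scopes_dict.get(
            key, None)
        if checkpoint_exclude_scopes:
            # Use a distinct loop name so the outer `scope` is not shadowed.
            exclusions = [s.strip() for s in
                          checkpoint_exclude_scopes.split(',')]
        variables_to_restore = []
        for var in trainable_vars:
            excluded = False
            for exclusion in exclusions:
                if var.op.name.startswith(exclusion):
                    excluded = True
                    break
            if not excluded:
                variables_to_restore.append(var)

        # Each saver restores only the selected variables of one model.
        init_saver = tf.train.Saver(var_list=variables_to_restore)
        init_savers[key] = init_saver
    return init_savers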


def init_fn(init_savers, sess):
    """The init_fn to be passed to the Supervisor.
    
    Args:
        init_savers: Dictionary of init_savers in the format:
            'init_saver_name': init_saver.
        sess: A TensorFlow Session object.
    """
    # Load the weights for ResNet
    if FLAGS.resnet_ckpt:
        print('Restoring checkpoint from %s.' % FLAGS.resnet_ckpt)
        tf.logging.info('Restoring checkpoint from %s.' % FLAGS.resnet_ckpt)
        resnet_init_saver = init_savers['ResNet']
        resnet_init_saver.restore(sess, FLAGS.resnet_ckpt)
        
    # Load the weights for VGG
    if FLAGS.vgg_ckpt:
        print('Restoring checkpoint from %s.' % FLAGS.vgg_ckpt)
        tf.logging.info('Restoring checkpoint from %s.' % FLAGS.vgg_ckpt)
        vgg_init_saver = init_savers['VGG']
        vgg_init_saver.restore(sess, FLAGS.vgg_ckpt)
        
    if FLAGS.resnet_ckpt is None and FLAGS.vgg_ckpt is None:
        return None

Finally, replace the get_init_fn function in train.py with the following code (model_utils.py must be imported):

# Requires at the top of train.py:
#     from functools import partial
#     import model_utils
def get_init_fn():
    """Returns a function run by the chief worker to warm-start the training.
    
    Returns:
        An init function run by the supervisor.
    """
    var_scopes_dict = {'ResNet': 'resnet_v1_50',
                       'VGG': 'vgg_16'}
    checkpoint_exclude_scopes_dict = {'ResNet': 'resnet_v1_50/logits',
                                      'VGG': 'vgg_16/fc8'}
    init_savers = model_utils.retrieve_init_savers(
        var_scopes_dict, checkpoint_exclude_scopes_dict)
    init_fn = partial(model_utils.init_fn, init_savers)
    return init_fn
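
The role of `partial` here can be tried without TensorFlow. The sketch below is only an illustration: `FakeSaver`, `restore_all`, the `ckpt_paths` dictionary (used in place of the command-line flags) and the checkpoint path are all hypothetical names introduced for this example.

```python
from functools import partial


class FakeSaver(object):
    """Hypothetical stand-in for tf.train.Saver; it records restore calls."""

    def __init__(self):
        self.calls = []

    def restore(self, sess, ckpt_path):
        self.calls.append((sess, ckpt_path))


def restore_all(init_savers, ckpt_paths, sess):
    """Restore every model for which a checkpoint path is set."""
    for key, saver in init_savers.items():
        path = ckpt_paths.get(key)
        if path:
            saver.restore(sess, path)


init_savers = {'ResNet': FakeSaver(), 'VGG': FakeSaver()}
# Made-up paths; a missing path means "skip this model".
ckpt_paths = {'ResNet': '/tmp/resnet_v1_50.ckpt', 'VGG': None}

# Bind the first two arguments; the resulting callable takes only `sess`.
init_fn = partial(restore_all, init_savers, ckpt_paths)
init_fn('fake-session')

print(init_savers['ResNet'].calls)  # [('fake-session', '/tmp/resnet_v1_50.ckpt')]
print(init_savers['VGG'].calls)     # []
```

Binding the savers in advance leaves a function of the session alone, which is the call shape that get_init_fn returns above.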

The rest of the code stays unchanged (batch_size has to be reduced for training to fit on a 1080Ti).

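The idea behind loading several sets of pretrained parameters at once is simple. First, using each model's variable scope (resnet_v1_50 for ResNet-50 and vgg_16 for VGG-16), collect the trainable variables in that scope with tf.trainable_variables(), leaving out the pretrained parameters that are not needed. Then create a saver for each model with tf.train.Saver(var_list=variables_to_restore) and call each saver's restore method to restore the pretrained parameters one model at a time.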
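
The scope-based selection can be checked on plain strings, without building a graph. This is a minimal sketch: `select_variables_to_restore` is a hypothetical helper that mirrors the filtering in retrieve_init_savers, and the variable names are illustrative examples rather than a full model listing.

```python
def select_variables_to_restore(var_names, scope, exclude_scopes=None):
    """Return the names under `scope` that are not under an excluded scope.

    `exclude_scopes` is a comma-separated string, as in the article's code.
    """
    exclusions = []
    if exclude_scopes:
        exclusions = [s.strip() for s in exclude_scopes.split(',')]
    selected = []
    for name in var_names:
        if not name.startswith(scope):
            continue
        if any(name.startswith(e) for e in exclusions):
            continue
        selected.append(name)
    return selected


# Illustrative variable names for the two models.
all_names = [
    'resnet_v1_50/conv1/weights',
    'resnet_v1_50/logits/weights',
    'resnet_v1_50/logits/biases',
    'vgg_16/conv1/conv1_1/weights',
    'vgg_16/fc8/weights',
]

resnet_names = select_variables_to_restore(
    all_names, 'resnet_v1_50', 'resnet_v1_50/logits')
vgg_names = select_variables_to_restore(all_names, 'vgg_16', 'vgg_16/fc8')

print(resnet_names)  # ['resnet_v1_50/conv1/weights']
print(vgg_names)     # ['vgg_16/conv1/conv1_1/weights']
```

The final classification layers are excluded because their shapes depend on the number of classes, so they are trained from scratch rather than restored.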