Optimizing the Architecture of Style Transfer Models

Author: nonoka | Published 2019-01-23 14:31

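        For deployment on mobile devices, the more mature approach is a Per-Style-Per-Model (PSPM) model. Such models depend on VGG16 only during training; at deployment time VGG16 can be dropped and only the generator network needs to be shipped.

        The Neural-Style-Transfer-Papers repository catalogs many papers on image style transfer and groups them into Per-Style-Per-Model, Multiple-Style-Per-Model and Arbitrary-Style-Per-Model methods. It lists roughly four Per-Style-Per-Model papers, with links to implementations. The next step is to read that code, compare the generator architectures, and look for possible directions for optimization.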

1. Perceptual Losses for Real-Time Style Transfer and Super-Resolution

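        The first paper is by Justin Johnson, Alexandre Alahi and Li Fei-Fei of Stanford University: [Paper] (ECCV 2016). The Neural-Style-Transfer-Papers repository lists three implementations of it; here I look mainly at the TensorFlow one.

        Its image generation network is structured as follows. The reason for the particular form of the preds line is not obvious to me: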

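def net(image):
    # Encoder: one 9x9 stride-1 convolution, then two 3x3 stride-2 convolutions.
    conv1 = _conv_layer(image, 32, 9, 1)
    conv2 = _conv_layer(conv1, 64, 3, 2)
    conv3 = _conv_layer(conv2, 128, 3, 2)
    # Five residual blocks with 3x3 kernels at 128 channels.
    resid1 = _residual_block(conv3, 3)
    resid2 = _residual_block(resid1, 3)
    resid3 = _residual_block(resid2, 3)
    resid4 = _residual_block(resid3, 3)
    resid5 = _residual_block(resid4, 3)
    # Decoder: two stride-2 transposed convolutions, then a 9x9 output convolution.
    conv_t1 = _conv_tranpose_layer(resid5, 64, 3, 2)
    conv_t2 = _conv_tranpose_layer(conv_t1, 32, 3, 2)
    conv_t3 = _conv_layer(conv_t2, 3, 9, 1, relu=False)
    # tanh lies in (-1, 1), so preds lies in (-22.5, 277.5).
    preds = tf.nn.tanh(conv_t3) * 150 + 255./2
    return preds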
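
On the preds line: tanh outputs values in (-1, 1), so multiplying by 150 and adding 255/2 maps the network output to (-22.5, 277.5), a range that covers the 0 to 255 pixel range with some headroom on both sides. The sketch below only checks that arithmetic in plain Python. The helper name scale_tanh is mine, and the source does not state why the constant 150 was chosen.

```python
import math

def scale_tanh(x, gain=150.0, offset=255.0 / 2):
    """Plain-Python mirror of `tf.nn.tanh(x) * 150 + 255./2` for one float."""
    return math.tanh(x) * gain + offset

# tanh saturates towards -1 and +1, so these are the limits of the output.
low, high = -150.0 + 255.0 / 2, 150.0 + 255.0 / 2
print(low, high)        # -22.5 277.5
print(scale_tanh(0.0))  # 127.5
```

Values outside 0 to 255 would still need clipping before the result is saved as an 8-bit image.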

        This is the convolution layer, which is much the same as in other networks:

def _conv_layer(net, num_filters, filter_size, strides, relu=True):
    weights_init = _conv_init_vars(net, num_filters, filter_size)
    strides_shape = [1, strides, strides, 1]
    net = tf.nn.conv2d(net, weights_init, strides_shape, padding='SAME')
    net = _instance_norm(net)
    if relu:
        net = tf.nn.relu(net)
    return net

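        This is the transposed-convolution ("deconvolution") layer. It does not upsample by image interpolation, so checkerboard artifacts are to be expected: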

def _conv_tranpose_layer(net, num_filters, filter_size, strides):
    weights_init = _conv_init_vars(net, num_filters, filter_size, transpose=True)
    batch_size, rows, cols, in_channels = [i.value for i in net.get_shape()]
    new_rows, new_cols = int(rows * strides), int(cols * strides)
    new_shape = [batch_size, new_rows, new_cols, num_filters]
    tf_shape = tf.stack(new_shape)
    strides_shape = [1,strides,strides,1]
    net = tf.nn.conv2d_transpose(net, weights_init, tf_shape, strides_shape, padding='SAME')
    net = _instance_norm(net)
    return tf.nn.relu(net)
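
A common explanation for checkerboard artifacts is uneven overlap: when the kernel size is not divisible by the stride, some output pixels receive contributions from more kernel taps than their neighbours. The following is a minimal 1-D sketch of that counting argument. It ignores the cropping that 'SAME' padding applies, and the function name overlap_counts is hypothetical.

```python
def overlap_counts(in_len, kernel, stride):
    """Count how many kernel taps land on each output position of a 1-D
    transposed convolution (no padding or cropping)."""
    out_len = (in_len - 1) * stride + kernel
    counts = [0] * out_len
    for i in range(in_len):
        for k in range(kernel):
            counts[i * stride + k] += 1
    return counts

# Kernel 3, stride 2 (as in the layer above): the interior alternates 2, 1, 2, 1.
print(overlap_counts(4, kernel=3, stride=2))  # [1, 1, 2, 1, 2, 1, 2, 1, 1]
# Kernel 4, stride 2: the interior is uniform.
print(overlap_counts(4, kernel=4, stride=2))  # [1, 1, 2, 2, 2, 2, 2, 2, 1, 1]
```

Whether the artifacts are actually visible depends on the learned weights, so this only shows why the 3x3, stride-2 configuration is prone to them.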

        This is the residual block, which consists of two convolution layers:

def _residual_block(net, filter_size=3):
    tmp = _conv_layer(net, 128, filter_size, 1)
    return net + _conv_layer(tmp, 128, filter_size, 1, relu=False)

        This is the implementation of instance normalization:

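def _instance_norm(net, train=True):
    # Only `channels` is used below; `train` has no effect in this snippet.
    batch, rows, cols, channels = [i.value for i in net.get_shape()]
    var_shape = [channels]
    # Mean and variance over the spatial axes, separately per sample and channel.
    mu, sigma_sq = tf.nn.moments(net, [1,2], keep_dims=True)
    # Learnable per-channel shift and scale.
    shift = tf.Variable(tf.zeros(var_shape))
    scale = tf.Variable(tf.ones(var_shape))
    epsilon = 1e-3
    normalized = (net-mu)/(sigma_sq + epsilon)**(.5)
    return scale * normalized + shift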
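
Since the goal is to find room for optimization, a rough parameter count of this generator is a useful starting point. The sketch below tallies the layers listed in net. It assumes that _conv_init_vars (not shown in this post) creates only a weight tensor with no bias, and that each layer adds one instance-norm scale and shift per output channel, as in the code above. The helper conv_params is hypothetical.

```python
def conv_params(k, c_in, c_out):
    """Weights of one k x k (transposed) convolution plus instance-norm
    scale and shift. Assumes no bias term."""
    return k * k * c_in * c_out + 2 * c_out

# (name, kernel, in_channels, out_channels), following `net`.
layers = [("conv1", 9, 3, 32), ("conv2", 3, 32, 64), ("conv3", 3, 64, 128)]
for i in range(1, 6):  # five residual blocks, two convolutions each
    layers += [(f"resid{i}a", 3, 128, 128), (f"resid{i}b", 3, 128, 128)]
layers += [("conv_t1", 3, 128, 64), ("conv_t2", 3, 64, 32), ("conv_t3", 9, 32, 3)]

total = sum(conv_params(k, ci, co) for _, k, ci, co in layers)
resid = sum(conv_params(k, ci, co) for n, k, ci, co in layers if n.startswith("resid"))
print(total, resid, round(resid / total, 3))  # 1677638 1477120 0.88
```

Under these assumptions the five residual blocks hold about 88% of the parameters, so their channel width and count are the obvious first place to look when shrinking the model.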

2. Texture Networks: Feed-forward Synthesis of Textures and Stylized Images

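        The second paper, by Ulyanov et al. (reportedly connected with the Prisma team), improves on the founding work in this field, A Neural Algorithm of Artistic Style by Gatys et al.: [Paper] (ICML 2016). The Neural-Style-Transfer-Papers repository lists Torch and TensorFlow implementations of it; here I look mainly at the TensorFlow one.

        This paper is similar to the previous one: both replace a per-image optimization problem with a feed-forward network that approximates its solution. With Gatys's method, every content image to be stylized requires its own run of iterative optimization. The two papers covered so far instead train a feed-forward generator once; afterwards, a new content image is simply passed through the generator to obtain a version in the trained style.

        Its image generation network is shown below. The layout matches the implementation of the first paper (three convolutions, five residual blocks, two transposed convolutions and an output convolution):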

def network(input_image):
    ops = {}
    image = tf.placeholder(tf.float32, shape=None, name='image-placeholder')

    ops['preprocessing'] = tf.div(image, 255)
    ops['preprocessing'] = tf.expand_dims( ops['preprocessing'], 0)
    ops['pad_2'] = pad(ops['preprocessing'], 4)
    
    ops['conv_3'] = conv(ops['pad_2'], [1, 1, 1, 1], [9, 9, 3, 32])
    ops['norm_4'] = norm(ops['conv_3'], [32])
    ops['relu_5'] = tf.nn.relu( ops['norm_4'])

    ops['conv_6'] = conv(ops['relu_5'], [1, 2, 2, 1], [3, 3, 32, 64])
    ops['norm_7'] = norm(ops['conv_6'], [64])
    ops['relu_8'] = tf.nn.relu(ops['norm_7'])

    ops['conv_9'] = conv(ops['relu_8'], [1, 2, 2, 1], [3, 3, 64, 128])
    ops['norm_10'] = norm(ops['conv_9'], [128])
    ops['relu_11'] = tf.nn.relu(ops['norm_10'])

    ops['res_block_11'] = ops['relu_11']
    for i in range(12, 17):
        ops['res_block_' + str(i)] = res_block(ops['res_block_' + str(i-1)])
    
    ops['conv_transpose_17'] = conv_transpose(
        ops['res_block_16'], [1, 2, 2, 1], [3, 3, 64, 128], ops['conv_6'])
    ops['norm_18'] = norm(ops['conv_transpose_17'], [64])
    ops['relu_19'] = tf.nn.relu(ops['norm_18'])

    ops['conv_transpose_20'] = conv_transpose(
        ops['relu_19'], [1, 2, 2, 1], [3, 3, 32, 64], ops['conv_3'])
    ops['norm_21'] = norm(ops['conv_transpose_20'], [32])
    ops['relu_22'] = tf.nn.relu(ops['norm_21'])

    ops['pad_23'] = pad(ops['relu_22'], 1)
    ops['conv_24'] = conv(ops['pad_23'], [1, 1, 1, 1], [3, 3, 32, 3])

    ops['squeeze'] = tf.squeeze(ops['conv_24'])
    vgg_mean_0 = tf.constant(103.939)
    vgg_mean_1 = tf.constant(116.779)
    vgg_mean_2 = tf.constant(123.68)
    red, green, blue = tf.split(ops['squeeze'], num_or_size_splits=3, axis=2)
    ops['bgr'] = tf.concat([blue + vgg_mean_2, green + vgg_mean_1, red + vgg_mean_0], 2)

    # TensorBoard output
    tf.summary.FileWriter("./tb/", tf.get_default_graph()).close()

    # Run session
    sess = tf.Session()
    saver = tf.train.Saver()
    saver.restore(sess, 'model/texture_net.chkp')
    output = sess.run(ops['bgr'], feed_dict={image: input_image})
    sess.close()

    return output

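        The convolution layer is defined below. It is similar to the earlier one, but it uses 'VALID' padding and leaves normalization and activation to the caller: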

def conv(input, strides, shape_filter):
    filter = tf.Variable(tf.truncated_normal(shape_filter, stddev=0.1), name='filter')
    return tf.nn.conv2d(input, filter, strides, padding='VALID', use_cudnn_on_gpu=None)

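        The normalization calls tf.nn.batch_normalization, but the moments are computed over the spatial axes [1, 2] only, so each sample and channel is normalized separately, which amounts to instance normalization: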

def norm(input, shape_parameter):
    scale = tf.Variable(tf.truncated_normal(shape_parameter, stddev=0.1), name='scale')
    offset = tf.Variable(tf.truncated_normal(shape_parameter, stddev=0.1), name='offset')
    epsilon = 1e-5
    mean, var = tf.nn.moments(input, [1, 2], keep_dims=True)
    return tf.nn.batch_normalization(input, mean, var, offset, scale, epsilon)
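
The choice of axes passed to tf.nn.moments decides which kind of normalization results. The NumPy sketch below compares statistics over axes (1, 2), as in norm, with statistics over (0, 1, 2), which is what batch normalization would use during training. The data is random and the helper normalize is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two samples with very different brightness, NHWC layout.
x = np.concatenate([rng.normal(0.0, 1.0, (1, 8, 8, 3)),
                    rng.normal(5.0, 2.0, (1, 8, 8, 3))])

def normalize(t, axes, eps=1e-5):
    """Subtract the mean and divide by the standard deviation over `axes`."""
    mean = t.mean(axis=axes, keepdims=True)
    var = t.var(axis=axes, keepdims=True)
    return (t - mean) / np.sqrt(var + eps)

inst = normalize(x, axes=(1, 2))      # per sample and channel, as in `norm`
batch = normalize(x, axes=(0, 1, 2))  # per channel, shared across the batch

print(np.abs(inst.mean(axis=(1, 2))).max())   # close to 0 for every sample
print(np.abs(batch.mean(axis=(1, 2))).max())  # clearly non-zero
```

With a batch of one image, as in the inference code above, the two variants produce the same result.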

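        The upsampling is again a plain transposed convolution; the output shape is copied from the matching encoder tensor: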

def conv_transpose(input, strides, shape_filter, corresponding_tensor):
    filter = tf.Variable(tf.truncated_normal(shape_filter, stddev=0.1), name='filter')
    shape = tf.shape(corresponding_tensor)
    outputshape = tf.stack([shape[0], shape[1], shape[2], shape[3]])
    return tf.nn.conv2d_transpose(input, filter, outputshape, strides, padding='VALID')
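
One likely reason for passing corresponding_tensor is that a strided 'VALID' convolution maps more than one input size to the same output size, so the transposed convolution cannot infer its output shape from its input alone. The sketch below traces one spatial dimension through the encoder for a hypothetical 256-pixel input. It assumes that pad(x, n), which is not shown in this post, adds n pixels on each side.

```python
def valid_conv_out(size, kernel, stride):
    """Output length of a 'VALID' convolution along one dimension."""
    return (size - kernel) // stride + 1

def sizes_mapping_to(target, kernel, stride, search=range(1, 1025)):
    """All input lengths whose 'VALID' convolution output equals `target`."""
    return [n for n in search if valid_conv_out(n, kernel, stride) == target]

size = 256                             # hypothetical input height or width
padded = size + 2 * 4                  # assumed effect of pad(..., 4)
conv_3 = valid_conv_out(padded, 9, 1)  # 9x9, stride 1
conv_6 = valid_conv_out(conv_3, 3, 2)  # 3x3, stride 2
conv_9 = valid_conv_out(conv_6, 3, 2)  # 3x3, stride 2
print(conv_3, conv_6, conv_9)          # 256 127 63

# Candidate output sizes when inverting each stride-2 convolution.
print(sizes_mapping_to(conv_9, 3, 2))  # [127, 128]
print(sizes_mapping_to(conv_6, 3, 2))  # [255, 256]
```

Copying the shapes of conv_6 and conv_3 selects 127 and 256, so the decoder mirrors the encoder exactly.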

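3. Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks

        The third paper is at [Paper] (ECCV 2016). Unlike the first two, it uses a generative adversarial network for style transfer, although the results do not seem to be very good. The Neural-Style-Transfer-Papers repository lists only a Torch implementation of it (I had assumed Torch meant Python, but it turns out to be Lua).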

        Below is the generator part of the network:

netG:add(nn.SpatialFullConvolution(opt.netEnco_vgg_nOutputPlane, opt.nf * 8, 3, 3, 1, 1, 1, 1)) -- x 1
netG:add(nn.SpatialBatchNormalization(opt.nf * 8)):add(nn.ReLU(true))
netG:add(nn.SpatialFullConvolution(opt.nf * 8, opt.nf * 4, 4, 4, 2, 2, 1, 1)) -- x 2
netG:add(nn.SpatialBatchNormalization(opt.nf * 4)):add(nn.ReLU(true))
netG:add(nn.SpatialFullConvolution(opt.nf * 4, opt.nf * 2, 4, 4, 2, 2, 1, 1)) -- x 4
netG:add(nn.SpatialBatchNormalization(opt.nf * 2)):add(nn.ReLU(true))
netG:add(nn.SpatialFullConvolution(opt.nf * 2, opt.nc, 4, 4, 2, 2, 1, 1)) -- x 8
netG:add(nn.Tanh())
netG:apply(weights_init)
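
The x 1, x 2, x 4 and x 8 comments follow from the output-size formula of nn.SpatialFullConvolution, which is (in - 1) * stride - 2 * pad + kernel + adj along each dimension. The Python sketch below applies it to the four layers of netG. The starting size of 16 is a hypothetical value, since the size of the VGG feature map fed to the generator is not given here.

```python
def full_conv_out(size, kernel, stride, pad, adj=0):
    """Output length of Torch's nn.SpatialFullConvolution along one dimension."""
    return (size - 1) * stride - 2 * pad + kernel + adj

# (kernel, stride, pad) for the four layers of netG, in order.
layers = [(3, 1, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]

size = 16  # hypothetical spatial size of the feature map fed to netG
sizes = [size]
for kernel, stride, pad in layers:
    sizes.append(full_conv_out(sizes[-1], kernel, stride, pad))
print(sizes)  # [16, 16, 32, 64, 128]
```

The upsampling layers here use a 4x4 kernel with stride 2, so the kernel size is divisible by the stride, unlike the 3x3, stride-2 transposed convolutions in the first two implementations.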

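        The following appears to be the discriminator. I could not fully follow it, but the code comments indicate that it classifies each neural patch with a convolution: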

table.insert(netS, nn.Sequential())
netS[i_netS]:add(nn.LeakyReLU(0.2, true))
netS[i_netS]:add(nn.SpatialConvolution(opt.netS_vgg_nOutputPlane[i_netS], opt.nf * 4, 4, 4, 2, 2, 1, 1)) -- x 1/2
netS[i_netS]:add(nn.SpatialBatchNormalization(opt.nf * 4)):add(nn.LeakyReLU(0.2, true))
netS[i_netS]:add(nn.SpatialConvolution(opt.nf * 4, opt.nf * 8, 4, 4, 2, 2, 1, 1)) -- x 1/4
netS[i_netS]:add(nn.SpatialBatchNormalization(opt.nf * 8)):add(nn.LeakyReLU(0.2, true))   
netS[i_netS]:add(nn.SpatialConvolution(opt.nf * 8, 1, 1, 1)) -- classify each neural patch using convolutional operation
netS[i_netS]:add(nn.Reshape(opt.batchSize * opt.netS_blocksize[i_netS] * opt.netS_blocksize[i_netS], 1, 1, 1, false)) -- reshape the classification result for computing loss
netS[i_netS]:add(nn.View(1):setNumInputDims(3))
netS[i_netS]:apply(weights_init)

4. Summary

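        According to this repository, the most recent Per-Style-Per-Model papers date from 2016. The implementations of the first two papers are largely the same, and they are also largely the same as the code I currently use. Judging by output quality, apps such as Prisma still do somewhat better than these open-source implementations, so there should be room for improvement; the vendors simply have not published their methods.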
