For mobile deployment, the most mature approach is the per-style-per-model (PSPM) family, in which one model is trained per style. These models depend on VGG16 only during training, so VGG16 can be discarded at deployment time and only the standalone generator network needs to ship.
The Neural-Style-Transfer-Papers repository surveys a large number of image style transfer papers and groups them into per-style-per-model, multiple-style-per-model, and arbitrary-style-per-model categories. It lists roughly four per-style-per-model papers, each with published implementation code. The plan here is to read all of that code, compare the generator network architectures, and look for directions and opportunities for optimization.
1. Perceptual Losses for Real-Time Style Transfer and Super-Resolution
The first paper is by Justin Johnson, Alexandre Alahi, and Li Fei-Fei of Stanford University: [Paper] (ECCV 2016). The Neural-Style-Transfer-Papers repository links three implementations of this paper; the TensorFlow one is examined here.
Its image generation network is structured as follows. The preds line looked puzzling at first, but it simply rescales the tanh output from (-1, 1) to a range centered on 127.5, roughly covering (and slightly overshooting) the [0, 255] pixel range:
def net(image):
    conv1 = _conv_layer(image, 32, 9, 1)
    conv2 = _conv_layer(conv1, 64, 3, 2)
    conv3 = _conv_layer(conv2, 128, 3, 2)
    resid1 = _residual_block(conv3, 3)
    resid2 = _residual_block(resid1, 3)
    resid3 = _residual_block(resid2, 3)
    resid4 = _residual_block(resid3, 3)
    resid5 = _residual_block(resid4, 3)
    conv_t1 = _conv_tranpose_layer(resid5, 64, 3, 2)
    conv_t2 = _conv_tranpose_layer(conv_t1, 32, 3, 2)
    conv_t3 = _conv_layer(conv_t2, 3, 9, 1, relu=False)
    preds = tf.nn.tanh(conv_t3) * 150 + 255./2
    return preds
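A quick numerical check of that mapping (plain NumPy, mine rather than the repo's) confirms the range; the small overshoot beyond [0, 255] is presumably clipped when the image is written out:

import numpy as np

t = np.tanh(np.linspace(-5.0, 5.0, 101))   # tanh output lies in (-1, 1)
preds = t * 150 + 255. / 2                 # the same mapping as in net() above
print(preds.min(), preds.max())            # about -22.4 and 277.4, centered on 127.5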
This is the convolution layer, much like the implementations found in other networks:
def _conv_layer(net, num_filters, filter_size, strides, relu=True):
    weights_init = _conv_init_vars(net, num_filters, filter_size)
    strides_shape = [1, strides, strides, 1]
    net = tf.nn.conv2d(net, weights_init, strides_shape, padding='SAME')
    net = _instance_norm(net)
    if relu:
        net = tf.nn.relu(net)
    return net
This is the transposed-convolution (deconvolution) layer. It upsamples directly with conv2d_transpose rather than using image interpolation, so checkerboard artifacts are to be expected:
def _conv_tranpose_layer(net, num_filters, filter_size, strides):
    weights_init = _conv_init_vars(net, num_filters, filter_size, transpose=True)
    batch_size, rows, cols, in_channels = [i.value for i in net.get_shape()]
    new_rows, new_cols = int(rows * strides), int(cols * strides)
    new_shape = [batch_size, new_rows, new_cols, num_filters]
    tf_shape = tf.stack(new_shape)
    strides_shape = [1, strides, strides, 1]
    net = tf.nn.conv2d_transpose(net, weights_init, tf_shape, strides_shape, padding='SAME')
    net = _instance_norm(net)
    return tf.nn.relu(net)
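The standard remedy for those artifacts is to replace the transposed convolution with interpolation-based upsampling followed by an ordinary convolution (the "resize-convolution" described in Odena et al.'s Deconvolution and Checkerboard Artifacts). Below is a minimal sketch reusing the _conv_layer helper above; the name _resize_conv_layer is mine, not from the repo, and tf is the same TensorFlow 1.x import the surrounding code assumes:

def _resize_conv_layer(net, num_filters, filter_size, strides):
    # Upsample with nearest-neighbor interpolation first...
    _, rows, cols, _ = [i.value for i in net.get_shape()]
    net = tf.image.resize_nearest_neighbor(net, [rows * strides, cols * strides])
    # ...then apply an ordinary stride-1 convolution. Every output pixel now
    # receives the same number of kernel contributions, which removes the
    # uneven overlap that produces the checkerboard pattern.
    return _conv_layer(net, num_filters, filter_size, 1)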
This is the residual block, built from two convolution layers plus an identity shortcut:
def _residual_block(net, filter_size=3):
    tmp = _conv_layer(net, 128, filter_size, 1)
    return net + _conv_layer(tmp, 128, filter_size, 1, relu=False)
This is the instance normalization implementation:
def _instance_norm(net, train=True):
    batch, rows, cols, channels = [i.value for i in net.get_shape()]
    var_shape = [channels]
    mu, sigma_sq = tf.nn.moments(net, [1, 2], keep_dims=True)
    shift = tf.Variable(tf.zeros(var_shape))
    scale = tf.Variable(tf.ones(var_shape))
    epsilon = 1e-3
    normalized = (net - mu) / (sigma_sq + epsilon)**(.5)
    return scale * normalized + shift
2. Texture Networks: Feed-forward Synthesis of Textures and Stylized Images
The second paper comes from the Prisma team and improves on the original work, Gatys's seminal A Neural Algorithm of Artistic Style: [Paper] (ICML 2016). The Neural-Style-Transfer-Papers repository links both a Torch and a TensorFlow implementation of this paper; the TensorFlow one is examined here.
Like the previous paper, this one turns the original global-optimization problem into training a feed-forward network that approximates the optimum. With Gatys's method, stylizing each content image requires running an iterative optimization from scratch; both papers listed here instead train a feed-forward generator once, after which any new content image can be pushed through the generator in a single pass to obtain the pre-trained style.
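The contrast can be made concrete with a schematic TF 1.x snippet; the quadratic loss below is a runnable stand-in for the real VGG-based perceptual loss, and transform_net is a placeholder name, not a function from either repo:

import tensorflow as tf

content = tf.constant(0.7)

# Gatys et al.: the *image* itself is the optimization variable, and every
# new photo requires hundreds of gradient steps like these.
img = tf.Variable(0.0)
loss = tf.square(img - content)              # stand-in for the perceptual loss
step = tf.train.AdamOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(200):
        sess.run(step)
    print(sess.run(img))                     # converges toward `content`

# Feed-forward methods pay that cost once, at training time, by fitting a
# generator; at test time stylization is a single forward pass:
#     stylized = transform_net(content_image)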
Its image generation network is architected as follows, essentially the same design as the first paper's implementation:
def network(input_image):
    ops = {}
    image = tf.placeholder(tf.float32, shape=None, name='image-placeholder')
    ops['preprocessing'] = tf.div(image, 255)
    ops['preprocessing'] = tf.expand_dims(ops['preprocessing'], 0)
    ops['pad_2'] = pad(ops['preprocessing'], 4)
    ops['conv_3'] = conv(ops['pad_2'], [1, 1, 1, 1], [9, 9, 3, 32])
    ops['norm_4'] = norm(ops['conv_3'], [32])
    ops['relu_5'] = tf.nn.relu(ops['norm_4'])
    ops['conv_6'] = conv(ops['relu_5'], [1, 2, 2, 1], [3, 3, 32, 64])
    ops['norm_7'] = norm(ops['conv_6'], [64])
    ops['relu_8'] = tf.nn.relu(ops['norm_7'])
    ops['conv_9'] = conv(ops['relu_8'], [1, 2, 2, 1], [3, 3, 64, 128])
    ops['norm_10'] = norm(ops['conv_9'], [128])
    ops['relu_11'] = tf.nn.relu(ops['norm_10'])
    ops['res_block_11'] = ops['relu_11']
    for i in range(12, 17):
        ops['res_block_' + str(i)] = res_block(ops['res_block_' + str(i-1)])
    ops['conv_transpose_17'] = conv_transpose(
        ops['res_block_16'], [1, 2, 2, 1], [3, 3, 64, 128], ops['conv_6'])
    ops['norm_18'] = norm(ops['conv_transpose_17'], [64])
    ops['relu_19'] = tf.nn.relu(ops['norm_18'])
    ops['conv_transpose_20'] = conv_transpose(
        ops['relu_19'], [1, 2, 2, 1], [3, 3, 32, 64], ops['conv_3'])
    ops['norm_21'] = norm(ops['conv_transpose_20'], [32])
    ops['relu_22'] = tf.nn.relu(ops['norm_21'])
    ops['pad_23'] = pad(ops['relu_22'], 1)
    ops['conv_24'] = conv(ops['pad_23'], [1, 1, 1, 1], [3, 3, 32, 3])
    ops['squeeze'] = tf.squeeze(ops['conv_24'])
    vgg_mean_0 = tf.constant(103.939)
    vgg_mean_1 = tf.constant(116.779)
    vgg_mean_2 = tf.constant(123.68)
    red, green, blue = tf.split(ops['squeeze'], num_or_size_splits=3, axis=2)
    ops['bgr'] = tf.concat([blue + vgg_mean_2, green + vgg_mean_1, red + vgg_mean_0], 2)
    # TensorBoard output
    tf.summary.FileWriter("./tb/", tf.get_default_graph()).close()
    # Run session
    sess = tf.Session()
    saver = tf.train.Saver()
    saver.restore(sess, 'model/texture_net.chkp')
    output = sess.run(ops['bgr'], feed_dict={image: input_image})
    sess.close()
    return output
The convolution layer is defined below. It is much like the first paper's, except that it uses padding='VALID'; the explicit pad() calls in the network above appear to compensate for this (for example, padding 4 pixels before the 9x9 kernel offsets the 4 pixels per side that a VALID convolution trims):
def conv(input, strides, shape_filter):
    filter = tf.Variable(tf.truncated_normal(shape_filter, stddev=0.1), name='filter')
    return tf.nn.conv2d(input, filter, strides, padding='VALID', use_cudnn_on_gpu=None)
The normalization looks like plain Batch Normalization at first glance, but although it goes through tf.nn.batch_normalization, the moments are computed over the spatial axes [1, 2] only, per sample and per channel, so this is effectively instance normalization again:
def norm(input, shape_parameter):
    scale = tf.Variable(tf.truncated_normal(shape_parameter, stddev=0.1), name='scale')
    offset = tf.Variable(tf.truncated_normal(shape_parameter, stddev=0.1), name='offset')
    epsilon = 1e-5
    mean, var = tf.nn.moments(input, [1, 2], keep_dims=True)
    return tf.nn.batch_normalization(input, mean, var, offset, scale, epsilon)
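The distinction comes down to the axes passed to tf.nn.moments. A quick TF 1.x check (shapes here are illustrative, not from the repo) shows that axes [0, 1, 2] pool statistics over the whole batch (batch norm), while axes [1, 2], as in norm() above, keep one statistic per sample per channel (instance norm):

import tensorflow as tf

x = tf.random_normal([4, 32, 32, 64])  # NHWC: batch of 4 feature maps

# Batch norm statistics: pooled over the batch and spatial axes.
bn_mean, bn_var = tf.nn.moments(x, axes=[0, 1, 2], keep_dims=True)
# Instance norm statistics: spatial axes only, kept separate per sample.
in_mean, in_var = tf.nn.moments(x, axes=[1, 2], keep_dims=True)

print(bn_mean.get_shape())  # (1, 1, 1, 64)  -> one mean per channel
print(in_mean.get_shape())  # (4, 1, 1, 64)  -> one mean per sample per channel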
The transposed convolution is likewise a plain deconvolution, with the output shape taken from the corresponding encoder tensor:
def conv_transpose(input, strides, shape_filter, corresponding_tensor):
    filter = tf.Variable(tf.truncated_normal(shape_filter, stddev=0.1), name='filter')
    shape = tf.shape(corresponding_tensor)
    outputshape = tf.stack([shape[0], shape[1], shape[2], shape[3]])
    return tf.nn.conv2d_transpose(input, filter, outputshape, strides, padding='VALID')
3. Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks
The third paper is at [Paper] (ECCV 2016). Unlike the previous two, it uses a generative adversarial network for style transfer, though the results do not seem especially good. The Neural-Style-Transfer-Papers repository links only a Torch implementation of this paper (it says Torch, so I expected Python, but it turned out to be Lua).
Below is the generator part of the network:
netG:add(nn.SpatialFullConvolution(opt.netEnco_vgg_nOutputPlane, opt.nf * 8, 3, 3, 1, 1, 1, 1)) -- x 1
netG:add(nn.SpatialBatchNormalization(opt.nf * 8)):add(nn.ReLU(true))
netG:add(nn.SpatialFullConvolution(opt.nf * 8, opt.nf * 4, 4, 4, 2, 2, 1, 1)) -- x 2
netG:add(nn.SpatialBatchNormalization(opt.nf * 4)):add(nn.ReLU(true))
netG:add(nn.SpatialFullConvolution(opt.nf * 4, opt.nf * 2, 4, 4, 2, 2, 1, 1)) -- x 4
netG:add(nn.SpatialBatchNormalization(opt.nf * 2)):add(nn.ReLU(true))
netG:add(nn.SpatialFullConvolution(opt.nf * 2, opt.nc, 4, 4, 2, 2, 1, 1)) -- x 8
netG:add(nn.Tanh())
netG:apply(weights_init)
The following appears to be the discriminator. It is hard to follow at first, but the inline comments give it away: instead of judging the whole image, it classifies each neural patch of a VGG feature map. The final 1x1 convolution produces one real/fake score per spatial position, and the Reshape flattens those scores into individual classification examples for the loss:
table.insert(netS, nn.Sequential())
netS[i_netS]:add(nn.LeakyReLU(0.2, true))
netS[i_netS]:add(nn.SpatialConvolution(opt.netS_vgg_nOutputPlane[i_netS], opt.nf * 4, 4, 4, 2, 2, 1, 1)) -- x 1/2
netS[i_netS]:add(nn.SpatialBatchNormalization(opt.nf * 4)):add(nn.LeakyReLU(0.2, true))
netS[i_netS]:add(nn.SpatialConvolution(opt.nf * 4, opt.nf * 8, 4, 4, 2, 2, 1, 1)) -- x 1/4
netS[i_netS]:add(nn.SpatialBatchNormalization(opt.nf * 8)):add(nn.LeakyReLU(0.2, true))
netS[i_netS]:add(nn.SpatialConvolution(opt.nf * 8, 1, 1, 1)) -- classify each neural patch using convolutional operation
netS[i_netS]:add(nn.Reshape(opt.batchSize * opt.netS_blocksize[i_netS] * opt.netS_blocksize[i_netS], 1, 1, 1, false)) -- reshape the classification result for computing loss
netS[i_netS]:add(nn.View(1):setNumInputDims(3))
netS[i_netS]:apply(weights_init)
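For readers more comfortable with TensorFlow, here is a rough TF 1.x sketch of the same patch-level idea; it omits the batch normalization layers, and the names and sizes are illustrative, none of them taken from the MGANs repo:

import tensorflow as tf

def patch_discriminator(feat, nf=64):
    # `feat` stands in for a VGG feature map of the stylized image.
    h = tf.layers.conv2d(feat, nf * 4, 4, strides=2, padding='same')
    h = tf.nn.leaky_relu(h, alpha=0.2)
    h = tf.layers.conv2d(h, nf * 8, 4, strides=2, padding='same')
    h = tf.nn.leaky_relu(h, alpha=0.2)
    # 1x1 convolution: one real/fake score per spatial position, i.e. one
    # score per neural patch of the input feature map.
    scores = tf.layers.conv2d(h, 1, 1)
    # Flatten so that every patch becomes its own classification example.
    return tf.reshape(scores, [-1, 1])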
4. Summary
According to this repository's listing, the newest per-style-per-model papers date from 2016. The first two papers' implementations are largely the same, and also close to the code I am currently using. Judging purely by output quality, applications like Prisma still look somewhat better than these open-source implementations, so there should be room for improvement; the vendors simply have not published their methods.