Neural Style Transfer

Author: 此间不留白 | Published 2020-01-31 13:47

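Preface

Image style transfer is one of the most engaging applications of computer vision. Implementing it with deep learning has two parts: implementing the neural style transfer algorithm, and using that algorithm to generate new artistic images. Most deep learning algorithms optimize a cost function to learn a set of parameters, whereas neural style transfer optimizes a cost function to obtain pixel values. Before implementing the algorithm, import the required libraries: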


import os
import sys
import scipy.io
import scipy.misc
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
from PIL import Image
from nst_utils import *
import numpy as np
import tensorflow as tf

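Neural style transfer takes two images, a content image (denoted C) and a style image (denoted S), and generates a new image (denoted G) that combines the content of C with the style of S. The process is shown in the figure below: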

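Transfer learning

Neural style transfer builds on a pretrained convolutional network. Following the NST (Neural Style Transfer) paper, this implementation uses the VGG-19 deep network, which has already been trained on a large amount of image data. Its shallow layers learn low-level features, and its deeper layers learn higher-level features.

The pretrained model is loaded as follows: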

model = load_vgg_model("pretrained-model/imagenet-vgg-verydeep-19.mat")

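The whole model is stored in a Python dictionary: each key is a variable name, and the corresponding value is the tensor for that variable.

The input image is assigned to the model as follows:

sess.run(model["input"].assign(image))

You can also read the activations of a particular layer. For example, the following runs the network on this image up to the conv4_2 layer: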

sess.run(model["conv4_2"])

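Neural style transfer

Building the neural style transfer algorithm involves the following steps:

Define the content cost function $J_{content}(C,G)$
Define the style cost function $J_{style}(S,G)$
Combine the content cost and the style cost into a single cost function $J = \alpha J_{content}(C,G) + \beta J_{style}(S,G)$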

Computing the content cost

Before defining the content cost function, choose a content image. The following code loads it:


import imageio
content_image = imageio.imread("images/louvre.jpg")
imshow(content_image)

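In a convolutional network, the shallow layers extract simple features such as edges and basic textures, while the deeper layers extract more complex features such as object categories.

We want the generated image $G$ to share the content of the content image $C$, so we need to choose a layer whose activations represent that content. In practice, a layer in the middle of the network is usually chosen, neither too shallow nor too deep.

Suppose hidden layer L is chosen. Feed image C into the pretrained VGG model and run forward propagation to obtain its activation $a^{(C)}$ at layer L. Feeding image G through the network gives the activation $a^{(G)}$ at the same layer. The output of layer L is a tensor of shape $n_H \times n_W \times n_C$, and the content cost is defined as:
$J_{content}(C,G) = \frac{1}{4 \times n_H \times n_W \times n_C} \sum (a^{(C)}-a^{(G)})^2$

The activation at layer L is a 3-D tensor. For convenience, it can be unrolled into a 2-D matrix, as shown in the figure below: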
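The unrolling step can be sketched with NumPy. The sizes and the array name `a` below are illustrative assumptions, not values taken from the VGG network; the point is only to show which entry of the 3-D volume lands in which row of the 2-D matrix.

```python
import numpy as np

# Hypothetical activation volume: n_H=2, n_W=3, n_C=4 (sizes chosen for illustration).
n_H, n_W, n_C = 2, 3, 4
a = np.arange(n_H * n_W * n_C, dtype=np.float32).reshape(1, n_H, n_W, n_C)

# Unroll the spatial grid: each row is one position, each column one channel.
a_unrolled = a.reshape(n_H * n_W, n_C)

# The channel vector at position (h, w) ends up in row h * n_W + w.
h, w = 1, 2
print(a_unrolled.shape)
print(np.array_equal(a_unrolled[h * n_W + w], a[0, h, w]))
```

Because the reshape is row-major, neighbouring columns of the original grid stay in neighbouring rows, and no activation values are changed.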

The content cost function is implemented as follows:


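def compute_content_cost(a_C, a_G):
    """
    Compute the content cost.

    Arguments:
    a_C -- tensor of shape (1, n_H, n_W, n_C), hidden-layer activations of image C
    a_G -- tensor of shape (1, n_H, n_W, n_C), hidden-layer activations of image G

    Returns:
    J_content -- the content cost
    """

    # Retrieve the dimensions from a_G
    m, n_H, n_W, n_C = a_G.get_shape().as_list()

    # Unroll a_C and a_G into 2-D matrices
    a_C_unrolled = tf.reshape(a_C, [n_H*n_W, n_C])
    a_G_unrolled = tf.reshape(a_G, [n_H*n_W, n_C])

    # Compute the cost
    J_content = 1./(4 * n_H * n_W * n_C)*tf.reduce_sum(tf.square(tf.subtract(a_C_unrolled, a_G_unrolled)))

    return J_content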
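As a quick sanity check of the formula, the same cost can be computed by hand on tiny arrays. The following is a minimal NumPy sketch; the function name `content_cost_np` and the toy activations are assumptions made for illustration and are not part of the original code.

```python
import numpy as np

def content_cost_np(a_C, a_G):
    """NumPy counterpart of the content cost for arrays of shape (1, n_H, n_W, n_C)."""
    _, n_H, n_W, n_C = a_G.shape
    return np.sum((a_C - a_G) ** 2) / (4.0 * n_H * n_W * n_C)

# Toy activations: every entry of a_G differs from a_C by exactly 2.
a_C = np.zeros((1, 2, 2, 3))
a_G = np.full((1, 2, 2, 3), 2.0)

# Sum of squares = 12 entries * 4 = 48; normalizer = 4 * 2 * 2 * 3 = 48.
print(content_cost_np(a_C, a_G))  # 1.0
print(content_cost_np(a_C, a_C))  # 0.0
```

The cost is zero only when the two activation volumes are identical, which is what drives G towards the content of C during optimization.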

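Computing the style cost

Style matrix

The style matrix is also known as the Gram matrix. In linear algebra, the Gram matrix of a set of vectors ( $v_1,v_2,\dots,v_n$ ) is the matrix of their pairwise dot products, with $G_{ij} = v_i^{T}v_{j}$ . In other words, $G_{ij}$ measures how similar the two vectors $v_i$ and $v_j$ are: if they are highly similar, their dot product, and therefore $G_{ij}$ , is large.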
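A small numerical example may make the definition concrete. The three vectors below are made up for illustration; the sketch only shows that the entry for two nearly parallel vectors is larger than the entry for two orthogonal ones. Note that a dot product also grows with vector length, so it is a similarity measure only in this loose sense.

```python
import numpy as np

# Three illustrative vectors, one per row: v1 and v2 point in nearly the
# same direction, while v3 is orthogonal to v1.
V = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0]])

# G[i, j] is the dot product of row i and row j.
G = V @ V.T
print(G)
```

Here the entry for (v1, v2) is 0.9 while the entry for (v1, v3) is 0, and the matrix is symmetric because the dot product is.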

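In summary, once the activations of an image have been unrolled into a 2-D matrix, multiplying that matrix by its transpose gives the Gram matrix. The computation is shown in the figure below:

Note that the diagonal element $G_{ii}$ of the Gram matrix measures how active filter $i$ is. Suppose filter $i$ detects vertical textures: $G_{ii}$ then measures how common vertical textures are across the whole image, and the larger $G_{ii}$ is, the more vertical texture the image contains.

By capturing how prevalent individual features are ( $G_{ii}$ ) and how often different features occur together ( $G_{ij}$ ), the style matrix $G$ measures the style of an image.

The style matrix can be implemented in TensorFlow as follows: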



def gram_matrix(A):
    """
    Arguments:
    A -- matrix of shape (n_C, n_H*n_W)

    Returns:
    GA -- Gram matrix of A, of shape (n_C, n_C)
    """

    # Multiply A by its transpose
    GA = tf.linalg.matmul(A, A, transpose_b=True)

    return GA
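The same product can be tried in NumPy on a tiny matrix to see what the diagonal and off-diagonal entries capture. The function name `gram_matrix_np` and the three "filters" below are hypothetical, chosen so that the arithmetic is easy to follow.

```python
import numpy as np

def gram_matrix_np(A):
    """Gram matrix of A with shape (n_C, n_H*n_W); the result has shape (n_C, n_C)."""
    return A @ A.T

# Three hypothetical filters observed at four spatial positions.
A = np.array([[1.0, 0.0, 1.0, 0.0],   # filter 0 fires at positions 0 and 2
              [2.0, 0.0, 2.0, 0.0],   # filter 1 fires at the same positions, more strongly
              [0.0, 3.0, 0.0, 3.0]])  # filter 2 fires only where the others are silent

GA = gram_matrix_np(A)
print(GA)
# [[ 2.  4.  0.]
#  [ 4.  8.  0.]
#  [ 0.  0. 18.]]
```

The diagonal grows with how strongly each filter responds, the (0, 1) entry is positive because those two filters fire together, and the (0, 2) entry is zero because they never do.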

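Style cost

Once the style matrices are available, the goal is to minimize the distance between the Gram matrix of the style image and the Gram matrix of the generated image. For a single hidden layer, the style cost is defined as:
$J_{style}^{[l]}(S,G) = \frac{1}{4 \times {n_C}^2 \times (n_H \times n_W)^2} \sum _{i=1}^{n_C}\sum_{j=1}^{n_C}(G^{(S)}_{ij} - G^{(G)}_{ij})^2$

$G^{(S)}$ and $G^{(G)}$ are the Gram matrices of the style image and the generated image, respectively. Implementing this formula involves the following steps:

Retrieve the dimensions from the hidden-layer activation $a^{(G)}$ ;
Unroll the hidden-layer activations $a^{(S)}$ and $a^{(G)}$ into 2-D matrices;
Compute the style matrix of each image;
Compute the style cost.

Putting these steps together gives the following implementation: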

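def compute_layer_style_cost(a_S, a_G):
    """
    Arguments:
    a_S -- tensor of shape (1, n_H, n_W, n_C), hidden-layer activations of the style image
    a_G -- tensor of shape (1, n_H, n_W, n_C), hidden-layer activations of the generated image G

    Returns:
    J_style_layer -- the style cost for this layer
    """

    # Retrieve the dimensions from a_G
    m, n_H, n_W, n_C = a_G.get_shape().as_list()

    # Unroll to (n_H*n_W, n_C), then transpose to (n_C, n_H*n_W) as gram_matrix expects
    a_S = tf.transpose(tf.reshape(a_S, [n_H*n_W, n_C]))
    a_G = tf.transpose(tf.reshape(a_G, [n_H*n_W, n_C]))

    # Compute the Gram matrices of both images
    GS = gram_matrix(a_S)
    GG = gram_matrix(a_G)

    # Compute the cost
    J_style_layer = 1./(4*n_C**2*(n_H*n_W)**2)*tf.reduce_sum(tf.square(tf.subtract(GS, GG)))

    return J_style_layer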
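One property of this cost is worth seeing numerically: because the Gram matrix sums over all spatial positions, it ignores where a feature occurs. The NumPy sketch below uses random arrays as stand-ins for activations; `layer_style_cost_np` and all variable names are assumptions for illustration only.

```python
import numpy as np

def layer_style_cost_np(a_S, a_G):
    """NumPy counterpart of the per-layer style cost for (1, n_H, n_W, n_C) arrays."""
    _, n_H, n_W, n_C = a_G.shape
    # Unroll to (n_H*n_W, n_C), then transpose so that each row is one channel.
    S = a_S.reshape(n_H * n_W, n_C).T
    G = a_G.reshape(n_H * n_W, n_C).T
    GS, GG = S @ S.T, G @ G.T
    return np.sum((GS - GG) ** 2) / (4.0 * n_C**2 * (n_H * n_W) ** 2)

rng = np.random.default_rng(0)
a_S = rng.normal(size=(1, 4, 4, 3))      # stand-in for style activations
a_other = rng.normal(size=(1, 4, 4, 3))  # unrelated activations

# Shuffle the 16 spatial positions of a_S while keeping each channel vector intact.
perm = rng.permutation(16)
a_shuffled = a_S.reshape(16, 3)[perm].reshape(1, 4, 4, 3)

cost_same = layer_style_cost_np(a_S, a_S)
cost_shuffled = layer_style_cost_np(a_S, a_shuffled)
cost_other = layer_style_cost_np(a_S, a_other)
print(cost_same, cost_shuffled, cost_other)
```

The shuffled activations describe a different spatial layout, yet their style cost against the original is zero up to floating-point rounding, which is why this cost captures texture rather than content.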

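Style weights

So far, the style cost has been computed from a single layer. Combining the style costs of several layers, each with its own weight, may give surprisingly good results. The layers and their weights are specified as follows: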


STYLE_LAYERS = [
    ('conv1_1', 0.2),
    ('conv2_1', 0.2),
    ('conv3_1', 0.2),
    ('conv4_1', 0.2),
    ('conv5_1', 0.2)]

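The style costs of the different layers are combined according to the following formula:
$J_{style}(S,G) = \sum_{l} \lambda^{[l]} J^{[l]}_{style}(S,G)$

Here $\lambda^{[l]}$ is the weight of layer $l$, as given in STYLE_LAYERS in the code above.

This formula is implemented as follows: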


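def compute_style_cost(model, STYLE_LAYERS):
    """
    Compute the overall style cost across several layers.

    Arguments:
    model -- the TensorFlow model
    STYLE_LAYERS -- a Python list of (layer name, weight) pairs

    Returns:
    J_style -- the overall style cost
    """

    # Initialize the style cost
    J_style = 0

    for layer_name, coeff in STYLE_LAYERS:
        # Select the output tensor of the current layer
        out = model[layer_name]

        # Evaluate the layer on the style image, which must already be assigned
        # to the model input; sess is the global interactive session
        a_S = sess.run(out)
        # Keep a_G symbolic; it is evaluated later, once G is the model input
        a_G = out

        # Compute the style cost of the current layer
        J_style_layer = compute_layer_style_cost(a_S, a_G)

        # Add the weighted cost of this layer to the total
        J_style += coeff * J_style_layer

    return J_style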
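The weighted sum itself is simple arithmetic, shown below in plain Python. The per-layer cost values are invented for illustration; in the actual pipeline they would come from compute_layer_style_cost evaluated at each layer.

```python
# Hypothetical per-layer style costs (made-up numbers, not measured values).
layer_costs = {
    'conv1_1': 5.0,
    'conv2_1': 4.0,
    'conv3_1': 3.0,
    'conv4_1': 2.0,
    'conv5_1': 1.0,
}

# Same layers and weights as in the article.
STYLE_LAYERS = [
    ('conv1_1', 0.2),
    ('conv2_1', 0.2),
    ('conv3_1', 0.2),
    ('conv4_1', 0.2),
    ('conv5_1', 0.2)]

# J_style = sum over layers of weight * per-layer cost
J_style_example = sum(coeff * layer_costs[name] for name, coeff in STYLE_LAYERS)
print(J_style_example)  # approximately 3.0
```

With equal weights the result is just the average of the per-layer costs; raising the weight of the shallow layers would emphasize fine textures, while raising the weight of the deep layers would emphasize larger-scale structure.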

Defining the total cost function

The total cost of style transfer combines the content cost and the style cost:

$J(G) = \alpha J_{content}(C,G) + \beta J_{style}(S,G)$

This formula is implemented as follows:



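def total_cost(J_content, J_style, alpha = 10, beta = 40):
    """
    Compute the total cost.

    Arguments:
    J_content -- the content cost
    J_style -- the style cost
    alpha -- hyperparameter weighting the content cost
    beta -- hyperparameter weighting the style cost

    Returns:
    J -- the total cost
    """
    J = alpha*J_content + beta*J_style

    return J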
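To see how the two hyperparameters trade content against style, the formula can be evaluated on made-up cost values. The helper name `total_cost_sketch` and the numbers are assumptions for illustration.

```python
def total_cost_sketch(J_content, J_style, alpha=10, beta=40):
    """Weighted sum of the content cost and the style cost."""
    return alpha * J_content + beta * J_style

# Hypothetical cost values, chosen only to make the arithmetic easy to follow.
J_content_example, J_style_example = 2.0, 0.5

default_total = total_cost_sketch(J_content_example, J_style_example)
style_heavy_total = total_cost_sketch(J_content_example, J_style_example, beta=400)

print(default_total)      # 10*2.0 + 40*0.5  = 40.0
print(style_heavy_total)  # 10*2.0 + 400*0.5 = 220.0
```

Only the ratio of alpha to beta matters for where the minimum lies: a larger beta makes the style term dominate the gradient, so the optimizer moves G further towards the style image's textures.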

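Solving the optimization problem

To summarize, the neural style transfer project consists of the following steps:

Create an interactive session
Load the content image
Load the style image
Randomly initialize the image to be generated
Load the VGG-19 model
Build the TensorFlow graph
- Run the content image through VGG-19 and compute the content cost
- Run the style image through VGG-19 and compute the style cost
- Compute the total cost
- Define the optimizer and set the learning rate
Run the TensorFlow graph for many iterations, updating the generated image at each step

Unlike a regular session, an interactive session installs itself as the default session, so variables can be evaluated without constantly referring to the session object, which simplifies the code considerably.

The session is created as follows: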

tf.reset_default_graph()
sess = tf.InteractiveSession()

Load the content image and the style image, then reshape and normalize them:

# Load the content image
content_image = imageio.imread("images/louvre_small.jpg")
content_image = reshape_and_normalize_image(content_image)
# Load the style image
style_image = imageio.imread("images/monet.jpg")
style_image = reshape_and_normalize_image(style_image)
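What reshape_and_normalize_image does can be sketched in NumPy on a synthetic image. The channel means are the CONFIG.MEANS values from the appendix; the mid-grey test image is a made-up stand-in for a real photograph.

```python
import numpy as np

# Per-channel means, as in CONFIG.MEANS, shaped for broadcasting over a batch.
MEANS = np.array([123.68, 116.779, 103.939]).reshape((1, 1, 1, 3))

# A hypothetical 300x400 RGB image filled with mid-grey pixels.
image = np.full((300, 400, 3), 128, dtype=np.uint8)

# Add a leading batch dimension, then subtract the channel means.
batched = np.reshape(image, (1,) + image.shape)
normalized = batched - MEANS

print(normalized.shape)
print(normalized[0, 0, 0])
```

The result is a float array of shape (1, 300, 400, 3) whose values are centred around zero, which is the input format the VGG weights were trained with; save_image later adds the means back before writing the file.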

Initialize the generated image as a noisy version of the content image:

generated_image = generate_noise_image(content_image)
imshow(generated_image[0])
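The blending performed by generate_noise_image can be examined on a constant stand-in image. The sketch below assumes the same noise range and noise ratio as the appendix code; the constant content values and the fixed random seed are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
noise_ratio = 0.6  # same value as CONFIG.NOISE_RATIO

# Stand-in for a normalized content image (hypothetical constant values).
content = np.full((1, 300, 400, 3), 10.0, dtype=np.float32)

# Uniform noise in [-20, 20), blended with the content image.
noise = rng.uniform(-20, 20, content.shape).astype('float32')
blended = noise * noise_ratio + content * (1 - noise_ratio)

print(blended.shape)
print(float(blended.min()), float(blended.max()), float(blended.mean()))
```

With these numbers each blended pixel lies between -8 and 16 and averages about 4, that is, 40% of the content value plus zero-mean noise. Starting from an image that is still loosely correlated with the content image tends to help the content of G converge faster than starting from pure noise.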

Load the VGG-19 model:

model = load_vgg_model("pretrained-model/imagenet-vgg-verydeep-19.mat")

To compute the content cost, a_C and a_G are set to the activations of a specific hidden layer of the VGG model. The conv4_2 layer is used here:


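# Assign the content image as the input of the model
sess.run(model['input'].assign(content_image))

# Select the output tensor of the conv4_2 layer
out = model['conv4_2']

# Set a_C to the activations of the selected layer for the content image
a_C = sess.run(out)
# Set a_G to the same layer; it stays a tensor and is evaluated later for G
a_G = out
# Compute the content cost
J_content = compute_content_cost(a_C, a_G)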

The style cost is computed as follows:

sess.run(model['input'].assign(style_image))
J_style = compute_style_cost(model, STYLE_LAYERS)

Compute the total cost:

J = total_cost(J_content, J_style, alpha = 10, beta = 40)

Define the optimizer and set its learning rate to 2.0:

optimizer = tf.train.AdamOptimizer(2.0)
train_step = optimizer.minimize(J)
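The key idea here is that the optimizer updates the pixels of the input image rather than any network weights. The toy NumPy sketch below illustrates only that idea: it replaces VGG with an identity "network", uses a plain squared-error cost, and swaps Adam for ordinary gradient descent with a hand-derived gradient. All names and sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4x4 single-channel target and a random starting image.
target = rng.uniform(0, 1, size=(4, 4))
image = rng.uniform(0, 1, size=(4, 4))

def cost(img):
    # Squared-error cost between the image and the target (identity "network").
    return np.sum((img - target) ** 2)

initial_cost = cost(image)
learning_rate = 0.1

for _ in range(100):
    grad = 2.0 * (image - target)   # dJ/d(image), derived by hand
    image -= learning_rate * grad   # update the pixels, not any weights

final_cost = cost(image)
print(initial_cost, final_cost)
```

In the real pipeline TensorFlow computes the gradient of J with respect to model['input'] by backpropagating through the frozen VGG layers, and Adam applies the update, but the variable being optimized is the image in the same way.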

The following code runs the model for many iterations to perform the style transfer:


def model_nn(sess, input_image, num_iterations = 200):

    # Initialize the global variables of the session
    sess.run(tf.global_variables_initializer())

    # Assign the initial generated image as the input of the model
    sess.run(model['input'].assign(input_image))

    for i in range(num_iterations):

        # Run one optimization step to reduce the total cost
        sess.run(train_step)

        # Read back the current generated image from the model input
        generated_image = sess.run(model['input'])

        # Print the costs and save a snapshot every 20 iterations
        if i%20 == 0:
            Jt, Jc, Js = sess.run([J, J_content, J_style])
            print("Iteration " + str(i) + " :")
            print("total cost = " + str(Jt))
            print("content cost = " + str(Jc))
            print("style cost = " + str(Js))

            # Save the current generated image
            save_image("output/" + str(i) + ".png", generated_image)

    # Save the final generated image
    save_image('output/generated_image.jpg', generated_image)

    return generated_image

Finally, running the model on the given input produces the new image shown below:

Appendix: supporting code


### Part of this code is due to the MatConvNet team and is used to load the parameters of the pretrained VGG19 model in the notebook ###

import os
import sys
import scipy.io
import scipy.misc
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
from PIL import Image
from nst_utils import *

import numpy as np
import tensorflow as tf

class CONFIG:
    IMAGE_WIDTH = 400
    IMAGE_HEIGHT = 300
    COLOR_CHANNELS = 3
    NOISE_RATIO = 0.6
    MEANS = np.array([123.68, 116.779, 103.939]).reshape((1,1,1,3)) 
    VGG_MODEL = 'pretrained-model/imagenet-vgg-verydeep-19.mat' # Pick the VGG 19-layer model from the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition".
    STYLE_IMAGE = 'images/stone_style.jpg' # Style image to use.
    CONTENT_IMAGE = 'images/content300.jpg' # Content image to use.
    OUTPUT_DIR = 'output/'
    
def load_vgg_model(path):
    """
    Returns a model for the purpose of 'painting' the picture.
    Takes only the convolution layer weights and wrap using the TensorFlow
    Conv2d, Relu and AveragePooling layer. VGG actually uses maxpool but
    the paper indicates that using AveragePooling yields better results.
    The last few fully connected layers are not used.
    Here is the detailed configuration of the VGG model:
        0 is conv1_1 (3, 3, 3, 64)
        1 is relu
        2 is conv1_2 (3, 3, 64, 64)
        3 is relu    
        4 is maxpool
        5 is conv2_1 (3, 3, 64, 128)
        6 is relu
        7 is conv2_2 (3, 3, 128, 128)
        8 is relu
        9 is maxpool
        10 is conv3_1 (3, 3, 128, 256)
        11 is relu
        12 is conv3_2 (3, 3, 256, 256)
        13 is relu
        14 is conv3_3 (3, 3, 256, 256)
        15 is relu
        16 is conv3_4 (3, 3, 256, 256)
        17 is relu
        18 is maxpool
        19 is conv4_1 (3, 3, 256, 512)
        20 is relu
        21 is conv4_2 (3, 3, 512, 512)
        22 is relu
        23 is conv4_3 (3, 3, 512, 512)
        24 is relu
        25 is conv4_4 (3, 3, 512, 512)
        26 is relu
        27 is maxpool
        28 is conv5_1 (3, 3, 512, 512)
        29 is relu
        30 is conv5_2 (3, 3, 512, 512)
        31 is relu
        32 is conv5_3 (3, 3, 512, 512)
        33 is relu
        34 is conv5_4 (3, 3, 512, 512)
        35 is relu
        36 is maxpool
        37 is fullyconnected (7, 7, 512, 4096)
        38 is relu
        39 is fullyconnected (1, 1, 4096, 4096)
        40 is relu
        41 is fullyconnected (1, 1, 4096, 1000)
        42 is softmax
    """
    
    vgg = scipy.io.loadmat(path)

    vgg_layers = vgg['layers']
    
    def _weights(layer, expected_layer_name):
        """
        Return the weights and bias from the VGG model for a given layer.
        """
        wb = vgg_layers[0][layer][0][0][2]
        W = wb[0][0]
        b = wb[0][1]
        layer_name = vgg_layers[0][layer][0][0][0][0]
        assert layer_name == expected_layer_name
        return W, b

    def _relu(conv2d_layer):
        """
        Return the RELU function wrapped over a TensorFlow layer. Expects a
        Conv2d layer input.
        """
        return tf.nn.relu(conv2d_layer)

    def _conv2d(prev_layer, layer, layer_name):
        """
        Return the Conv2D layer using the weights, biases from the VGG
        model at 'layer'.
        """
        W, b = _weights(layer, layer_name)
        W = tf.constant(W)
        b = tf.constant(np.reshape(b, (b.size)))
        return tf.nn.conv2d(prev_layer, filter=W, strides=[1, 1, 1, 1], padding='SAME') + b

    def _conv2d_relu(prev_layer, layer, layer_name):
        """
        Return the Conv2D + RELU layer using the weights, biases from the VGG
        model at 'layer'.
        """
        return _relu(_conv2d(prev_layer, layer, layer_name))

    def _avgpool(prev_layer):
        """
        Return the AveragePooling layer.
        """
        return tf.nn.avg_pool(prev_layer, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    # Constructs the graph model.
    graph = {}
    graph['input']   = tf.Variable(np.zeros((1, CONFIG.IMAGE_HEIGHT, CONFIG.IMAGE_WIDTH, CONFIG.COLOR_CHANNELS)), dtype = 'float32')
    graph['conv1_1']  = _conv2d_relu(graph['input'], 0, 'conv1_1')
    graph['conv1_2']  = _conv2d_relu(graph['conv1_1'], 2, 'conv1_2')
    graph['avgpool1'] = _avgpool(graph['conv1_2'])
    graph['conv2_1']  = _conv2d_relu(graph['avgpool1'], 5, 'conv2_1')
    graph['conv2_2']  = _conv2d_relu(graph['conv2_1'], 7, 'conv2_2')
    graph['avgpool2'] = _avgpool(graph['conv2_2'])
    graph['conv3_1']  = _conv2d_relu(graph['avgpool2'], 10, 'conv3_1')
    graph['conv3_2']  = _conv2d_relu(graph['conv3_1'], 12, 'conv3_2')
    graph['conv3_3']  = _conv2d_relu(graph['conv3_2'], 14, 'conv3_3')
    graph['conv3_4']  = _conv2d_relu(graph['conv3_3'], 16, 'conv3_4')
    graph['avgpool3'] = _avgpool(graph['conv3_4'])
    graph['conv4_1']  = _conv2d_relu(graph['avgpool3'], 19, 'conv4_1')
    graph['conv4_2']  = _conv2d_relu(graph['conv4_1'], 21, 'conv4_2')
    graph['conv4_3']  = _conv2d_relu(graph['conv4_2'], 23, 'conv4_3')
    graph['conv4_4']  = _conv2d_relu(graph['conv4_3'], 25, 'conv4_4')
    graph['avgpool4'] = _avgpool(graph['conv4_4'])
    graph['conv5_1']  = _conv2d_relu(graph['avgpool4'], 28, 'conv5_1')
    graph['conv5_2']  = _conv2d_relu(graph['conv5_1'], 30, 'conv5_2')
    graph['conv5_3']  = _conv2d_relu(graph['conv5_2'], 32, 'conv5_3')
    graph['conv5_4']  = _conv2d_relu(graph['conv5_3'], 34, 'conv5_4')
    graph['avgpool5'] = _avgpool(graph['conv5_4'])
    
    return graph

def generate_noise_image(content_image, noise_ratio = CONFIG.NOISE_RATIO):
    """
    Generates a noisy image by adding random noise to the content_image
    """
    
    # Generate a random noise_image
    noise_image = np.random.uniform(-20, 20, (1, CONFIG.IMAGE_HEIGHT, CONFIG.IMAGE_WIDTH, CONFIG.COLOR_CHANNELS)).astype('float32')
    
    # Set the input_image to be a weighted average of the content_image and a noise_image
    input_image = noise_image * noise_ratio + content_image * (1 - noise_ratio)
    
    return input_image


def reshape_and_normalize_image(image):
    """
    Reshape and normalize the input image (content or style)
    """
    
    # Reshape image to match the expected input of the VGG model
    image = np.reshape(image, ((1,) + image.shape))
    
    # Subtract the mean to match the expected input of the VGG model
    image = image - CONFIG.MEANS
    
    return image


def save_image(path, image):
    
    # Un-normalize the image so that it looks good
    image = image + CONFIG.MEANS
    
    # Clip and Save the image
    image = np.clip(image[0], 0, 255).astype('uint8')
    scipy.misc.imsave(path, image)