2018-09-03 Findings on improving CycleGAN

Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data

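A CycleGAN with many-to-many mappings.
The idea makes sense.
Many scenes can take on several feature forms, so mapping each input to a single state is not entirely reasonable.
Perhaps we can exploit this in training: recover several candidate states and pick the most suitable one as the result.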

An optimized implementation of the CycleGAN code

2018-09-05 Code study notes

A great tool for learning the command line
 On mapping padding
 On reading data in TensorFlow

2018-09-11 Using urllib3

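The old urllib2 cannot be installed in a Python 3.6 environment: in Python 3 it was split into urllib.request and urllib.error.
I use the third-party urllib3 package instead.
For downloading the web dataset, the code changes from: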

f=urllib2.urlopen(url)
return json.loads(f.read())

to

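import json
import urllib3

def list_categories(tag):
    # Ask the LSUN server which categories exist under the given tag.
    url = 'http://lsun.cs.princeton.edu/htbin/list.cgi?tag=' + tag
    pool = urllib3.PoolManager()
    f = pool.urlopen('GET', url)
    # urllib3 exposes the response body as bytes in .data, so decode before parsing.
    return json.loads(f.data.decode('utf-8'))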

2018-09-14 Details of convolution

conv2d(
    input,
    filter,
    strides,
    padding,
    use_cudnn_on_gpu=None,
    data_format=None,
    name=None
)

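input is a 4-D tensor [batch_size, in_height, in_width, n_channels]: the number of images in the batch, the image size, and the number of channels.
filter is a 4-D tensor [filter_height, filter_width, in_channels, out_channels]: the kernel size, the number of input channels, and the number of output channels. The number of output channels is the number of features extracted from the previous layer.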
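
To make the two shape conventions concrete, here is a minimal NumPy sketch of a convolution with VALID padding. The name conv2d_valid and the loop-based implementation are my own illustration, not TensorFlow's code; it only assumes the [batch, height, width, channels] and [filter_height, filter_width, in_channels, out_channels] layouts described above.

```python
import numpy as np

def conv2d_valid(inputs, filters, stride=1):
    """Naive NHWC convolution with VALID padding (illustrative only)."""
    batch, in_h, in_w, in_c = inputs.shape
    f_h, f_w, f_in_c, out_c = filters.shape
    # The filter's in_channels must match the channel count of the input.
    assert in_c == f_in_c, "in_channels mismatch"
    out_h = (in_h - f_h) // stride + 1
    out_w = (in_w - f_w) // stride + 1
    out = np.zeros((batch, out_h, out_w, out_c), dtype=inputs.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Window of shape [batch, f_h, f_w, in_c].
            patch = inputs[:, i * stride:i * stride + f_h,
                           j * stride:j * stride + f_w, :]
            # Contract over kernel height, kernel width and input channels.
            out[:, i, j, :] = np.tensordot(
                patch, filters, axes=([1, 2, 3], [0, 1, 2]))
    return out

x = np.ones((2, 8, 8, 3), dtype=np.float32)    # 2 images, 8x8, 3 channels
w = np.ones((3, 3, 3, 16), dtype=np.float32)   # 3x3 kernel, 3 -> 16 channels
y = conv2d_valid(x, w)
print(y.shape)
```

The last axis of the output equals out_channels of the filter, while the batch axis is carried through unchanged.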
Reference:
 An exceptionally clear explanation of convolution details
[Figure: convolution illustration]

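How convolution layer sizes are computed

Input matrix format: four dimensions, in order: number of samples, image height, image width, number of image channels.
Output matrix format: same dimension order and meaning as the input matrix, but the sizes of the last three dimensions (image height, image width, number of image channels) change.
Weight matrix (convolution kernel) format: also four dimensions, but with meanings different from the two above: kernel height, kernel width, number of input channels, number of output channels (number of kernels).
How the input, weight, and output matrices determine one another:
- The kernel's number of input channels (in depth) is determined by the number of channels of the input matrix. (marked in red)
- The number of channels of the output matrix (out depth) is determined by the kernel's number of output channels. (marked in green)
- The height and width of the output matrix are jointly determined by the input matrix, the kernel, and the scanning mode (stride and padding). The formula is given below. (marked in blue)

[Figure: output height and width formula]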
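
As a sketch of the height/width rule, TensorFlow's two padding modes give ceil(in / stride) for SAME and ceil((in - filter + 1) / stride) for VALID. The helper names below are hypothetical and only encode those two formulas plus the channel rules from the list above.

```python
import math

def conv_output_size(in_size, filter_size, stride, padding):
    """Output length along one spatial axis for TensorFlow-style padding."""
    if padding == "SAME":
        return math.ceil(in_size / stride)
    if padding == "VALID":
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

def conv_output_shape(input_shape, filter_shape, stride, padding):
    """input_shape: [batch, h, w, in_c]; filter_shape: [f_h, f_w, in_c, out_c]."""
    batch, in_h, in_w, in_c = input_shape
    f_h, f_w, f_in_c, out_c = filter_shape
    if in_c != f_in_c:
        raise ValueError("filter in_channels must equal input channels")
    return [batch,
            conv_output_size(in_h, f_h, stride, padding),
            conv_output_size(in_w, f_w, stride, padding),
            out_c]

same_shape = conv_output_shape([32, 64, 64, 3], [5, 5, 3, 16], 2, "SAME")
valid_shape = conv_output_shape([32, 64, 64, 3], [5, 5, 3, 16], 2, "VALID")
print(same_shape, valid_shape)
```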

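Activation functions in TensorFlow

Element-wise activation functions return a tensor with the same shape as their input. In the list below, tf.nn.crelu is an exception (it concatenates relu(x) and relu(-x), which doubles the last dimension), tf.nn.relu_layer maps [batch, in_units] to [batch, out_units], and tf.nn.bias_add and tf.nn.dropout are not activations, although they also preserve shape.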

tf.nn.relu()
tf.nn.sigmoid()
tf.nn.tanh()
tf.nn.elu()
tf.nn.bias_add()
tf.nn.crelu()
tf.nn.relu6()
tf.nn.softplus()
tf.nn.softsign()
tf.nn.dropout()
tf.nn.relu_layer(x, weights, biases,name=None)
def relu_layer(x, weights, biases, name=None):
  """Computes Relu(x * weight + biases).
  Args:
    x: a 2D tensor.  Dimensions typically: batch, in_units
    weights: a 2D tensor.  Dimensions typically: in_units, out_units
    biases: a 1D tensor.  Dimensions: out_units
    name: A name for the operation (optional).  If not specified
      "nn_relu_layer" is used.
  Returns:
    A 2-D Tensor computing relu(matmul(x, weights) + biases).
    Dimensions typically: batch, out_units.
  """

2018-09-15 Function notes

Activation functions

[Figure: activation function curves]

Red: ReLU
Blue: Tanh
Green: Sigmoid
Purple: Linear
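
The four curves, plus crelu and relu_layer from the TensorFlow list earlier, can be sketched in NumPy to check their output shapes and value ranges. These are plain NumPy stand-ins written for illustration, not the tf.nn implementations.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def linear(x):
    return x

def crelu(x):
    # relu(x) and relu(-x) concatenated on the last axis, which doubles its size.
    return np.concatenate([relu(x), relu(-x)], axis=-1)

def relu_layer(x, weights, biases):
    # relu(matmul(x, weights) + biases): [batch, in_units] -> [batch, out_units]
    return relu(np.matmul(x, weights) + biases)

x = np.linspace(-3.0, 3.0, 7).reshape(1, 7)
for name, fn in [("relu", relu), ("tanh", np.tanh),
                 ("sigmoid", sigmoid), ("linear", linear)]:
    y = fn(x)
    # Element-wise functions keep the input shape; only the value range differs.
    print(name, y.shape, float(y.min()), float(y.max()))

print("crelu", crelu(x).shape)
print("relu_layer", relu_layer(x, np.ones((7, 2)), np.zeros(2)).shape)
```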

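tf.reshape(tensor, shape, name=None)

Reshapes tensor into the form given by shape.
shape is a list, and it may contain -1. A -1 means we do not specify the size of that dimension ourselves; the function infers it so that the total number of elements stays the same. The list may contain at most one -1.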
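
NumPy's reshape follows the same -1 rule, so a small NumPy sketch (used here as a stand-in for tf.reshape, which is an assumption of this example) shows the inference:

```python
import numpy as np

t = np.arange(24)              # 24 elements in total
a = t.reshape([2, -1])         # -1 is inferred as 12
b = t.reshape([-1, 3, 4])      # -1 is inferred as 2
print(a.shape, b.shape)

try:
    t.reshape([-1, -1])        # more than one -1 cannot be resolved
except ValueError as err:
    print("error:", err)
```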

tf.gradients

tf.gradients(ys, xs, 
             grad_ys=None, 
             name='gradients',
             colocate_gradients_with_ops=False,
             gate_gradients=False,
             aggregation_method=None,
             stop_gradients=None)

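The main job of a gradient function is to compute ∂y/∂x. In TensorFlow, both y and x are tensors.
Beyond that, the ys and xs accepted by tf.gradients() can be lists as well as single tensors, of the form [tensor1, tensor2, …, tensorn]. When ys and xs are both lists, the relationship is:
1. tf.gradients() differentiates ys with respect to xs.
2. The return value is a list whose length equals len(xs).

3. Suppose the return value is [grad1, grad2, grad3], with ys=[y1, y2] and xs=[x1, x2, x3]. Each entry is the sum over all ys, so the actual computation is:

grad1 = ∂y1/∂x1 + ∂y2/∂x1
grad2 = ∂y1/∂x2 + ∂y2/∂x2
grad3 = ∂y1/∂x3 + ∂y2/∂x3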
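
The summation rule in item 3 can be checked numerically without TensorFlow. The sketch below uses two made-up functions y1 and y2 and central differences on y1 + y2; it illustrates the arithmetic only and is not how tf.gradients is implemented.

```python
import numpy as np

def ys(x):
    x1, x2, x3 = x
    y1 = x1 * x2          # depends on x1 and x2
    y2 = x2 + x3 ** 2     # depends on x2 and x3
    return np.array([y1, y2])

def summed_grads(x, eps=1e-6):
    """Central-difference estimate of d(y1 + y2)/dx_i for each x_i."""
    grads = []
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = eps
        grads.append((ys(x + step).sum() - ys(x - step).sum()) / (2 * eps))
    return grads

x = np.array([2.0, 3.0, 4.0])
grads = summed_grads(x)
# One entry per x; analytically: x2 + 0, x1 + 1, 0 + 2 * x3.
print(len(grads), np.round(grads, 4))
```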

2018-09-15 Code study: improved_wgan

In the loss computation of colorGAN, improved_wgan mainly adds a gradient penalty term. gradient_penalty_lambda is its coefficient, 10 by default.

        # discriminator_wgan: no sigmoid on the output, so the logits are not probabilities
        self.logits_real = self.discriminator_wgan(self.images, config=config)
        self.logits_fake = self.discriminator_wgan(self.generate_image, reuse=True, config=config)

        #w-distance
        self.g_loss = -tf.reduce_mean(self.logits_fake)
        self.d_loss = -tf.reduce_mean(self.logits_real - self.logits_fake)

        #improved wgan
        if config.improved_wgan:
            alpha = tf.random_uniform(
                shape=[config.batch_size, 1],
                minval=0.,
                maxval=1.,
                dtype=tf.float32
                )
            differences = self.generate_image - self.images
            interpolates = self.images+(alpha*differences)
            gradients = tf.gradients(self.discriminator_wgan(interpolates, reuse=True, config=config), [interpolates])[0]
            slopes = tf.sqrt(tf.reduce_sum(tf.square(gradients), reduction_indices=[1]))
            self.gradient_penalty = tf.reduce_mean((slopes-1.)**2)
            self.d_loss+=config.gradient_penalty_lambda*self.gradient_penalty

        self.total_loss = self.d_loss + self.g_loss
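
The penalty term is lambda times the mean of (||gradient of D at the interpolate|| - 1)^2. To see the arithmetic in isolation, here is a NumPy sketch with a hypothetical linear critic D(x) = sum(weights * x), whose gradient with respect to x is simply weights. It assumes 4-D image batches, so alpha has shape [batch, 1, 1, 1] and the norm is taken over all non-batch axes; for inputs flattened to [batch, dim], a shape of [batch, 1] and a sum over axis 1 are enough.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, h, w_, c = 4, 8, 8, 3
lam = 10.0                                   # gradient penalty coefficient

real = rng.normal(size=(batch, h, w_, c))
fake = rng.normal(size=(batch, h, w_, c))

# Hypothetical linear critic; its gradient w.r.t. x equals `weights`.
weights = rng.normal(size=(h, w_, c))

def critic(x):
    return np.sum(x * weights, axis=(1, 2, 3))

def critic_grad(x):
    return np.broadcast_to(weights, x.shape)

# One interpolation coefficient per sample, broadcast over H, W, C.
alpha = rng.uniform(0.0, 1.0, size=(batch, 1, 1, 1))
interpolates = real + alpha * (fake - real)

gradients = critic_grad(interpolates)
# Per-sample L2 norm over all non-batch axes.
slopes = np.sqrt(np.sum(np.square(gradients), axis=(1, 2, 3)))
gradient_penalty = np.mean((slopes - 1.0) ** 2)

d_loss = -np.mean(critic(real) - critic(fake)) + lam * gradient_penalty
print(slopes.shape, gradient_penalty)
```

With a linear critic every sample has the same slope, so the penalty reduces to (||weights|| - 1)^2; a real discriminator would need automatic differentiation, as in the tf.gradients call above.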

2018-09-15 Comparing the paper with the implementation

https://arxiv.org/pdf/1702.06674.pdf

Generator

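The noise Z should be connected only to the first half of the network; the code has no extra handling for this.
channels: 128->64->64->64->32->2 (UV map)

Discriminator
The code matches the paper.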

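2018-09-17 Today's code details

An error during RGB2YUV conversion:
ValueError: dot (64,64) (3,3) dim 64 != dim 3
Checking image.shape showed that the dataset contains grayscale images. Most images are (64,64,3), but a few grayscale ones are mixed in and make the matrix multiplication fail: a single-channel image cannot go through the multi-channel conversion.

Pitfalls:

image.ndim reports the number of array axes, not the number of channels.
image.shape[2] does not exist when the third axis is missing, so it cannot report the channel count; the failure is caused by the missing axis, not by the third axis having a value other than 3.
len(image.shape)==2 (equivalently image.ndim==2) indicates that the third axis is missing, which matches the error.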
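
The failure and the shape checks can be reproduced with two dummy arrays. In this sketch the 3x3 identity matrix is only a placeholder for the real RGB-to-YUV coefficients, and has_three_channels is a hypothetical helper name.

```python
import numpy as np

color = np.zeros((64, 64, 3), dtype=np.uint8)
gray = np.zeros((64, 64), dtype=np.uint8)
rgb2yuv = np.eye(3)          # placeholder 3x3 matrix, not real YUV coefficients

def has_three_channels(img):
    # Check the number of axes first, then the size of the channel axis.
    return img.ndim == 3 and img.shape[2] == 3

print(np.dot(color, rgb2yuv).shape)   # works: (64, 64, 3) . (3, 3)
try:
    np.dot(gray, rgb2yuv)             # fails: (64, 64) . (3, 3)
except ValueError as err:
    print("ValueError:", err)

try:
    gray.shape[2]                     # there is no third axis at all
except IndexError as err:
    print("IndexError:", err)

print(has_three_channels(color), has_three_channels(gray))
```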

Solution:
Detect the abnormal images that are not three-channel and remove them from the dataset.

import os
from glob import glob
from scipy import misc  # misc.imread needs SciPy < 1.2; it was removed in later versions

# Unused
# def isGray(img):
#     if img.ndim == 3:
#         return False
#     else:
#         return True


if __name__ == '__main__':
    data = glob(os.path.join("./data/", "colorImage/", "*.JPEG"))
    print(len(data))
    counter = 0
    for one_image in data:
        img = misc.imread(one_image)
        # Grayscale images have shape (H, W) with no channel axis.
        if len(img.shape) < 3:
            print(img.shape)
            counter += 1
            os.remove(one_image)
            print("delete %d" % counter)
    print(counter)
    # Number of images left after removal.
    print(len(data) - counter)

Details: deleting files in Python:
1. os.remove() removes a file.
2. os.rmdir() removes an empty directory.
3. shutil.rmtree() deletes a directory and all its contents.
4. pathlib.Path.unlink() removes the file or symbolic link.
5. pathlib.Path.rmdir() removes the empty directory.
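
A short sketch that exercises all five calls inside a temporary scratch directory (the file and directory names are made up for the demo):

```python
import os
import shutil
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())          # scratch directory for the demo

f1 = root / "a.txt"
f1.write_text("x")
os.remove(f1)                            # 1. remove a file

d1 = root / "empty1"
d1.mkdir()
os.rmdir(d1)                             # 2. remove an empty directory

d2 = root / "tree"
(d2 / "sub").mkdir(parents=True)
(d2 / "sub" / "b.txt").write_text("y")
shutil.rmtree(d2)                        # 3. remove a directory and its contents

f2 = root / "c.txt"
f2.write_text("z")
f2.unlink()                              # 4. remove a file or symbolic link

d3 = root / "empty2"
d3.mkdir()
d3.rmdir()                               # 5. remove an empty directory

remaining = list(root.iterdir())
print(remaining)
root.rmdir()                             # clean up the scratch directory itself
```

os.rmdir() and Path.rmdir() raise an error on a non-empty directory, which is why the "tree" case needs shutil.rmtree().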