Deep Learning Notes (13): GAN-2

Author: Nino_Lau | Published 2019-08-02 09:17

    LSGAN

    LSGAN (Least Squares GAN) replaces the GAN loss function with an L2 (least-squares) loss. The optimization objectives of G and D are shown in the figure below.


    image
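
    For reference, the objectives in the figure correspond to the standard LSGAN formulation (a sketch, using label 1 for real and 0 for fake):

    \min_D \;\; \tfrac{1}{2}\,\mathbb{E}_{x \sim p_{\mathrm{data}}}\big[(D(x) - 1)^2\big] + \tfrac{1}{2}\,\mathbb{E}_{z \sim p_z}\big[D(G(z))^2\big]

    \min_G \;\; \tfrac{1}{2}\,\mathbb{E}_{z \sim p_z}\big[(D(G(z)) - 1)^2\big]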

    Assignment:

    Here, complete the L2Loss code below to implement the L2 loss for the objective above. Then use this loss function to train an LSGAN on the MNIST dataset, and show the generated images and the loss curves.

    Hint: ignore the 1/2 factor in the figure above. The L2 loss is the MSE loss (mean squared error). It takes two arguments: input_ is the probability that the discriminator D assigns to "real" (size batch_size*1), and target is the label, 1 or 0 (size batch_size*1). Only PyTorch and plain Python operations are allowed (you may not call MSELoss directly).

    class L2Loss(nn.Module):
        
        def __init__(self):
            super(L2Loss, self).__init__()
        
        def forward(self, input_, target):
            """
            input_: (batch_size*1) 
            target: (batch_size*1) labels, 1 or 0
            """
            return ((input_ - target) ** 2).mean()
    

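    As a quick sanity check (a minimal sketch; nn.MSELoss is used here only to verify the result, the assignment still forbids calling it inside L2Loss), the custom loss should agree with the built-in MSE loss on the same tensors:

    # hypothetical check, assuming torch and nn are already imported as in the rest of the notebook
    criterion = L2Loss()
    input_ = torch.tensor([[0.9], [0.2], [0.7]])   # D's "real" scores, shape (batch_size, 1)
    target = torch.ones(3, 1)                      # labels for real samples
    print(criterion(input_, target))               # tensor(0.2467)
    print(nn.MSELoss()(input_, target))            # same value, used here only for verification
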
    After completing the code above, use the L2Loss you wrote to train a DCGAN on the MNIST dataset.

    # hyper params
    
    # z dim
    latent_dim = 100
    
    # image size and channel
    image_size=32
    image_channel=1
    
    # Adam lr and betas
    learning_rate = 0.0002
    betas = (0.5, 0.999)
    
    # epochs and batch size
    n_epochs = 100
    batch_size = 32
    
    # device : cpu or cuda:0/1/2/3
    device = torch.device('cuda:0')
    
    # mnist dataset and dataloader
    train_dataset = load_mnist_data()
    trainloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    
    # use L2Loss as loss function
    l2loss = L2Loss().to(device)
    
    # G and D model, use DCGAN
    G = DCGenerator(image_size=image_size, latent_dim=latent_dim, output_channel=image_channel).to(device)
    D = DCDiscriminator(image_size=image_size, input_channel=image_channel).to(device)
    
    # G and D optimizer, use Adam or SGD
    G_optimizer = optim.Adam(G.parameters(), lr=learning_rate, betas=betas)
    D_optimizer = optim.Adam(D.parameters(), lr=learning_rate, betas=betas)
    
    d_loss_hist, g_loss_hist = run_gan(trainloader, G, D, G_optimizer, D_optimizer, l2loss, n_epochs, device, 
                                       latent_dim)
    loss_plot(d_loss_hist, g_loss_hist)
    
    Epoch 0: Train D loss: 0.0631, G loss: 0.9534
    
    image
    Epoch 1: Train D loss: 0.0268, G loss: 0.9953
    Epoch 2: Train D loss: 0.0002, G loss: 1.0000
    Epoch 3: Train D loss: 0.0001, G loss: 1.0000
    Epoch 4: Train D loss: 0.0000, G loss: 1.0000
    Epoch 5: Train D loss: 0.0000, G loss: 1.0000
    Epoch 6: Train D loss: 0.0000, G loss: 1.0000
    Epoch 7: Train D loss: 0.0000, G loss: 1.0000
    Epoch 8: Train D loss: 0.0000, G loss: 1.0000
    Epoch 9: Train D loss: 0.0000, G loss: 1.0000
    
    image
    Epoch 10: Train D loss: 0.0000, G loss: 1.0000
    Epoch 11: Train D loss: 0.0000, G loss: 1.0000
    Epoch 12: Train D loss: 0.0000, G loss: 1.0000
    Epoch 13: Train D loss: 0.0000, G loss: 1.0000
    Epoch 14: Train D loss: 0.0000, G loss: 0.9999
    Epoch 15: Train D loss: 0.0155, G loss: 0.9995
    Epoch 16: Train D loss: 0.0000, G loss: 0.9999
    Epoch 17: Train D loss: 0.0855, G loss: 0.9992
    Epoch 18: Train D loss: 1.0000, G loss: 1.0000
    Epoch 19: Train D loss: 1.0000, G loss: 1.0000
    
    image
    Epoch 20: Train D loss: 1.0000, G loss: 1.0000
    Epoch 21: Train D loss: 1.0000, G loss: 1.0000
    Epoch 22: Train D loss: 1.0000, G loss: 1.0000
    Epoch 23: Train D loss: 1.0000, G loss: 1.0000
    Epoch 24: Train D loss: 0.9999, G loss: 1.0000
    Epoch 25: Train D loss: 0.4592, G loss: 1.0000
    Epoch 26: Train D loss: 0.0000, G loss: 1.0000
    Epoch 27: Train D loss: 0.0000, G loss: 1.0000
    Epoch 28: Train D loss: 0.0000, G loss: 1.0000
    Epoch 29: Train D loss: 0.0000, G loss: 1.0000
    

    ......

    image
    Epoch 90: Train D loss: 0.0000, G loss: 1.0000
    Epoch 91: Train D loss: 0.0000, G loss: 1.0000
    Epoch 92: Train D loss: 0.0000, G loss: 1.0000
    Epoch 93: Train D loss: 0.0000, G loss: 1.0000
    Epoch 94: Train D loss: 0.0000, G loss: 1.0000
    Epoch 95: Train D loss: 0.0000, G loss: 1.0000
    Epoch 96: Train D loss: 0.0000, G loss: 1.0000
    Epoch 97: Train D loss: 0.0000, G loss: 1.0000
    Epoch 98: Train D loss: 0.0000, G loss: 1.0000
    Epoch 99: Train D loss: 0.0000, G loss: 1.0000
    
    image image

    WGAN

    GANs still suffer from unstable training and from mode collapse (which can be understood as the generated images having extremely low diversity); our dataset does not necessarily expose these issues. WGAN (Wasserstein GAN) replaces the JS divergence that the traditional GAN fits with the Wasserstein distance, which alleviates the training instability and mode collapse to some extent.

    The optimization objective of the WGAN discriminator becomes: under the Lipschitz continuity constraint (which we can satisfy by restricting the weights w to a certain range), maximize


    image
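
    In the usual WGAN formulation (with the discriminator f_w constrained to be K-Lipschitz), the objective in the figure reads:

    \max_{w:\, \|f_w\|_L \le K} \;\; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[f_w(x)\big] \;-\; \mathbb{E}_{z \sim p_z}\big[f_w(G(z))\big]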

    This quantity approximates the Wasserstein distance between the real distribution and the generated distribution, so the loss functions of D and G become:


    image
    image
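
    Written out (the standard WGAN losses, matching the two figures above):

    L_D = -\,\mathbb{E}_{x \sim p_{\mathrm{data}}}\big[D(x)\big] + \mathbb{E}_{z \sim p_z}\big[D(G(z))\big], \qquad
    L_G = -\,\mathbb{E}_{z \sim p_z}\big[D(G(z))\big]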

    Concretely, the WGAN implementation makes three main changes:

    • Remove the sigmoid from the last layer of the discriminator D
    • Do not take the log in the losses of the generator G and the discriminator D
    • After each update of the discriminator D, clip the absolute value of its parameters to a fixed constant c

    So we mainly rewrite the WGAN training function. Here the network architecture is the DCGAN with the sigmoid removed (note that sigmoid is set to False when initializing D so that the final sigmoid layer is dropped).

    Below is the WGAN implementation. Two parameters are added: n_d is the number of D updates per G update, and weight_clip is the clipping constant.

    def wgan_train(trainloader, G, D, G_optimizer, D_optimizer, device, z_dim, n_d=2, weight_clip=0.01):
        
        """
        n_d: the number of iterations of D update per G update iteration
        weight_clip: the clipping parameters
        """
        
        D.train()
        G.train()
        
        D_total_loss = 0
        G_total_loss = 0
        
        for i, (x, _) in enumerate(trainloader):
            
            x = x.to(device)
            
            # update D network
            # D optimizer zero grads
            D_optimizer.zero_grad()
            
            # D real loss from real images
            d_real = D(x)
            d_real_loss = - d_real.mean()
            
            # D fake loss from fake images generated by G
            z = torch.rand(x.size(0), z_dim).to(device)
            g_z = G(z)
            d_fake = D(g_z)
            d_fake_loss = d_fake.mean()
            
            # D backward and step
            d_loss = d_real_loss + d_fake_loss
            d_loss.backward()
            D_optimizer.step()
            
            # D weight clip
            for params in D.parameters():
                params.data.clamp_(-weight_clip, weight_clip)
                
            D_total_loss += d_loss.item()
    
            # update G network
            if (i + 1) % n_d == 0:
                # G optimizer zero grads
                G_optimizer.zero_grad()
    
                # G loss
                g_z = G(z)
                d_fake = D(g_z)
                g_loss = - d_fake.mean()
    
                # G backward and step
                g_loss.backward()
                G_optimizer.step()
                
                G_total_loss += g_loss.item()
        
        return D_total_loss / len(trainloader), G_total_loss * n_d / len(trainloader)
    
    def run_wgan(trainloader, G, D, G_optimizer, D_optimizer, n_epochs, device, latent_dim, n_d, weight_clip):
        d_loss_hist = []
        g_loss_hist = []
    
        for epoch in range(n_epochs):
            d_loss, g_loss = wgan_train(trainloader, G, D, G_optimizer, D_optimizer, device, 
                                   z_dim=latent_dim, n_d=n_d, weight_clip=weight_clip)
            print('Epoch {}: Train D loss: {:.4f}, G loss: {:.4f}'.format(epoch, d_loss, g_loss))
    
            d_loss_hist.append(d_loss)
            g_loss_hist.append(g_loss)
    
            if epoch == 0 or (epoch + 1) % 10 == 0:
                visualize_results(G, device, latent_dim) 
        
        return d_loss_hist, g_loss_hist
    

    Next, let's use the run_wgan we just wrote on our furniture (chair) dataset and see how it performs.

    # hyper params
    
    # z dim
    latent_dim = 100
    
    # image size and channel
    image_size=32
    image_channel=3
    
    # Adam lr and betas
    learning_rate = 0.0002
    betas = (0.5, 0.999)
    
    # epochs and batch size
    n_epochs = 300
    batch_size = 32
    
    # n_d: the number of iterations of D update per G update iteration
    n_d = 2
    weight_clip=0.01
    
    # device : cpu or cuda:0/1/2/3
    device = torch.device('cuda:0')
    
    # furniture dataset and dataloader
    train_dataset = load_furniture_data()
    trainloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    
    # G and D model, use DCGAN, note that sigmoid is removed in D
    G = DCGenerator(image_size=image_size, latent_dim=latent_dim, output_channel=image_channel).to(device)
    D = DCDiscriminator(image_size=image_size, input_channel=image_channel, sigmoid=False).to(device)
    
    # G and D optimizer, use Adam or SGD
    G_optimizer = optim.Adam(G.parameters(), lr=learning_rate, betas=betas)
    D_optimizer = optim.Adam(D.parameters(), lr=learning_rate, betas=betas)
    
    d_loss_hist, g_loss_hist = run_wgan(trainloader, G, D, G_optimizer, D_optimizer, n_epochs, device, 
                                        latent_dim, n_d, weight_clip)
    
    Epoch 0: Train D loss: -0.0106, G loss: -0.0003
    
    image
    Epoch 1: Train D loss: -0.0576, G loss: 0.0163
    Epoch 2: Train D loss: -0.1321, G loss: 0.0897
    Epoch 3: Train D loss: -0.2723, G loss: 0.1958
    Epoch 4: Train D loss: -0.4514, G loss: 0.2948
    Epoch 5: Train D loss: -0.6250, G loss: 0.3647
    Epoch 6: Train D loss: -0.7757, G loss: 0.4329
    Epoch 7: Train D loss: -0.7672, G loss: 0.4643
    Epoch 8: Train D loss: -0.6148, G loss: 0.4314
    Epoch 9: Train D loss: -0.6224, G loss: 0.4193
    
    image
    Epoch 10: Train D loss: -0.7804, G loss: 0.4699
    Epoch 11: Train D loss: -0.6644, G loss: 0.4546
    Epoch 12: Train D loss: -0.6075, G loss: 0.4116
    Epoch 13: Train D loss: -0.6073, G loss: 0.4478
    Epoch 14: Train D loss: -0.6728, G loss: 0.4871
    Epoch 15: Train D loss: -0.6588, G loss: 0.4808
    Epoch 16: Train D loss: -0.7344, G loss: 0.4943
    Epoch 17: Train D loss: -0.6334, G loss: 0.4702
    Epoch 18: Train D loss: -0.6585, G loss: 0.4845
    Epoch 19: Train D loss: -0.6050, G loss: 0.4522
    

    ......

    image
    Epoch 280: Train D loss: -0.3420, G loss: 0.2176
    Epoch 281: Train D loss: -0.3566, G loss: 0.2435
    Epoch 282: Train D loss: -0.3164, G loss: 0.2247
    Epoch 283: Train D loss: -0.3413, G loss: 0.2615
    Epoch 284: Train D loss: -0.3329, G loss: 0.2564
    Epoch 285: Train D loss: -0.3325, G loss: 0.2060
    Epoch 286: Train D loss: -0.3658, G loss: 0.2411
    Epoch 287: Train D loss: -0.3306, G loss: 0.2545
    Epoch 288: Train D loss: -0.3219, G loss: 0.2016
    Epoch 289: Train D loss: -0.3500, G loss: 0.2295
    
    image
    Epoch 290: Train D loss: -0.3106, G loss: 0.2088
    Epoch 291: Train D loss: -0.3219, G loss: 0.1998
    Epoch 292: Train D loss: -0.3572, G loss: 0.2716
    Epoch 293: Train D loss: -0.3290, G loss: 0.2812
    Epoch 294: Train D loss: -0.3273, G loss: 0.2141
    Epoch 295: Train D loss: -0.3324, G loss: 0.2854
    Epoch 296: Train D loss: -0.3222, G loss: 0.2421
    Epoch 297: Train D loss: -0.3475, G loss: 0.2820
    Epoch 298: Train D loss: -0.3196, G loss: 0.2251
    Epoch 299: Train D loss: -0.3290, G loss: 0.2239
    
    image

    From the theory behind WGAN, the negative of D_loss approximates the Wasserstein distance between the generated and real distributions: the smaller it is, the more similar the two distributions are and the better the GAN is trained. Its value therefore gives us a metric to monitor while training the GAN.

    Run the code below to look at the WGAN loss curves. Overall, the negative of D_loss decreases as the number of epochs grows, while the generated samples get closer and closer to the real data, which matches the WGAN theory.

    loss_plot(d_loss_hist, g_loss_hist)
    
    image
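
    To read the curve directly as a distance, a minimal sketch (assuming matplotlib is available and d_loss_hist comes from run_wgan above) is to plot the negative of the D loss:

    import matplotlib.pyplot as plt

    # -D_loss approximates the Wasserstein distance between real and generated data;
    # smaller values mean the two distributions are closer.
    w_estimate = [-d for d in d_loss_hist]
    plt.plot(w_estimate)
    plt.xlabel('epoch')
    plt.ylabel('estimated Wasserstein distance (-D loss)')
    plt.show()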

    Next, run the two cells below to show the distribution of the WGAN discriminator's parameters.

    from utils import show_weights_hist
    def show_d_params(D):
        plist = []
        for params in D.parameters():
            plist.extend(params.cpu().data.view(-1).numpy())
        show_weights_hist(plist)
    
    show_d_params(D)
    
    /opt/conda/lib/python3.6/site-packages/matplotlib/axes/_axes.py:6571: UserWarning: The 'normed' kwarg is deprecated, and has been replaced by the 'density' kwarg.
      warnings.warn("The 'normed' kwarg is deprecated, and has been "
    
    image

    As expected, all parameters are clipped into [-c, c], and most of them concentrate near -c and c.

    Assignment:

    Try setting n_d to 5, 3, 1, etc., and train the WGAN again. Which value of n_d gives the best result?

    Answer:

    With n_d = 5, the details of the generated images are noticeably clearer than with the other two settings. In this case D is trained five times for every G update, so each G update relies on a better-trained discriminator. This improves the results significantly compared with updating G at every iteration.

    n_d = 1
    
    # G and D model, use DCGAN, note that sigmoid is removed in D
    G = DCGenerator(image_size=image_size, latent_dim=latent_dim, output_channel=image_channel).to(device)
    D = DCDiscriminator(image_size=image_size, input_channel=image_channel, sigmoid=False).to(device)
    
    # G and D optimizer, use Adam or SGD
    G_optimizer = optim.Adam(G.parameters(), lr=learning_rate, betas=betas)
    D_optimizer = optim.Adam(D.parameters(), lr=learning_rate, betas=betas)
    
    d_loss_hist, g_loss_hist = run_wgan(trainloader, G, D, G_optimizer, D_optimizer, n_epochs, device, 
                                        latent_dim, n_d, weight_clip)
    
    loss_plot(d_loss_hist, g_loss_hist)
    
    Epoch 0: Train D loss: -0.0251, G loss: -0.0089
    
    image
    Epoch 1: Train D loss: -0.0200, G loss: -0.0058
    Epoch 2: Train D loss: -0.0403, G loss: 0.0151
    Epoch 3: Train D loss: -0.0840, G loss: 0.0692
    Epoch 4: Train D loss: -0.1110, G loss: 0.1149
    Epoch 5: Train D loss: -0.0798, G loss: 0.0653
    Epoch 6: Train D loss: -0.0668, G loss: 0.0619
    Epoch 7: Train D loss: -0.0763, G loss: 0.0924
    Epoch 8: Train D loss: -0.1395, G loss: 0.1376
    Epoch 9: Train D loss: -0.1790, G loss: 0.1760
    
    image
    Epoch 10: Train D loss: -0.1733, G loss: 0.1778
    Epoch 11: Train D loss: -0.1643, G loss: 0.2132
    Epoch 12: Train D loss: -0.2438, G loss: 0.2327
    Epoch 13: Train D loss: -0.2688, G loss: 0.2631
    Epoch 14: Train D loss: -0.2538, G loss: 0.2624
    Epoch 15: Train D loss: -0.1750, G loss: 0.1571
    Epoch 16: Train D loss: -0.2005, G loss: 0.1801
    Epoch 17: Train D loss: -0.2626, G loss: 0.1983
    Epoch 18: Train D loss: -0.2573, G loss: 0.2271
    Epoch 19: Train D loss: -0.2479, G loss: 0.2566
    
    image
    Epoch 20: Train D loss: -0.1754, G loss: 0.2312
    Epoch 21: Train D loss: -0.2361, G loss: 0.2213
    Epoch 22: Train D loss: -0.4678, G loss: 0.3198
    Epoch 23: Train D loss: -0.3996, G loss: 0.3100
    Epoch 24: Train D loss: -0.4355, G loss: 0.3225
    Epoch 25: Train D loss: -0.4151, G loss: 0.3199
    Epoch 26: Train D loss: -0.3595, G loss: 0.3087
    Epoch 27: Train D loss: -0.4016, G loss: 0.3302
    Epoch 28: Train D loss: -0.3243, G loss: 0.2787
    Epoch 29: Train D loss: -0.2890, G loss: 0.2380
    
    image
    Epoch 30: Train D loss: -0.1935, G loss: 0.1274
    Epoch 31: Train D loss: -0.4133, G loss: 0.3306
    Epoch 32: Train D loss: -0.2924, G loss: 0.2732
    Epoch 33: Train D loss: -0.3298, G loss: 0.3033
    Epoch 34: Train D loss: -0.3138, G loss: 0.2745
    Epoch 35: Train D loss: -0.4105, G loss: 0.3589
    Epoch 36: Train D loss: -0.2292, G loss: 0.2321
    Epoch 37: Train D loss: -0.4472, G loss: 0.3496
    Epoch 38: Train D loss: -0.3871, G loss: 0.3079
    Epoch 39: Train D loss: -0.3574, G loss: 0.3200
    
    image
    Epoch 40: Train D loss: -0.4521, G loss: 0.3567
    Epoch 41: Train D loss: -0.3822, G loss: 0.3030
    Epoch 42: Train D loss: -0.3556, G loss: 0.3106
    Epoch 43: Train D loss: -0.4338, G loss: 0.3545
    Epoch 44: Train D loss: -0.4273, G loss: 0.3315
    Epoch 45: Train D loss: -0.4402, G loss: 0.3320
    Epoch 46: Train D loss: -0.3696, G loss: 0.3154
    Epoch 47: Train D loss: -0.4215, G loss: 0.3088
    Epoch 48: Train D loss: -0.4023, G loss: 0.3035
    Epoch 49: Train D loss: -0.4106, G loss: 0.3108
    
    image
    Epoch 50: Train D loss: -0.4090, G loss: 0.3000
    Epoch 51: Train D loss: -0.3908, G loss: 0.3033
    Epoch 52: Train D loss: -0.3929, G loss: 0.3011
    Epoch 53: Train D loss: -0.3975, G loss: 0.2898
    Epoch 54: Train D loss: -0.3904, G loss: 0.3115
    Epoch 55: Train D loss: -0.3649, G loss: 0.2771
    Epoch 56: Train D loss: -0.3763, G loss: 0.2938
    Epoch 57: Train D loss: -0.3817, G loss: 0.3170
    Epoch 58: Train D loss: -0.3438, G loss: 0.2766
    Epoch 59: Train D loss: -0.3707, G loss: 0.3001
    

    ......

    image
    Epoch 280: Train D loss: -0.1990, G loss: 0.1610
    Epoch 281: Train D loss: -0.2045, G loss: 0.2129
    Epoch 282: Train D loss: -0.1959, G loss: 0.1990
    Epoch 283: Train D loss: -0.1795, G loss: 0.1501
    Epoch 284: Train D loss: -0.1925, G loss: 0.1886
    Epoch 285: Train D loss: -0.1922, G loss: 0.1648
    Epoch 286: Train D loss: -0.1990, G loss: 0.1833
    Epoch 287: Train D loss: -0.1987, G loss: 0.1909
    Epoch 288: Train D loss: -0.2003, G loss: 0.1681
    Epoch 289: Train D loss: -0.2046, G loss: 0.1724
    
    image
    Epoch 290: Train D loss: -0.2004, G loss: 0.1841
    Epoch 291: Train D loss: -0.2178, G loss: 0.1841
    Epoch 292: Train D loss: -0.1769, G loss: 0.1601
    Epoch 293: Train D loss: -0.1852, G loss: 0.1555
    Epoch 294: Train D loss: -0.1895, G loss: 0.1879
    Epoch 295: Train D loss: -0.1996, G loss: 0.1534
    Epoch 296: Train D loss: -0.1944, G loss: 0.1817
    Epoch 297: Train D loss: -0.1926, G loss: 0.1857
    Epoch 298: Train D loss: -0.2057, G loss: 0.1622
    Epoch 299: Train D loss: -0.2130, G loss: 0.1960
    
    image image
    n_d = 3
    
    # G and D model, use DCGAN, note that sigmoid is removed in D
    G = DCGenerator(image_size=image_size, latent_dim=latent_dim, output_channel=image_channel).to(device)
    D = DCDiscriminator(image_size=image_size, input_channel=image_channel, sigmoid=False).to(device)
    
    # G and D optimizer, use Adam or SGD
    G_optimizer = optim.Adam(G.parameters(), lr=learning_rate, betas=betas)
    D_optimizer = optim.Adam(D.parameters(), lr=learning_rate, betas=betas)
    
    d_loss_hist, g_loss_hist = run_wgan(trainloader, G, D, G_optimizer, D_optimizer, n_epochs, device, 
                                        latent_dim, n_d, weight_clip)
    
    loss_plot(d_loss_hist, g_loss_hist)
    
    Epoch 0: Train D loss: 0.0069, G loss: 0.0021
    
    image
    Epoch 1: Train D loss: -0.0791, G loss: 0.0306
    Epoch 2: Train D loss: -0.1852, G loss: 0.1159
    Epoch 3: Train D loss: -0.3618, G loss: 0.2186
    Epoch 4: Train D loss: -0.4753, G loss: 0.2786
    Epoch 5: Train D loss: -0.6302, G loss: 0.3484
    Epoch 6: Train D loss: -0.7498, G loss: 0.3949
    Epoch 7: Train D loss: -0.8587, G loss: 0.4415
    Epoch 8: Train D loss: -0.9714, G loss: 0.4878
    Epoch 9: Train D loss: -1.0270, G loss: 0.5135
    
    image
    Epoch 10: Train D loss: -1.0649, G loss: 0.5341
    Epoch 11: Train D loss: -0.9526, G loss: 0.5177
    Epoch 12: Train D loss: -0.8284, G loss: 0.4603
    Epoch 13: Train D loss: -0.9364, G loss: 0.5148
    Epoch 14: Train D loss: -1.0217, G loss: 0.5523
    Epoch 15: Train D loss: -0.9515, G loss: 0.4988
    Epoch 16: Train D loss: -0.9435, G loss: 0.5272
    Epoch 17: Train D loss: -0.8170, G loss: 0.4336
    Epoch 18: Train D loss: -0.8701, G loss: 0.4690
    Epoch 19: Train D loss: -0.9068, G loss: 0.5018
    
    image
    Epoch 20: Train D loss: -0.8681, G loss: 0.4756
    Epoch 21: Train D loss: -0.8347, G loss: 0.4296
    Epoch 22: Train D loss: -0.8639, G loss: 0.4728
    Epoch 23: Train D loss: -0.7830, G loss: 0.4581
    Epoch 24: Train D loss: -0.7746, G loss: 0.4464
    Epoch 25: Train D loss: -0.8700, G loss: 0.4785
    Epoch 26: Train D loss: -0.8557, G loss: 0.4636
    Epoch 27: Train D loss: -0.7885, G loss: 0.4442
    Epoch 28: Train D loss: -0.7860, G loss: 0.4482
    Epoch 29: Train D loss: -0.7841, G loss: 0.4317
    

    ......

    image
    Epoch 260: Train D loss: -0.4257, G loss: 0.2434
    Epoch 261: Train D loss: -0.3834, G loss: 0.1874
    Epoch 262: Train D loss: -0.4639, G loss: 0.3219
    Epoch 263: Train D loss: -0.4426, G loss: 0.2938
    Epoch 264: Train D loss: -0.4858, G loss: 0.2983
    Epoch 265: Train D loss: -0.4438, G loss: 0.3005
    Epoch 266: Train D loss: -0.4347, G loss: 0.2685
    Epoch 267: Train D loss: -0.4632, G loss: 0.2412
    Epoch 268: Train D loss: -0.4347, G loss: 0.3064
    Epoch 269: Train D loss: -0.4426, G loss: 0.3141
    
    image
    Epoch 270: Train D loss: -0.4450, G loss: 0.2698
    Epoch 271: Train D loss: -0.4017, G loss: 0.1301
    Epoch 272: Train D loss: -0.4728, G loss: 0.2955
    Epoch 273: Train D loss: -0.4224, G loss: 0.1896
    Epoch 274: Train D loss: -0.4218, G loss: 0.2128
    Epoch 275: Train D loss: -0.4780, G loss: 0.2925
    Epoch 276: Train D loss: -0.4397, G loss: 0.2963
    Epoch 277: Train D loss: -0.4463, G loss: 0.2299
    Epoch 278: Train D loss: -0.4356, G loss: 0.3044
    Epoch 279: Train D loss: -0.4483, G loss: 0.2750
    
    image
    Epoch 280: Train D loss: -0.4312, G loss: 0.2676
    Epoch 281: Train D loss: -0.4409, G loss: 0.2906
    Epoch 282: Train D loss: -0.4464, G loss: 0.2933
    Epoch 283: Train D loss: -0.4409, G loss: 0.1911
    Epoch 284: Train D loss: -0.4241, G loss: 0.1807
    Epoch 285: Train D loss: -0.4174, G loss: 0.2371
    Epoch 286: Train D loss: -0.4385, G loss: 0.2776
    Epoch 287: Train D loss: -0.4441, G loss: 0.3239
    Epoch 288: Train D loss: -0.3909, G loss: 0.1265
    Epoch 289: Train D loss: -0.4617, G loss: 0.3183
    
    image
    Epoch 290: Train D loss: -0.4374, G loss: 0.2967
    Epoch 291: Train D loss: -0.4362, G loss: 0.2297
    Epoch 292: Train D loss: -0.4295, G loss: 0.2365
    Epoch 293: Train D loss: -0.4244, G loss: 0.2824
    Epoch 294: Train D loss: -0.4617, G loss: 0.3120
    Epoch 295: Train D loss: -0.3845, G loss: 0.1841
    Epoch 296: Train D loss: -0.4179, G loss: 0.3275
    Epoch 297: Train D loss: -0.3968, G loss: 0.2162
    Epoch 298: Train D loss: -0.4360, G loss: 0.2535
    Epoch 299: Train D loss: -0.4168, G loss: 0.1963
    
    image image
    n_d = 5
    
    # G and D model, use DCGAN, note that sigmoid is removed in D
    G = DCGenerator(image_size=image_size, latent_dim=latent_dim, output_channel=image_channel).to(device)
    D = DCDiscriminator(image_size=image_size, input_channel=image_channel, sigmoid=False).to(device)
    
    # G and D optimizer, use Adam or SGD
    G_optimizer = optim.Adam(G.parameters(), lr=learning_rate, betas=betas)
    D_optimizer = optim.Adam(D.parameters(), lr=learning_rate, betas=betas)
    
    d_loss_hist, g_loss_hist = run_wgan(trainloader, G, D, G_optimizer, D_optimizer, n_epochs, device, 
                                        latent_dim, n_d, weight_clip)
    
    loss_plot(d_loss_hist, g_loss_hist)
    
    Epoch 0: Train D loss: -0.0630, G loss: 0.0124
    
    image
    Epoch 1: Train D loss: -0.1226, G loss: 0.0588
    Epoch 2: Train D loss: -0.2772, G loss: 0.1625
    Epoch 3: Train D loss: -0.4880, G loss: 0.2672
    Epoch 4: Train D loss: -0.6543, G loss: 0.3397
    Epoch 5: Train D loss: -0.7899, G loss: 0.4041
    Epoch 6: Train D loss: -0.8909, G loss: 0.4511
    Epoch 7: Train D loss: -0.9759, G loss: 0.4947
    Epoch 8: Train D loss: -1.0392, G loss: 0.5194
    Epoch 9: Train D loss: -1.1024, G loss: 0.5463
    
    image
    Epoch 10: Train D loss: -1.1374, G loss: 0.5677
    Epoch 11: Train D loss: -1.1750, G loss: 0.5820
    Epoch 12: Train D loss: -1.2188, G loss: 0.5988
    Epoch 13: Train D loss: -1.2543, G loss: 0.6115
    Epoch 14: Train D loss: -1.2656, G loss: 0.6200
    Epoch 15: Train D loss: -1.2664, G loss: 0.6195
    Epoch 16: Train D loss: -1.2058, G loss: 0.6176
    Epoch 17: Train D loss: -1.2978, G loss: 0.6354
    Epoch 18: Train D loss: -1.3151, G loss: 0.6405
    Epoch 19: Train D loss: -1.3089, G loss: 0.6427
    
    image
    Epoch 20: Train D loss: -1.2956, G loss: 0.6347
    Epoch 21: Train D loss: -1.2645, G loss: 0.6462
    Epoch 22: Train D loss: -1.1193, G loss: 0.6170
    Epoch 23: Train D loss: -1.0726, G loss: 0.5990
    Epoch 24: Train D loss: -1.2008, G loss: 0.6434
    Epoch 25: Train D loss: -1.2399, G loss: 0.6336
    Epoch 26: Train D loss: -1.2748, G loss: 0.6413
    Epoch 27: Train D loss: -1.2918, G loss: 0.6473
    Epoch 28: Train D loss: -1.3105, G loss: 0.6513
    Epoch 29: Train D loss: -1.3160, G loss: 0.6507
    
    image
    Epoch 30: Train D loss: -1.2992, G loss: 0.6479
    Epoch 31: Train D loss: -1.0788, G loss: 0.6045
    Epoch 32: Train D loss: -1.1036, G loss: 0.5824
    Epoch 33: Train D loss: -1.1215, G loss: 0.6005
    Epoch 34: Train D loss: -0.7472, G loss: 0.5509
    Epoch 35: Train D loss: -1.1456, G loss: 0.5953
    Epoch 36: Train D loss: -1.1316, G loss: 0.6104
    Epoch 37: Train D loss: -1.1104, G loss: 0.6178
    Epoch 38: Train D loss: -0.9294, G loss: 0.5449
    Epoch 39: Train D loss: -0.8962, G loss: 0.5298
    
    image
    Epoch 40: Train D loss: -0.9316, G loss: 0.5615
    Epoch 41: Train D loss: -1.0236, G loss: 0.5511
    Epoch 42: Train D loss: -1.0571, G loss: 0.5896
    Epoch 43: Train D loss: -1.1424, G loss: 0.5962
    Epoch 44: Train D loss: -1.1372, G loss: 0.5895
    Epoch 45: Train D loss: -1.0107, G loss: 0.5562
    Epoch 46: Train D loss: -1.0414, G loss: 0.5619
    Epoch 47: Train D loss: -1.0015, G loss: 0.5283
    Epoch 48: Train D loss: -1.0139, G loss: 0.5739
    Epoch 49: Train D loss: -1.0580, G loss: 0.5779
    

    ......

    image
    Epoch 280: Train D loss: -0.5398, G loss: 0.2564
    Epoch 281: Train D loss: -0.5926, G loss: 0.2978
    Epoch 282: Train D loss: -0.5837, G loss: 0.3241
    Epoch 283: Train D loss: -0.5839, G loss: 0.3225
    Epoch 284: Train D loss: -0.5587, G loss: 0.1916
    Epoch 285: Train D loss: -0.5656, G loss: 0.3763
    Epoch 286: Train D loss: -0.5593, G loss: 0.3103
    Epoch 287: Train D loss: -0.5779, G loss: 0.2773
    Epoch 288: Train D loss: -0.5813, G loss: 0.3878
    Epoch 289: Train D loss: -0.6136, G loss: 0.4114
    
    image
    Epoch 290: Train D loss: -0.5437, G loss: 0.3981
    Epoch 291: Train D loss: -0.5895, G loss: 0.4018
    Epoch 292: Train D loss: -0.5595, G loss: 0.3615
    Epoch 293: Train D loss: -0.5514, G loss: 0.2601
    Epoch 294: Train D loss: -0.5468, G loss: 0.3513
    Epoch 295: Train D loss: -0.6066, G loss: 0.3609
    Epoch 296: Train D loss: -0.5875, G loss: 0.3668
    Epoch 297: Train D loss: -0.5536, G loss: 0.2995
    Epoch 298: Train D loss: -0.5507, G loss: 0.2963
    Epoch 299: Train D loss: -0.5845, G loss: 0.2848
    
    image image

    WGAN-GP (Improved WGAN)

    WGAN requires weight clipping, and experiments show that a relatively deep WGAN trained this way does not converge easily.

    The main reasons are as follows:

    1. In practice, most of the weights end up at -c or c, which means most parameters can effectively take only two values. For a deep neural network this is far too simple and wastes its powerful fitting capacity.
    2. Clipping easily leads to vanishing or exploding gradients. The discriminator is a multi-layer network: if the clip value is set slightly too small, the gradient shrinks a little at every layer and decays exponentially across layers; if it is set too large, the gradients tend to explode.

    So WGAN-GP replaces clipping with a gradient penalty.
    Since the Lipschitz constraint requires the discriminator's gradient norm to stay below K, this can be enforced directly with an extra loss term, and the improved objective for D becomes:


    image
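
    The figure corresponds to the standard WGAN-GP discriminator loss (lambda is the penalty weight, and x_hat is sampled on the line between a real and a generated sample):

    L_D = \mathbb{E}_{z \sim p_z}\big[D(G(z))\big] - \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[D(x)\big]
          + \lambda\, \mathbb{E}_{\hat{x}}\Big[\big(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1\big)^2\Big],
    \qquad \hat{x} = \epsilon\, x + (1 - \epsilon)\, G(z), \;\; \epsilon \sim U[0, 1]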

    Below is the implementation of WGAN-GP. As with WGAN, we only implement the training code and reuse the DCGAN models directly.

    import torch.autograd as autograd
    
    def wgan_gp_train(trainloader, G, D, G_optimizer, D_optimizer, device, z_dim, lambda_=10, n_d=2):
        
        D.train()
        G.train()
        
        D_total_loss = 0
        G_total_loss = 0
        
        
        for i, (x, _) in enumerate(trainloader):
            x = x.to(device)
    
            # update D network
            # D optimizer zero grads
            D_optimizer.zero_grad()
            
            # D real loss from real images
            d_real = D(x)
            d_real_loss = - d_real.mean()
            
            # D fake loss from fake images generated by G
            z = torch.rand(x.size(0), z_dim).to(device)
            g_z = G(z)
            d_fake = D(g_z)
            d_fake_loss = d_fake.mean()
            
            # D gradient penalty
            
            #   a random number epsilon
            epsilon = torch.rand(x.size(0), 1, 1, 1).cuda()
            x_hat = epsilon * x + (1 - epsilon) * g_z
            x_hat.requires_grad_(True)
    
            y_hat = D(x_hat)
            #   compute the gradients of y_hat with respect to x_hat
            gradients = autograd.grad(outputs=y_hat, inputs=x_hat, grad_outputs=torch.ones(y_hat.size()).cuda(),
                                      create_graph=True, retain_graph=True, only_inputs=True)[0]
            #   compute the gradient penalty: mean of (||grad||_2 - 1)^2 over the batch
            gradient_penalty = torch.mean((gradients.view(gradients.size()[0], -1).norm(p=2, dim=1) - 1) ** 2)
            
            # D backward and step
            d_loss = d_real_loss + d_fake_loss + lambda_ * gradient_penalty
            d_loss.backward()
            D_optimizer.step()
            
                
            D_total_loss += d_loss.item()
    
            # update G network
            # G optimizer zero grads
            if (i + 1) % n_d == 0:
                G_optimizer.zero_grad()
    
                # G loss
                g_z = G(z)
                d_fake = D(g_z)
                g_loss = - d_fake.mean()
    
                # G backward and step
                g_loss.backward()
                G_optimizer.step()
                
                G_total_loss += g_loss.item()
        
        return D_total_loss / len(trainloader), G_total_loss * n_d / len(trainloader)
    
    # hyper params
    
    # z dim
    latent_dim = 100
    
    # image size and channel
    image_size=32
    image_channel=3
    
    # Adam lr and betas
    learning_rate = 0.0002
    betas = (0.5, 0.999)
    
    # epochs and batch size
    n_epochs = 300
    batch_size = 32
    
    # device : cpu or cuda:0/1/2/3
    device = torch.device('cuda:0')
    
    # n_d: the number of D updates per G update; lambda_: gradient penalty weight
    n_d = 2
    lambda_ = 10
    
    # furniture dataset and dataloader
    train_dataset = load_furniture_data()
    trainloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    
    # G and D model, use DCGAN, note that sigmoid is removed in D
    G = DCGenerator(image_size=image_size, latent_dim=latent_dim, output_channel=image_channel).to(device)
    D = DCDiscriminator(image_size=image_size, input_channel=image_channel, sigmoid=False).to(device)
    
    # G and D optimizer, use Adam or SGD
    G_optimizer = optim.Adam(G.parameters(), lr=learning_rate, betas=betas)
    D_optimizer = optim.Adam(D.parameters(), lr=learning_rate, betas=betas)
    
    d_loss_hist = []
    g_loss_hist = []
    
    for epoch in range(n_epochs):
        d_loss, g_loss = wgan_gp_train(trainloader, G, D, G_optimizer, D_optimizer, device, 
                               z_dim=latent_dim, lambda_=lambda_, n_d=n_d)
        print('Epoch {}: Train D loss: {:.4f}, G loss: {:.4f}'.format(epoch, d_loss, g_loss))
        
        d_loss_hist.append(d_loss)
        g_loss_hist.append(g_loss)
        
        if epoch == 0 or (epoch + 1) % 10 == 0:
            visualize_results(G, device, latent_dim)
    
    Epoch 0: Train D loss: 1.1936, G loss: 2.7239
    
    image
    Epoch 1: Train D loss: -8.1520, G loss: 8.7105
    Epoch 2: Train D loss: -14.5335, G loss: 15.9505
    Epoch 3: Train D loss: -22.4751, G loss: 25.4797
    Epoch 4: Train D loss: -25.5143, G loss: 26.5167
    Epoch 5: Train D loss: -20.2827, G loss: 20.9673
    Epoch 6: Train D loss: -15.2205, G loss: 17.7352
    Epoch 7: Train D loss: -15.0674, G loss: 17.9785
    Epoch 8: Train D loss: -14.2372, G loss: 19.3913
    Epoch 9: Train D loss: -13.6457, G loss: 19.7493
    
    image
    Epoch 10: Train D loss: -12.9571, G loss: 20.5028
    Epoch 11: Train D loss: -12.0761, G loss: 20.7169
    Epoch 12: Train D loss: -12.5201, G loss: 21.4914
    Epoch 13: Train D loss: -12.7979, G loss: 20.8781
    Epoch 14: Train D loss: -11.8754, G loss: 21.4311
    Epoch 15: Train D loss: -12.0360, G loss: 22.1997
    Epoch 16: Train D loss: -12.3443, G loss: 21.8415
    Epoch 17: Train D loss: -12.4492, G loss: 22.3451
    Epoch 18: Train D loss: -12.4704, G loss: 23.1174
    Epoch 19: Train D loss: -12.0635, G loss: 24.3485
    
    image
    Epoch 20: Train D loss: -11.5159, G loss: 23.7863
    Epoch 21: Train D loss: -10.8694, G loss: 23.1774
    Epoch 22: Train D loss: -11.7171, G loss: 23.6735
    Epoch 23: Train D loss: -12.1799, G loss: 24.5387
    Epoch 24: Train D loss: -11.2967, G loss: 24.4599
    Epoch 25: Train D loss: -9.2917, G loss: 25.2789
    Epoch 26: Train D loss: -11.7295, G loss: 24.9656
    Epoch 27: Train D loss: -11.9890, G loss: 25.1133
    Epoch 28: Train D loss: -11.0419, G loss: 26.9544
    Epoch 29: Train D loss: -11.4329, G loss: 27.7644
    

    ......

    image
    Epoch 280: Train D loss: -5.3110, G loss: 45.2193
    Epoch 281: Train D loss: -5.3459, G loss: 46.8995
    Epoch 282: Train D loss: -5.4012, G loss: 45.6606
    Epoch 283: Train D loss: -5.6629, G loss: 47.7304
    Epoch 284: Train D loss: -6.0067, G loss: 47.8233
    Epoch 285: Train D loss: -5.9803, G loss: 45.2547
    Epoch 286: Train D loss: -5.6341, G loss: 48.4564
    Epoch 287: Train D loss: -6.2482, G loss: 47.1421
    Epoch 288: Train D loss: -5.5349, G loss: 46.8103
    Epoch 289: Train D loss: -6.0081, G loss: 47.4786
    
    image
    Epoch 290: Train D loss: -6.1895, G loss: 49.2255
    Epoch 291: Train D loss: -5.8228, G loss: 46.5874
    Epoch 292: Train D loss: -6.7193, G loss: 50.4547
    Epoch 293: Train D loss: -6.9497, G loss: 49.2031
    Epoch 294: Train D loss: -6.4045, G loss: 49.5813
    Epoch 295: Train D loss: -6.5181, G loss: 49.3917
    Epoch 296: Train D loss: -5.3349, G loss: 49.1568
    Epoch 297: Train D loss: -6.2215, G loss: 48.8781
    Epoch 298: Train D loss: -6.0418, G loss: 50.5765
    Epoch 299: Train D loss: -5.4949, G loss: 49.0278
    
    image

    As before, inspect the loss curves and the distribution of D's parameters.

    loss_plot(d_loss_hist, g_loss_hist)
    
    image
    show_d_params(D)
    
    image

    Assignment:

    Compare the images generated by WGAN and WGAN-GP: the quality at the same epoch (or the number of epochs needed to reach comparable quality), their loss curves, and the distribution of D's parameters. What differences do you observe?

    Answer:

    1. At the same epoch, WGAN-GP has converged further, i.e. it converges faster;
    2. The WGAN loss curve converges more slowly but more stably, while WGAN-GP converges faster but keeps fluctuating after convergence;
    3. WGAN uses weight clipping to enforce the Lipschitz constraint, so the parameters are truncated after every update and pile up at the clipping boundaries;
    4. WGAN-GP uses a gradient penalty instead, which keeps the L2 norm of the discriminator's gradient with respect to its input close to a bound, avoiding the vanishing and exploding gradients caused by clipping.
