Deep Learning Notes (16): RNN-1

Author: Nino_Lau | Published 2019-08-02 09:38

    0. Introduction

    This lab is adapted mainly from the official PyTorch tutorials, with the official material consolidated and reorganized.

    0.1 Experimental content and requirements

    The lab consists of two main parts, word classification and word generation. The specific requirements are as follows:

    1. Walk through how the RNN runs and its basic code structure, and, together with the lecture content, deepen your thinking about and understanding of RNNs
    2. Independently answer the questions raised in the lab manual (brief answers are fine)
    3. Following the lab manual's guidance, fill in the missing code so that the program runs smoothly
    4. Complete the work independently; plagiarism is forbidden
    5. After the lab, download the entire folder (keep the program outputs), compress it as a zip archive, and upload it to the supercomputing classroom website.

    0.2 Recommended Reading

    These are all good additions to your understanding of RNNs:

    0.3 Setup CUDA

    We run the whole lab without CUDA here, but having CUDA is of course better. First, check and configure the CUDA environment.

    import torch
    is_cuda = False  # set to True to run the lab on a GPU
    print(is_cuda)
    
    False
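
    If you want to detect CUDA automatically instead of hard-coding the flag, the standard check is torch.cuda.is_available(); the rest of this notebook keeps the explicit flag so the CPU/GPU branches stay easy to see.

    is_cuda = torch.cuda.is_available()   # True when a usable GPU is detected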
    

    Pick an idle GPU card.

    gpu_id = 0
    torch.cuda.set_device(gpu_id)
    print(torch.cuda.current_device())
    
    0
    

    1. Classifying Names with a Character-Level RNN


    We will be building and training a basic character-level RNN to classify
    words. A character-level RNN reads words as a series of characters -
    outputting a prediction and "hidden state" at each step, feeding its
    previous hidden state into each next step. We take the final prediction
    to be the output, i.e. which class the word belongs to.

    Specifically, we'll train on a few thousand surnames from 18 languages
    of origin, and predict which language a name is from based on the
    spelling.
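
    As a rough, self-contained sketch of that read-predict loop (this uses PyTorch's built-in nn.RNNCell purely for illustration, with arbitrary toy sizes; the lab below builds the recurrent step by hand instead):

    import torch
    import torch.nn as nn
    
    n_chars, hidden_size, n_classes = 57, 8, 18     # toy sizes for illustration
    rnn_cell = nn.RNNCell(n_chars, hidden_size)     # one recurrent step
    classifier = nn.Linear(hidden_size, n_classes)
    
    name = torch.zeros(5, 1, n_chars)               # 5 one-hot letters, batch size 1
    hidden = torch.zeros(1, hidden_size)
    for t in range(name.size(0)):
        hidden = rnn_cell(name[t], hidden)          # feed the previous hidden state back in
    prediction = classifier(hidden)                 # only the final step is used to classify
    print(prediction.shape)                         # torch.Size([1, 18])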

    1.1 Preparing the Data

    Download the data from here and extract it to the current directory.

    Included in the data/names directory are 18 text files named as
    "[Language].txt". Each file contains a bunch of names, one name per
    line, mostly romanized (but we still need to convert from Unicode to
    ASCII).

    We'll end up with a dictionary of lists of names per language,
    {language: [names ...]}. The generic variables "category" and "line"
    (for language and name in our case) are used for later extensibility.

    from __future__ import unicode_literals, print_function, division
    from io import open
    import glob
    import os
    
    def findFiles(path): return glob.glob(path)
    
    print(findFiles('data/names/*.txt'))
    
    import unicodedata
    import string
    
    all_letters = string.ascii_letters + " .,;'"
    n_letters = len(all_letters)
    
    # Turn a Unicode string to plain ASCII, thanks to https://stackoverflow.com/a/518232/2809427
    def unicodeToAscii(s):
        return ''.join(
            c for c in unicodedata.normalize('NFD', s)
            if unicodedata.category(c) != 'Mn'
            and c in all_letters
        )
    
    print(unicodeToAscii('Ślusàrski'))
    
    # Build the category_lines dictionary, a list of names per language
    all_categories = []
    category_lines = {}
    
    # Split it into training set and validation set
    training_lines = {}
    validation_lines = {}
    
    # Read a file and split into lines
    def readLines(filename):
        lines = open(filename, encoding='utf-8').read().strip().split('\n')
        return [unicodeToAscii(line) for line in lines]
    
    for filename in findFiles('data/names/*.txt'):
        category = os.path.splitext(os.path.basename(filename))[0]
        all_categories.append(category)
        lines = readLines(filename)
        category_lines[category] = lines
        
        num_of_training_set = int(len(lines)*0.8)
        training_lines[category]   = lines[:num_of_training_set]
        validation_lines[category] = lines[num_of_training_set:] 
    
    n_categories = len(all_categories)
    print(n_categories)
    
    ['data/names/Polish.txt', 'data/names/Italian.txt', 'data/names/Vietnamese.txt', 'data/names/Czech.txt', 'data/names/German.txt', 'data/names/Spanish.txt', 'data/names/Japanese.txt', 'data/names/Russian.txt', 'data/names/Chinese.txt', 'data/names/Korean.txt', 'data/names/Irish.txt', 'data/names/French.txt', 'data/names/Dutch.txt', 'data/names/Greek.txt', 'data/names/Arabic.txt', 'data/names/English.txt', 'data/names/Scottish.txt', 'data/names/Portuguese.txt']
    Slusarski
    18
    

    Now we have category_lines, a dictionary mapping each category
    (language) to a list of lines (names). We also kept track of
    all_categories (just a list of languages) and n_categories for
    later reference.

    print(category_lines['Italian'][:5])
    
    ['Abandonato', 'Abatangelo', 'Abatantuono', 'Abate', 'Abategiovanni']
    

    1.2 Turning Names into Tensors

    Now that we have all the names organized, we need to turn them into
    Tensors to make any use of them.

    To represent a single letter, we use a "one-hot vector" of size
    <1 x n_letters>. A one-hot vector is filled with 0s except for a 1
    at index of the current letter, e.g. "b" = <0 1 0 0 0 ...>.

    To make a word we join a bunch of those into a 2D matrix
    <line_length x 1 x n_letters>.

    That extra 1 dimension is because PyTorch assumes everything is in
    batches - we're just using a batch size of 1 here.

    import torch
    
    # Find letter index from all_letters, e.g. "a" = 0
    def letterToIndex(letter):
        return all_letters.find(letter)
    
    # Just for demonstration, turn a letter into a <1 x n_letters> Tensor
    def letterToTensor(letter):
        tensor = torch.zeros(1, n_letters)
        tensor[0][letterToIndex(letter)] = 1
        if is_cuda:
            tensor = tensor.cuda()
        return tensor
    
    # Turn a line into a <line_length x 1 x n_letters>,
    # or an array of one-hot letter vectors
    def lineToTensor(line):
        tensor = torch.zeros(len(line), 1, n_letters)
        for li, letter in enumerate(line):
            tensor[li][0][letterToIndex(letter)] = 1
        if is_cuda:
            tensor = tensor.cuda()
        return tensor
    
    # We call this a tensor here; some people would call it a vector.
    print(letterToTensor('J'))
    
    print(lineToTensor('Jones').size())
    
    tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0.]])
    torch.Size([5, 1, 57])
    

    1.3 Creating the Network

    Before autograd, creating a recurrent neural network in Torch involved
    cloning the parameters of a layer over several timesteps. The layers
    held hidden state and gradients which are now entirely handled by the
    graph itself. This means you can implement a RNN in a very "pure" way,
    as regular feed-forward layers.

    Inside the forward function, we need to loop over the letters of the input sequence:

    Figure: an unrolled RNN (https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png)

    import torch.nn as nn
    
    class BaseRNN(nn.Module):
        def __init__(self, input_size, hidden_size, output_size):
            super(BaseRNN, self).__init__()
    
            self.hidden_size = hidden_size
    
            # input to hidden
            self.i2h = nn.Linear(input_size,  hidden_size)
            # hidden to hidden
            self.h2h = nn.Linear(hidden_size, hidden_size)
            self.activation = nn.ReLU()
            # hidden to output
            self.h2o = nn.Linear(hidden_size, output_size)
            
        def step(self, letter, hidden):
            i2h = self.i2h(letter)
            h2h = self.h2h(hidden)
            hidden = self.activation( h2h+i2h )
            
            output = self.h2o(hidden)
            return output, hidden
        
        def forward(self, word):
            hidden = self.initHidden(is_cuda)  # follow the global CUDA flag so this also works on CPU
            for i in range(word.size()[0]):
                # Only the last output will be used to predict
                output, hidden = self.step(word[i], hidden)
            return output
        
        def initHidden(self, is_cuda=True):
            if is_cuda:
                return torch.zeros(1, self.hidden_size).cuda()
            else:
                return torch.zeros(1, self.hidden_size)
    
    n_hidden = 128
    rnn = BaseRNN(n_letters, n_hidden, n_categories)
    if is_cuda:
        rnn = rnn.cuda()
    
    class DeeperRNN(nn.Module):
        def __init__(self, input_size, hidden_size, output_size):
            super(DeeperRNN, self).__init__()
            self.hidden1_size = hidden_size
            self.hidden2_size = hidden_size
            self.layer1 = BaseRNN(input_size, hidden_size, hidden_size)
            self.layer2 = BaseRNN(hidden_size, hidden_size, output_size)
            
        def step(self, letter, hidden1, hidden2):
            output1, hidden1 = self.layer1.step(letter, hidden1)
            output2, hidden2 = self.layer2.step(output1, hidden2)
            
            return output2, hidden1, hidden2
        
        def forward(self, word):
            hidden1, hidden2 = self.initHidden(is_cuda)  # follow the global CUDA flag
            for i in range(word.size()[0]):
                # Only the last output will be used to predict
                output, hidden1, hidden2 = self.step(word[i], hidden1, hidden2)
            return output
    
            
        def initHidden(self, is_cuda=True):
            if is_cuda:
                return torch.zeros(1, self.hidden1_size).cuda(), torch.zeros(1, self.hidden2_size).cuda()
            else:
                return torch.zeros(1, self.hidden1_size), torch.zeros(1, self.hidden2_size)
    n_hidden = 128
    rnn = DeeperRNN(n_letters, n_hidden, n_categories)
    rnn = rnn.cuda() if is_cuda else rnn
    

    To run a step of this network we need to pass an input (in our case, the
    Tensor for the current letter) and a previous hidden state (which we
    initialize as zeros at first). We'll get back the output (probability of
    each language) and a next hidden state (which we keep for the next
    step).

    input = letterToTensor('A')
    hidden = torch.zeros(1, n_hidden)
    hidden2 = torch.zeros(1, n_hidden)
    hidden = hidden.cuda() if is_cuda else hidden
    hidden2 = hidden2.cuda() if is_cuda else hidden2
    
    output, next_hidden, next_hidden2 = rnn.step(input, hidden,hidden2)
    print(output.shape)
    print(next_hidden.shape)
    
    torch.Size([1, 18])
    torch.Size([1, 128])
    

    For the sake of efficiency we don't want to be creating a new Tensor for
    every step, so we will use lineToTensor instead of
    letterToTensor and use slices. This could be further optimized by
    pre-computing batches of Tensors.

    Each loop of training will:

    • Create input and target tensors

    • Create a zeroed initial hidden state

    • Read each letter in and keep the hidden state for the next letter

    • Compare the final output to the target

    • Back-propagate

    • Return the output and loss

    input = lineToTensor('Albert')
    print(n_hidden)
    
    output = rnn(input)
    print(output)
    print(output.shape)
    
    128
    tensor([[ 0.0510,  0.0372, -0.0115,  0.0182, -0.0876,  0.1243, -0.0355, -0.0904,
              0.0170,  0.0767, -0.0053,  0.1697, -0.0931,  0.0245, -0.0771,  0.0039,
             -0.0405, -0.0546]], grad_fn=<AddmmBackward>)
    torch.Size([1, 18])
    

    As you can see the output is a <1 x n_categories> Tensor, where
    every item is the likelihood of that category (higher is more likely).

    1.4 Training

    1.4.1 Preparing for Training

    Before going into training we should make a few helper functions. The
    first is to interpret the output of the network, which we know to be a
    likelihood of each category. We can use Tensor.topk to get the index
    of the greatest value:

    def categoryFromOutput(output):
        top_n, top_i = output.topk(1)
        category_i = top_i[0].item()
        return all_categories[category_i], category_i
    
    print(categoryFromOutput(output))
    
    ('French', 11)
    

    We will also want a quick way to get a training example (a name and its
    language):

    import random
    
    def randomChoice(l):
        return l[random.randint(0, len(l) - 1)]
    
    def randomTrainingExample():
        category = randomChoice(all_categories)
        # attention: split training set 
        line = randomChoice(training_lines[category])
        category_tensor = torch.tensor([all_categories.index(category)], dtype=torch.long)
        category_tensor = category_tensor.cuda() if is_cuda else category_tensor
        line_tensor = lineToTensor(line)
        return category, line, category_tensor, line_tensor
    
    def randomValidationExample():
        category = randomChoice(all_categories)
        # attention: split validation set
        line = randomChoice(validation_lines[category])
        category_tensor = torch.tensor([all_categories.index(category)], dtype=torch.long)
        category_tensor = category_tensor.cuda() if is_cuda else category_tensor
        line_tensor = lineToTensor(line)
        return category, line, category_tensor, line_tensor
    
    for i in range(10):
        category, line, category_tensor, line_tensor = randomTrainingExample()
        print('category =', category, '/ line =', line)
    
    category = Vietnamese / line = Dao
    category = Italian / line = Pesaro
    category = English / line = Greenleaf
    category = Italian / line = Nieri
    category = Irish / line = Bradach
    category = Russian / line = Esipovich
    category = Dutch / line = Romeijnders
    category = Vietnamese / line = Chau
    category = Portuguese / line = Castro
    category = Greek / line = Kefalas
    

    1.4.2 Training the Network

    Now all it takes to train this network is show it a bunch of examples,
    have it make guesses, and tell it if it's wrong.

    criterion = nn.CrossEntropyLoss()
    
    learning_rate = 0.005 # If you set this too high, it might explode. If too low, it might not learn
    
    def train(category_tensor, line_tensor):
        output = rnn(line_tensor)
        rnn.zero_grad()
        loss = criterion(output, category_tensor)
        loss.backward()
    
        # Add parameters' gradients to their values, multiplied by learning rate
        for p in rnn.parameters():
            if hasattr(p.grad, "data"):
                p.data.add_(p.grad.data, alpha=-learning_rate)
    
        return output, loss.item()
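
    As an aside, the manual update above is just plain SGD written by hand. An equivalent sketch using torch.optim.SGD (reusing the rnn, criterion and learning_rate defined above; train_with_optimizer is only an illustrative name) would look like this:

    import torch.optim as optim
    
    optimizer = optim.SGD(rnn.parameters(), lr=learning_rate)
    
    def train_with_optimizer(category_tensor, line_tensor):
        optimizer.zero_grad()
        output = rnn(line_tensor)
        loss = criterion(output, category_tensor)
        loss.backward()
        optimizer.step()    # applies p = p - lr * p.grad to every parameter
        return output, loss.item()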
    

    Now we just have to run that with a bunch of examples. Since the
    train function returns both the output and loss we can print its
    guesses and also keep track of loss for plotting. Since there are 1000s
    of examples we print only every print_every examples, and take an
    average of the loss.

    import time
    import math
    
    n_iters = 100000
    print_every = 5000
    plot_every = 1000
    
    # Keep track of losses for plotting
    current_loss = 0
    all_losses = []
    
    def timeSince(since):
        now = time.time()
        s = now - since
        m = math.floor(s / 60)
        s -= m * 60
        return '%dm %ds' % (m, s)
    
    start = time.time()
    
    for iter in range(1, n_iters + 1):
        category, line, category_tensor, line_tensor = randomTrainingExample()
        output, loss = train(category_tensor, line_tensor)
        current_loss += loss
    
        # Print iter number, loss, name and guess
        if iter % print_every == 0:
            guess, guess_i = categoryFromOutput(output)
            correct = '✓' if guess == category else '✗ (%s)' % category
            print('%d %d%% (%s) %.4f %s / %s %s' % (iter, iter / n_iters * 100, timeSince(start), loss, line, guess, correct))
    
        # Add current loss avg to list of losses
        if iter % plot_every == 0:
            all_losses.append(current_loss / plot_every)
            current_loss = 0
    
    5000 5% (0m 29s) 2.8270 Kakutama / Italian ✗ (Japanese)
    10000 10% (1m 1s) 2.2744 Delgado / Japanese ✗ (Portuguese)
    15000 15% (1m 29s) 3.3385 Kaneko / Portuguese ✗ (Japanese)
    20000 20% (1m 57s) 4.2050 Seto / Spanish ✗ (Chinese)
    25000 25% (2m 25s) 0.9056 Hui / Chinese ✓
    30000 30% (2m 54s) 2.2105 Kool / German ✗ (Dutch)
    35000 35% (3m 23s) 2.6755 Lupo / Portuguese ✗ (Italian)
    40000 40% (3m 52s) 0.0967 Alvarez / Spanish ✓
    45000 45% (4m 20s) 0.1633 Hanraets / Dutch ✓
    50000 50% (4m 47s) 0.6894 Dickson / Scottish ✓
    55000 55% (5m 15s) 1.4523 Delgado / Spanish ✗ (Portuguese)
    60000 60% (5m 42s) 0.5448 Pei / Chinese ✓
    65000 65% (6m 10s) 3.9038 Dannenberg / German ✗ (Russian)
    70000 70% (6m 41s) 1.1801 Kelly / Scottish ✗ (Irish)
    75000 75% (7m 11s) 0.6070 Gajos / Polish ✓
    80000 80% (7m 38s) 0.1557 Beloshitsky / Russian ✓
    85000 85% (8m 8s) 3.6895 Agilera / Spanish ✗ (Russian)
    90000 90% (8m 36s) 0.0096 Shammas / Arabic ✓
    95000 95% (9m 2s) 0.0291 Magalhaes / Portuguese ✓
    100000 100% (9m 31s) 0.0076 Haritopoulos / Greek ✓
    

    1.4.3 Plotting the Results


    Plotting the historical loss from all_losses shows the network
    learning:

    import matplotlib.pyplot as plt
    plt.plot(all_losses)
    plt.show()
    
    
    [Figure: training loss curve]

    1.5 Evaluating the Results

    To see how well the network performs on different categories, we will
    create a confusion matrix, indicating for every actual language (rows)
    which language the network guesses (columns). To calculate the confusion
    matrix a bunch of samples are run through the network with
    evaluate(), which is the same as train() minus the backprop.

    # Keep track of correct guesses in a confusion matrix
    confusion_training   = torch.zeros(n_categories, n_categories)
    confusion_validation = torch.zeros(n_categories, n_categories)
    n_confusion = 5000
    
    # Just return an output given a line
    def evaluate(line_tensor):
        rnn.eval()
        output = rnn(line_tensor)
        return output
    
    # Go through a bunch of examples and record which are correctly guessed
    for i in range(n_confusion):
        category, line, category_tensor, line_tensor = randomTrainingExample()
        output = evaluate(line_tensor)
        guess, guess_i = categoryFromOutput(output)
        category_i = all_categories.index(category)
        confusion_training[category_i][guess_i] += 1
    
        
    # Go through a bunch of examples and record which are correctly guessed
    for i in range(n_confusion):
        category, line, category_tensor, line_tensor = randomValidationExample()
        output = evaluate(line_tensor)
        guess, guess_i = categoryFromOutput(output)
        category_i = all_categories.index(category)
        confusion_validation[category_i][guess_i] += 1
        
        
    # calculate accuracy
    right_train = 0
    right_valid = 0
    for i in range(n_categories):
        right_train += confusion_training[i][i]
        right_valid += confusion_validation[i][i]
    acc_train = right_train / n_confusion
    acc_valid = right_valid / n_confusion
    
    # Normalize by dividing every row by its sum
    for i in range(n_categories):
        confusion_training[i] = confusion_training[i] / confusion_training[i].sum()
        confusion_validation[i] = confusion_validation[i] / confusion_validation[i].sum()
    
    
    # Set up plot
    fig = plt.figure()
    ax1 = fig.add_subplot(121)
    cax1 = ax1.matshow(confusion_training.numpy())
    
    ax2 = fig.add_subplot(122)
    cax2 = ax2.matshow(confusion_validation.numpy())
    
    
    # Set up axes
    ax1.set_xticklabels([''] + all_categories, rotation=90)
    ax1.set_yticklabels([''] + all_categories)
    ax2.set_xticklabels([''] + all_categories, rotation=90)
    
    # sphinx_gallery_thumbnail_number = 2
    plt.show()
    
    print("Traing set Acc is", acc_train.item())
    print("validation set Acc is", acc_valid.item())
    
    [Figure: confusion matrices for the training (left) and validation (right) sets]
    Training set Acc is 0.732200026512146
    Validation set Acc is 0.44699999690055847
    

    You can pick out bright spots off the main axis that show which
    languages it guesses incorrectly, e.g. Chinese for Korean, and Spanish
    for Italian. It seems to do very well with Greek, and very poorly with
    English (perhaps because of overlap with other languages).

    1.6 Running on User Input

    def predict(input_line, n_predictions=3):
        print('\n> %s' % input_line)
        with torch.no_grad():
            output = evaluate(lineToTensor(input_line))
            output = torch.nn.functional.softmax(output, dim=1)
    
            # Get top N categories
            topv, topi = output.topk(n_predictions, 1, True)
            predictions = []
    
            for i in range(n_predictions):
                value = topv[0][i].item()
                category_index = topi[0][i].item()
                print('Probability (%.2f) %s' % (value, all_categories[category_index]))
                predictions.append([value, all_categories[category_index]])
    
    predict('Dovesky')
    predict('Jackson')
    predict('Satoshi')
    predict("Cui")
    predict("Zhuang")
    predict("Xue")
    predict("Wang")
    predict("Liu")
    predict("Shuo")
    predict("Nino")
    predict("Lau")
    
    > Dovesky
    Probability (0.35) Czech
    Probability (0.29) English
    Probability (0.14) Greek
    
    > Jackson
    Probability (0.99) Scottish
    Probability (0.01) English
    Probability (0.00) French
    
    > Satoshi
    Probability (0.63) Italian
    Probability (0.30) Japanese
    Probability (0.06) Arabic
    
    > Cui
    Probability (0.83) Vietnamese
    Probability (0.14) Chinese
    Probability (0.01) Japanese
    
    > Zhuang
    Probability (0.80) Vietnamese
    Probability (0.19) Chinese
    Probability (0.00) Korean
    
    > Xue
    Probability (0.93) Chinese
    Probability (0.02) German
    Probability (0.01) Korean
    
    > Wang
    Probability (0.87) Chinese
    Probability (0.05) Korean
    Probability (0.03) Vietnamese
    
    > Liu
    Probability (0.44) Chinese
    Probability (0.36) Vietnamese
    Probability (0.18) Polish
    
    > Shuo
    Probability (0.97) Chinese
    Probability (0.01) Japanese
    Probability (0.01) Vietnamese
    
    > Nino
    Probability (0.52) Spanish
    Probability (0.35) Italian
    Probability (0.07) Portuguese
    
    > Lau
    Probability (0.98) Vietnamese
    Probability (0.02) Chinese
    Probability (0.00) German
    

    1.7 Exercises

    ==============================================================================================================

    1. In Section 1.4.3, write code to show how the loss changes during training (the loss values are already recorded during training; just find them and plot them as a curve, or modify the training code yourself to draw a nicer curve).

    2. In Section 1.6, feed in the pinyin of each character of your own name and check whether the model predicts it correctly (keep the prediction results above). For example, for the name Zhang San, call predict('Zhang') and predict('San').

    3. Edit this cell and answer on the line below. Change the activation function used in the current BaseRNN to ReLU (in Section 1.3), retrain the model, and see whether the performance changes.

    The current activation function is ( Tanh ); training-set accuracy: ( 0.732200026512146 ), validation-set accuracy: ( 0.35659543275654894 ).
    With ReLU as the activation, training-set accuracy: ( 0.7541999887751331 ), validation-set accuracy: ( 0.41819998666409793 ).

    4. Edit this cell and record the current model's accuracy on the line below. Then go back to Section 1.3, edit the DeeperRNN class so that it is one layer deeper than BaseRNN, retrain this deeper network, and see whether the performance improves (remember to uncomment the last two lines of the DeeperRNN cell). For more on adding hidden layers to an RNN, read the supplementary material in the next cell.

    BaseRNN training-set accuracy: ( 0.732200026512146 ), validation-set accuracy: ( 0.44699999690055847 ).
    DeeperRNN training-set accuracy: ( 0.823405736449854 ), validation-set accuracy: ( 0.4836000054871004 ).

    Supplementary material:

    RNNs are neural networks and everything works monotonically better (if done right) if you put on your deep learning hat and start stacking models up like pancakes. For instance, we can form a 2-layer recurrent network as follows:

    y1 = rnn1.step(x)
    y = rnn2.step(y1)
    

    In other words we have two separate RNNs: One RNN is receiving the input vectors and the second RNN is receiving the output of the first RNN as its input. Except neither of these RNNs know or care - it’s all just vectors coming in and going out, and some gradients flowing through each module during backpropagation.

    Read the original article here.

    The structure of a multi-layer hidden network is shown in the figure below; an output drawn with a dashed line is discarded and not used further:

    [Figure: a stacked two-layer RNN; outputs drawn with dashed lines are discarded]

    2. Generating Names with a Character-Level RNN


    We are still hand-crafting a small RNN with a few linear layers. The big
    difference is instead of predicting a category after reading in all the
    letters of a name, we input a category and output one letter at a time.
    Recurrently predicting characters to form language (this could also be
    done with words or other higher order constructs) is often referred to
    as a "language model".

    2.1 Creating the Network

    This network extends the last tutorial's RNN
    with an extra argument for the category tensor, which is added
    with the others. The category tensor is a one-hot vector just like
    the letter input.

    We will interpret the output as the probability of the next letter. When
    sampling, the most likely output letter is used as the next input
    letter.

    The h2o linear layer maps the hidden state to the output scores.
    There's also a dropout layer, which randomly zeros parts of its
    input with a given probability (here 0.2) and is usually used to
    fuzz inputs to prevent overfitting. Here we're using it towards the
    end of the network to purposely add some chaos and increase
    sampling variety.
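
    A quick, self-contained way to see what nn.Dropout(0.2) does to a tensor in training mode (surviving entries are rescaled by 1/(1-p), so they become 1.25 here; this is just an illustration, not part of the lab's network):

    import torch
    import torch.nn as nn
    
    drop = nn.Dropout(0.2)     # zeros each entry with probability 0.2 while training
    x = torch.ones(1, 10)
    print(drop(x))             # roughly 2 of the 10 entries become 0, the rest 1.25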

    [Figure: GenerateRNN architecture, combining category, input letter and hidden state]
    import torch
    import torch.nn as nn
    
    class GenerateRNN(nn.Module):
        def __init__(self, input_size, hidden_size, output_size):
            super(GenerateRNN, self).__init__()
            self.hidden_size = hidden_size
            self.c2h = nn.Linear(n_categories, hidden_size)
            self.i2h = nn.Linear(input_size,   hidden_size)
            self.h2h = nn.Linear(hidden_size,  hidden_size)
            self.activation = nn.Tanh()
            self.h2o = nn.Linear(hidden_size,  output_size)
            self.dropout = nn.Dropout(0.2)
    
            self.softmax = nn.LogSoftmax(dim=1)
    
        def forward(self, category, input, hidden):
            c2h = self.c2h(category)
            i2h = self.i2h(input)
            h2h = self.h2h(hidden)
            
            hidden = self.activation( c2h+i2h+h2h )
            
    
            dropout = self.dropout(self.h2o(hidden))
            output = self.softmax(dropout)
            return output, hidden
    
        def initHidden(self, is_cuda=True):
            if is_cuda:
                return torch.zeros(1, self.hidden_size).cuda()
            else:
                return torch.zeros(1, self.hidden_size)
    

    2.2 Training

    2.2.1 Preparing for Training

    import random
    
    # Random item from a list
    def randomChoice(l):
        return l[random.randint(0, len(l) - 1)]
    
    # Get a random category and random line from that category
    def randomTrainingPair():
        category = randomChoice(all_categories)
        line = randomChoice(category_lines[category])
        return category, line
    

    For each timestep (that is, for each letter in a training word) the
    inputs of the network will be
    (category, current letter, hidden state) and the outputs will be
    (next letter, next hidden state). So for each training set, we'll
    need the category, a set of input letters, and a set of output/target
    letters.

    Since we are predicting the next letter from the current letter for each
    timestep, the letter pairs are groups of consecutive letters from the
    line - e.g. for "ABCD<EOS>" we would create ("A", "B"), ("B", "C"),
    ("C", "D"), ("D", "EOS").


    The category tensor is a one-hot
    tensor
    of size
    <1 x n_categories>. When training we feed it to the network at every
    timestep - this is a design choice, it could have been included as part
    of initial hidden state or some other strategy.

    Or you could look here for another description; its accompanying
    diagram summarizes the whole process.

    # One-hot vector for category
    def categoryTensor(category):
        li = all_categories.index(category)
        tensor = torch.zeros(1, n_categories)
        tensor[0][li] = 1
        if is_cuda:
            tensor = tensor.cuda()
        return tensor
    
    # One-hot matrix of first to last letters (not including EOS) for input
    def inputTensor(line):
        tensor = torch.zeros(len(line), 1, n_letters)
        for li in range(len(line)):
            letter = line[li]
            tensor[li][0][all_letters.find(letter)] = 1
        if is_cuda:
            tensor = tensor.cuda()
        return tensor
    
    # LongTensor of second letter to end (EOS) for target
    def targetTensor(line):
        letter_indexes = [all_letters.find(line[li]) for li in range(1, len(line))]
        letter_indexes.append(n_letters - 1) # EOS
        tensor = torch.LongTensor(letter_indexes)
        if is_cuda:
            tensor = tensor.cuda()
        return tensor
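
    As a quick sanity check of the letter-pair alignment described above, a small helper like the one below could print the (input, target) pairs for a name. showPairs is only an illustrative name; it relies on the all_letters, n_letters and targetTensor defined in this notebook:

    def showPairs(line):
        # targetTensor(line) holds the index of each "next letter", ending with EOS
        targets = targetTensor(line)
        for i, in_letter in enumerate(line):
            idx = targets[i].item()
            out_letter = '<EOS>' if idx == n_letters - 1 else all_letters[idx]
            print((in_letter, out_letter))
    
    showPairs('Abe')   # expected: ('A', 'b') ('b', 'e') ('e', '<EOS>')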
    

    For convenience during training we'll make a randomTrainingExample
    function that fetches a random (category, line) pair and turns them into
    the required (category, input, target) tensors.

    # Make category, input, and target tensors from a random category, line pair
    def randomTrainingExample():
        category, line = randomTrainingPair()
        category_tensor = categoryTensor(category)
        input_line_tensor = inputTensor(line)
        target_line_tensor = targetTensor(line)
        return category_tensor, input_line_tensor, target_line_tensor
    

    2.2.2 Training the Network

    In contrast to classification, where only the last output is used, we
    are making a prediction at every step, so we are calculating loss at
    every step.

    The magic of autograd allows you to simply sum these losses at each step
    and call backward at the end.

    For the loss function nn.NLLLoss is appropriate, since the last
    layer of the RNN is nn.LogSoftmax.

    To see how nn.CrossEntropyLoss relates to the nn.LogSoftmax / nn.NLLLoss
    pair, we can look at the source code; the most important line is quoted below:

     def cross_entropy(input, target):
        return nll_loss(log_softmax(input, 1), target)
    
    

    In short, cross entropy combines log_softmax and nll_loss in a single function.
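
    A quick way to convince yourself of this equivalence is a tiny check with made-up logits (a batch of one and three classes, using the torch and nn imports from earlier in this notebook):

    logits = torch.tensor([[1.0, 2.0, 0.5]])                  # arbitrary scores
    target = torch.tensor([1])
    ce  = nn.CrossEntropyLoss()(logits, target)
    nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), target)
    print(torch.allclose(ce, nll))                            # True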

    criterion = nn.NLLLoss()
    
    learning_rate = 0.0005
    
    def train(category_tensor, input_line_tensor, target_line_tensor):
        target_line_tensor.unsqueeze_(-1)
        hidden = rnn.initHidden()
    
        rnn.zero_grad()
    
        loss = 0
    
        # Take care of the loss function, 
        # it could be visualized as the following figure 
        for i in range(input_line_tensor.size(0)):
            output, hidden = rnn(category_tensor, input_line_tensor[i], hidden)
            l = criterion(output, target_line_tensor[i])
            loss += l
    
        loss.backward()
    
        for p in rnn.parameters():
            if hasattr(p.grad, "data"):
                p.data.add_(p.grad.data, alpha=-learning_rate)
    
        return output, loss.item() / input_line_tensor.size(0)
    

    The loss computation can be illustrated by the following figure: the network runs forward through the entire sequence to compute the loss, then backward through the entire sequence to compute the gradients.


    [Figure: the loss is computed at every timestep and summed over the sequence]

    To keep track of how long training takes I am adding a timeSince(timestamp) function which returns a human readable string:

    import time
    import math
    
    def timeSince(since):
        now = time.time()
        s = now - since
        m = math.floor(s / 60)
        s -= m * 60
        return '%dm %ds' % (m, s)
    

    Training is business as usual - call train a bunch of times and wait a few minutes, printing the current time and the average loss every print_every examples, and storing the running average loss in all_losses every plot_every examples for plotting later.

    is_cuda = True
    rnn = GenerateRNN(n_letters, 128, n_letters)
    rnn = rnn.cuda() if is_cuda else rnn
    
    n_iters = 100000
    print_every = 5000
    plot_every = 500
    all_losses = []
    total_loss = 0 # Reset every plot_every iters
    
    start = time.time()
    
    for iter in range(1, n_iters + 1):
        output, loss = train(*randomTrainingExample())
        total_loss += loss
    
        if iter % print_every == 0:
            print('%s (%d %d%%) %.4f' % (timeSince(start), iter, iter / n_iters * 100, total_loss/iter))
    
        if iter % plot_every == 0:
            all_losses.append(total_loss / iter)
    
    0m 44s (5000 5%) 3.1420
    1m 29s (10000 10%) 3.0273
    2m 13s (15000 15%) 2.9634
    2m 57s (20000 20%) 2.9156
    3m 41s (25000 25%) 2.8781
    4m 26s (30000 30%) 2.8456
    5m 10s (35000 35%) 2.8196
    5m 54s (40000 40%) 2.7973
    6m 38s (45000 45%) 2.7777
    7m 22s (50000 50%) 2.7586
    8m 6s (55000 55%) 2.7420
    8m 49s (60000 60%) 2.7266
    9m 33s (65000 65%) 2.7130
    10m 18s (70000 70%) 2.6991
    11m 3s (75000 75%) 2.6872
    11m 49s (80000 80%) 2.6760
    12m 34s (85000 85%) 2.6660
    13m 19s (90000 90%) 2.6557
    14m 5s (95000 95%) 2.6466
    14m 50s (100000 100%) 2.6371
    

    2.2.3 Plotting the Losses

    Plotting the historical loss from all_losses shows the network
    learning:

    import matplotlib.pyplot as plt
    import matplotlib.ticker as ticker
    
    plt.figure()
    plt.plot(all_losses)
    
    [<matplotlib.lines.Line2D at 0x7f34b1b3f9e8>]
    
    [Figure: training loss curve for the generation model]

    2.3 Sampling the Network

    To sample we give the network a letter and ask what the next one is,
    feed that in as the next letter, and repeat until the EOS token.

    • Create tensors for input category, starting letter, and empty hidden
      state

    • Create a string output_name with the starting letter

    • Up to a maximum output length,

      • Feed the current letter to the network
      • Get the next letter from highest output, and next hidden state
      • If the letter is EOS, stop here
      • If a regular letter, add to output_name and continue
    • Return the final name

    Note:
    Rather than having to give it a starting letter, another
    strategy would have been to include a "start of string" token in
    training and have the network choose its own starting letter.
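
    A hedged sketch of that alternative (not used in this lab): reserve one extra index as a "start of string" (SOS) token and prepend it to every input, so that sampling can begin from SOS instead of a user-chosen letter. n_symbols and inputTensorWithSOS are hypothetical names introduced only for illustration; the network's input_size and the training targets would have to be adjusted to match.

    n_symbols = n_letters + 1                        # one extra slot reserved for SOS
    
    def inputTensorWithSOS(line):
        tensor = torch.zeros(len(line) + 1, 1, n_symbols)
        tensor[0][0][n_symbols - 1] = 1              # SOS one-hot at position 0
        for li, letter in enumerate(line):
            tensor[li + 1][0][all_letters.find(letter)] = 1
        return tensor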

    max_length = 20
    
    # Sample from a category and starting letter
    def sample(category, start_letter='A'):
        with torch.no_grad():  
            # no need to track history in sampling
            category_tensor = categoryTensor(category)
            input = inputTensor(start_letter)
            hidden = rnn.initHidden()
    
            output_name = start_letter
    
            for i in range(max_length):
                output, hidden = rnn(category_tensor, input[0], hidden)
                topv, topi = output.topk(1)
                topi = topi[0][0]
                if topi == n_letters - 1:
                    break
                else:
                    letter = all_letters[topi]
                    output_name += letter
                input = inputTensor(letter)
    
            return output_name
    
    # Get multiple samples from one category and multiple starting letters
    def samples(category, start_letters='ABC'):
        for start_letter in start_letters:
            print(sample(category, start_letter))
        print("\n")
    
    
    samples('Russian', 'CCZZYY')
    
    samples('German',  'CCZZYY')
    
    samples('Spanish', 'CCZZYY')
    
    samples('Chinese', 'CCZZYY')
    
    samples('Chinese', 'LS')
    
    samples('Chinese', 'LS')
    
    Chanhin
    Chanhov
    Zhinhovov
    Zantov
    Yakinov
    Yoverin


    Cherrer
    Chaner
    Zerter
    Zentener
    Yerten
    Yenter


    Coraner
    Calla
    Zoner
    Zoner
    Yanera
    Yaner


    Cau
    Cha
    Zhang
    Zung
    Yan
    Yan


    Lia
    Sin


    Lan
    Sin

    2.4 Exercises

    =======================================================================================================

    1. Repeatedly feed in the first letter of your own name and observe whether the generated names are the same (keep the results produced with your own initials as input).
    No.

    2. Question: why does this model, once trained, produce different results for the same input? Verify your guess in the program above.
    Because in this RNN model, the hidden state changes every time an input is fed in and the network is run.

    3. This is a generative model: given only a single letter as input, it predicts a whole word. Once it starts predicting, when does it stop predicting new characters, i.e. how does this generative model decide the length of the generated word?

    Prediction stops when the predicted index is n_letters - 1 (the position of the symbol ', which doubles as the EOS marker here), so no further characters are generated.

    4. Following the hints in the next cell, use importance sampling so that the program can generate different names from the same input.

    As follows.

    Importance sampling here means the following: when generating each letter of a word, an ordinary generative model simply picks the letter with the highest output probability as its prediction, whereas importance sampling chooses which letter to output according to the letters' output probabilities. For example, if the program receives the letter C and predicts the next letter to be [a, o, e] with probabilities [0.7, 0.2, 0.1], the usual approach directly picks a, the most probable letter, while importance sampling picks a with probability 70%, o with probability 20% and e with probability 10%, so the final choice has some randomness.
    Based on your understanding, complete the importance-sampling prediction below.

    Tip: you can use the numpy.random.choice function; click here for the documentation.

    Tip: this code differs from the code above only in how the prediction is chosen, so you only need to edit the indicated region below.

    Tip: using the model's output probabilities directly is awkward, because with 50-odd possible characters each probability is very small. A wiser approach is to choose only among the 3 or 5 letters with the highest probabilities, and remember to renormalize those probabilities (e.g. with softmax) so that they sum to 1.

    Tip: an even simpler approach is to use the model's output probabilities only to rank the letters and then choose with fixed probabilities, e.g. pick the most probable letter with probability 0.5, the second with 0.3 and the third with 0.2 by passing [0.5, 0.3, 0.2] directly.

    import numpy as np
    max_length = 20
    def softmax(x):
        """Compute softmax values for each sets of scores in x."""
        return np.exp(x) / np.sum(np.exp(x), axis=0)
    
    def sample(category, start_letter='A'):
        with torch.no_grad():  
            # no need to track history in sampling
            category_tensor = categoryTensor(category)
            input = inputTensor(start_letter)
            output_name = start_letter
            hidden = rnn.initHidden()
            for i in range(max_length):
                output, hidden = rnn(category_tensor, input[0], hidden)
                # Keep only the 5 most likely letters, renormalize their scores with
                # softmax, and sample one of them according to those probabilities.
                prob_list = output.tolist()[0]
                topv, topi = output.topk(5)
                prob_list = softmax([prob_list[i] for i in topi[0]])
                index = np.random.choice([i for i in range(len(prob_list))], p=prob_list)
                topi = topi[0][index]
                if topi == n_letters - 1:
                    break
                else:
                    letter = all_letters[topi]
                    output_name += letter
                input = inputTensor(letter)
    
            return output_name
    
    # Get multiple samples from one category and multiple starting letters
    def samples(category, start_letters='ABC'):
        for start_letter in start_letters:
            print(sample(category, start_letter))
        print("\n")
    
    rnn.eval()
    
    samples('Russian', 'CYY')
    
    samples('German', 'CCY')
    
    samples('Spanish', 'CCZZYY')
    
    samples('Chinese', 'CCCZZZYYY')
    
    samples('Chinese', 'LS')
    
    samples('Chinese', 'LS')
    
    Chovin
    Yinkiven
    Yantev


    Counerre
    Chenentrri
    Yuner


    Cheilleni
    Cherer
    Zollus
    Zouna
    Yanterara
    Yenseraz


    Chen
    Chening
    Cho
    Zin
    Zia
    Ziang
    Yaing
    You
    Ying


    Lenin
    Sha


    Ling
    Shan

    
    
