RNN

Author: 来到了没有知识的荒原 | Published 2020-12-02 16:33

    Video: https://www.bilibili.com/video/BV1Rv411y7oE?p=65
    Source code: https://github.com/dragen1860/Deep-Learning-with-PyTorch-Tutorials

    Single-layer RNN

    (Figures: single-layer RNN before unrolling; the gradient derivation; a single-hidden-layer RNN after unrolling)
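    The figures themselves are not reproduced here. The update rule they illustrate is the standard single-layer (Elman) RNN recurrence; a minimal statement in LaTeX, writing W_{xh}, W_{hh}, W_{hy} for the input-to-hidden, hidden-to-hidden and hidden-to-output weights:

    h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h), \qquad y_t = W_{hy} h_t + b_y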

    Shapes of the tensors involved

    Processing a single sequence

    seqlen: length of the sequence
    seqnum: number of sequences trained at the same time, i.e. the batch size
    dim: dimension of a single element of the sequence
    hidden: dimension of the hidden layer
    num_layers: number of hidden layers in the RNN

    x: [1, seqlen, dim]
    ht: [num_layers, 1, hidden]   # note: ht is only the final memory (last time step)
    output: [1, seqlen, hidden]   # output collects the memory of the top layer at every time step
    y: [1, seqlen, dim]
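
    A minimal shape check of the list above with PyTorch's nn.RNN (the concrete values seqlen=5, dim=3, hidden=10 are illustrative, not from the original post):

    import torch
    import torch.nn as nn

    seqlen, dim, hidden, num_layers = 5, 3, 10, 1   # illustrative values

    rnn = nn.RNN(input_size=dim, hidden_size=hidden,
                 num_layers=num_layers, batch_first=True)

    x = torch.randn(1, seqlen, dim)   # one sequence: [1, seqlen, dim]
    output, ht = rnn(x)               # h0 defaults to zeros when omitted

    print(output.shape)   # torch.Size([1, 5, 10]) -> [1, seqlen, hidden]
    print(ht.shape)       # torch.Size([1, 1, 10]) -> [num_layers, 1, hidden]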
    

    Processing multiple sequences at the same time:

    The batch size b is seqnum, i.e. mini-batch training.

    x: [b, seqlen, dim]
    ht: [num_layers, b, hidden]
    output: [b, seqlen, hidden]
    y: [b, seqlen, dim]
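
    The same check with a mini-batch (b = 4 chosen arbitrarily); only the batch dimension changes, and ht stays ordered [num_layers, b, hidden] even with batch_first=True:

    import torch
    import torch.nn as nn

    b, seqlen, dim, hidden, num_layers = 4, 5, 3, 10, 1   # illustrative values
    rnn = nn.RNN(input_size=dim, hidden_size=hidden,
                 num_layers=num_layers, batch_first=True)

    output, ht = rnn(torch.randn(b, seqlen, dim))   # x: [b, seqlen, dim]
    print(output.shape)   # [b, seqlen, hidden]     -> torch.Size([4, 5, 10])
    print(ht.shape)       # [num_layers, b, hidden] -> torch.Size([1, 4, 10])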
    
    Two hidden layers

    (Figure: RNN with multiple stacked hidden layers)
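
    A sketch of the stacked case (values assumed as above): only num_layers and the first dimension of ht change, while output still comes from the top layer only.

    import torch
    import torch.nn as nn

    seqlen, dim, hidden = 5, 3, 10   # illustrative values
    rnn = nn.RNN(input_size=dim, hidden_size=hidden,
                 num_layers=2, batch_first=True)   # 2 stacked hidden layers

    output, ht = rnn(torch.randn(1, seqlen, dim))
    print(output.shape)   # [1, seqlen, hidden] -- hidden states of the top layer
    print(ht.shape)       # [2, 1, hidden]      -- one final memory per layer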

    Vanishing and exploding gradients

    (Figures: the RNN gradient derivation; why unrolling an RNN over many steps leads to vanishing and exploding gradients)
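
    The argument behind those figures is the standard back-propagation-through-time derivation; a minimal LaTeX sketch:

    \frac{\partial h_t}{\partial h_k}
      = \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}}
      = \prod_{i=k+1}^{t} \mathrm{diag}\!\big(\tanh'(\cdot)\big)\, W_{hh}

    Over t - k steps the norm of this product behaves roughly like \lVert W_{hh} \rVert^{\,t-k}: when the largest singular value of W_{hh} is below 1 the gradient shrinks toward zero (vanishing), and when it is above 1 the gradient blows up (exploding). This is why the training code below clips the gradient norm.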

    Fitting a sine curve

    import numpy as np
    import torch
    import torch.nn as nn
    import torch.optim as optim
    from matplotlib import pyplot as plt
    
    np.random.seed(42)
    num_time_steps = 50
    input_size = 1
    hidden_size = 16
    output_size = 1
    lr = 0.01
    
    
    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
    
            self.rnn = nn.RNN(
                input_size=input_size,
                hidden_size=hidden_size,
                num_layers=1,
                batch_first=True,
            )
            # initialise all RNN weights with small Gaussian noise
            for p in self.rnn.parameters():
                nn.init.normal_(p, mean=0.0, std=0.001)
    
            self.linear = nn.Linear(hidden_size, output_size)
    
        def forward(self, x, hidden_prev):
            out, hidden_prev = self.rnn(x, hidden_prev)
            out = self.linear(out)
            return out, hidden_prev
    
    
    model = Net()
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr)
    
    hidden_prev = torch.zeros(1, 1, hidden_size)  # initial hidden state: [num_layers, batch, hidden_size]
    
    for iter in range(5000):
        # sample a random window of the sine curve
        start = np.random.randint(3)
        time_steps = np.linspace(start, start + 10, num_time_steps)
        data = np.sin(time_steps)
        data = data.reshape(num_time_steps, 1)
        # x is the curve, y is the same curve shifted one step ahead (next-value prediction)
        x = torch.tensor(data[:-1]).float().view(1, num_time_steps - 1, 1)
        y = torch.tensor(data[1:]).float().view(1, num_time_steps - 1, 1)
    
        output, hidden_prev = model(x, hidden_prev)
        hidden_prev = hidden_prev.detach()  # cut the graph so gradients do not flow across iterations
    
        loss = criterion(output, y)
        model.zero_grad()
        loss.backward()  # compute weight.grad for every parameter
        for p in model.parameters():
            # print(p.grad.norm())
            torch.nn.utils.clip_grad_norm_(p, 10)  # clip the gradient norm to prevent exploding gradients
        optimizer.step()
    
        if iter % 100 == 0:
            print("Iteration: {} loss {}".format(iter, loss.item()))
    
    # evaluate on one fresh sequence, predicting autoregressively
    start = np.random.randint(3)
    time_steps = np.linspace(start, start + 10, num_time_steps)
    data = np.sin(time_steps)
    data = data.reshape(num_time_steps, 1)
    x = torch.tensor(data[:-1]).float().view(1, num_time_steps - 1, 1)
    y = torch.tensor(data[1:]).float().view(1, num_time_steps - 1, 1)
    
    predictions = []
    input = x[:, 0, :]  # start from the first point and feed each prediction back in as the next input
    for _ in range(x.shape[1]):
        input = input.view(1, 1, 1)
        (pred, hidden_prev) = model(input, hidden_prev)
        input = pred
        predictions.append(pred.detach().numpy().ravel()[0])
    
    # ground-truth sine curve
    plt.scatter(time_steps, np.sin(time_steps), s=90)
    plt.plot(time_steps, np.sin(time_steps))
    
    # prediction
    plt.scatter(time_steps[1:], predictions)
    plt.show()
    
    
