Video: https://www.bilibili.com/video/BV1Rv411y7oE?p=65
Source code: https://github.com/dragen1860/Deep-Learning-with-PyTorch-Tutorials
Single-layer RNN
[Figures: single-layer RNN, principle (not unrolled); gradient derivation; vector dimensions of the unrolled single-hidden-layer RNN]
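A minimal sketch of the recurrence behind these figures (assuming the tanh activation that nn.RNN uses by default; W_ih, W_hh, b_ih, b_hh follow PyTorch's parameter naming, while the output projection W_ho is my notation for the separate linear readout layer, not part of nn.RNN):

$$h_t = \tanh\big(W_{ih}\,x_t + b_{ih} + W_{hh}\,h_{t-1} + b_{hh}\big), \qquad y_t = W_{ho}\,h_t + b_{ho}$$

The same W_ih and W_hh are shared across all time steps, which is why unrolling the network over a long sequence adds no new parameters.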
Processing a single sequence
seqlen: length of the sequence (number of time steps)
seqnum: how many sequences are trained at the same time, i.e. the batch size
dim: dimension of one element of the sequence (the feature dimension)
hidden: dimension of the hidden layer (the memory)
num_layers: how many hidden layers the RNN has
Shapes for a single sequence (with batch_first=True, as in the code below; see the shape-check sketch after this list):
x: [1, seqlen, dim]
ht: [num_layers, 1, hidden]  # note: ht is the final memory, i.e. the hidden state after the last time step
output: [1, seqlen, hidden]  # output collects the memory at every time step
y: [1, seqlen, dim]  # target/prediction, produced from output by a linear layer mapping hidden -> dim
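A minimal shape-check sketch for the single-sequence case (assumptions: batch_first=True and illustrative sizes seqlen=5, dim=3, hidden=10, num_layers=1; these values are not from the original notes):

import torch
import torch.nn as nn

seqlen, dim, hidden, num_layers = 5, 3, 10, 1
rnn = nn.RNN(input_size=dim, hidden_size=hidden, num_layers=num_layers, batch_first=True)

x = torch.randn(1, seqlen, dim)           # one sequence: [1, seqlen, dim]
h0 = torch.zeros(num_layers, 1, hidden)   # initial memory: [num_layers, 1, hidden]
output, ht = rnn(x, h0)

print(output.shape)  # torch.Size([1, 5, 10])  -> [1, seqlen, hidden]
print(ht.shape)      # torch.Size([1, 1, 10])  -> [num_layers, 1, hidden]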
Processing multiple sequences at the same time:
batch is just seqnum, i.e. mini-batch training (see the batched shape-check sketch after this list)
x: [b, seqlen, dim]
ht: [num_layers, b, hidden]
output: [b, seqlen, hidden]
y: [b, seqlen, dim]
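The same sketch for the batched case (assumptions: b=4 and the same illustrative sizes as above):

import torch
import torch.nn as nn

b, seqlen, dim, hidden, num_layers = 4, 5, 3, 10, 1
rnn = nn.RNN(input_size=dim, hidden_size=hidden, num_layers=num_layers, batch_first=True)

x = torch.randn(b, seqlen, dim)           # [b, seqlen, dim]
h0 = torch.zeros(num_layers, b, hidden)   # [num_layers, b, hidden], not [b, dim, hidden]
output, ht = rnn(x, h0)

print(output.shape)  # torch.Size([4, 5, 10])  -> [b, seqlen, hidden]
print(ht.shape)      # torch.Size([1, 4, 10])  -> [num_layers, b, hidden]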
Two stacked hidden layers
Multiple hidden layers
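A minimal sketch of how the shapes change when hidden layers are stacked (assumption: num_layers=2, other sizes as above):

import torch
import torch.nn as nn

b, seqlen, dim, hidden = 4, 5, 3, 10
rnn = nn.RNN(input_size=dim, hidden_size=hidden, num_layers=2, batch_first=True)

x = torch.randn(b, seqlen, dim)
output, ht = rnn(x)  # h0 defaults to zeros of shape [num_layers, b, hidden]

print(output.shape)  # torch.Size([4, 5, 10]) -> still [b, seqlen, hidden]: memories of the top layer only
print(ht.shape)      # torch.Size([2, 4, 10]) -> [num_layers, b, hidden]: the final memory of every layer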
Gradient vanishing and gradient explosion
[Figures: RNN gradient derivation formula; why an RNN unrolled over too many steps suffers gradient vanishing and explosion]
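A minimal sketch of the reason (assuming the standard BPTT derivation for $h_t = \tanh(W_{ih} x_t + W_{hh} h_{t-1})$): the gradient that flows from step $t$ back to an early memory $h_k$ is a product of $t-k$ Jacobians,

$$\frac{\partial h_t}{\partial h_k} = \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}} = \prod_{i=k+1}^{t} \operatorname{diag}\!\big(\tanh'(\cdot)\big)\, W_{hh}.$$

If the largest singular value of $W_{hh}$ is below 1 this product shrinks toward 0 (gradient vanishing); if it is above 1 it grows without bound (gradient explosion). Gradient clipping, as used in the training loop below, only mitigates the explosion case.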
Fitting a sine curve
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from matplotlib import pyplot as plt
np.random.seed(42)
num_time_steps = 50
input_size = 1
hidden_size = 16
output_size = 1
lr = 0.01
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.rnn = nn.RNN(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=1,
            batch_first=True,   # x: [batch, seqlen, feature]
        )
        for p in self.rnn.parameters():
            nn.init.normal_(p, mean=0.0, std=0.001)
        self.linear = nn.Linear(hidden_size, output_size)  # map hidden -> output dimension

    def forward(self, x, hidden_prev):
        out, hidden_prev = self.rnn(x, hidden_prev)
        out = self.linear(out)  # applied to the last dim: [1, seqlen, hidden] -> [1, seqlen, output_size]
        return out, hidden_prev
model = Net()
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr)
hidden_prev = torch.zeros(1, 1, hidden_size)  # h0: [num_layers, batch, hidden_size]
for iter in range(5000):
    # sample a random window of the sine curve
    start = np.random.randint(3)
    time_steps = np.linspace(start, start + 10, num_time_steps)
    data = np.sin(time_steps)
    data = data.reshape(num_time_steps, 1)
    x = torch.tensor(data[:-1]).float().view(1, num_time_steps - 1, 1)  # input: points 0..n-2
    y = torch.tensor(data[1:]).float().view(1, num_time_steps - 1, 1)   # target: points 1..n-1 (the next value)

    output, hidden_prev = model(x, hidden_prev)
    hidden_prev = hidden_prev.detach()  # cut the graph so gradients do not flow across iterations

    loss = criterion(output, y)
    model.zero_grad()
    loss.backward()  # compute weight.grad
    for p in model.parameters():
        # print(p.grad.norm())
        torch.nn.utils.clip_grad_norm_(p, 10)  # clip gradients to prevent explosion
    optimizer.step()

    if iter % 100 == 0:
        print("Iteration: {} loss {}".format(iter, loss.item()))
start = np.random.randint(3)
time_steps = np.linspace(start, start + 10, num_time_steps)
data = np.sin(time_steps)
data = data.reshape(num_time_steps, 1)
x = torch.tensor(data[:-1]).float().view(1, num_time_steps - 1, 1)
y = torch.tensor(data[1:]).float().view(1, num_time_steps - 1, 1)
predictions = []
input = x[:, 0, :]  # seed the prediction with the first point of the test curve
for _ in range(x.shape[1]):
    input = input.view(1, 1, 1)
    (pred, hidden_prev) = model(input, hidden_prev)
    input = pred  # feed the prediction back as the next input
    predictions.append(pred.detach().numpy().ravel()[0])
# ground-truth: the ideal sine curve
plt.scatter(time_steps, np.sin(time_steps), s=90)
plt.plot(time_steps, np.sin(time_steps))
# prediction
plt.scatter(time_steps[1:], predictions)
plt.show()