Inference and Validation
Once a neural network has been trained, it can be used to make predictions. This process is typically called inference, a term borrowed from statistics. However, neural networks tend to perform too well on the training data and fail to generalize to data they haven't seen before. This is called overfitting, and it impairs inference performance. To test for overfitting while training, we measure performance on data that isn't in the training set, called the validation set. We avoid overfitting through regularization while monitoring the validation performance during training.
The test set contains images just like the training set. Typically, 10-20% of the original dataset is held out for testing and validation, with the rest used for training.
The goal of validation is to measure the model's performance on data that isn't part of the training set. What counts as performance is up to the developer to define. Typically it's accuracy, the percentage of classes the network predicted correctly. Other options include precision and recall, and the top-5 error rate. Here we'll focus on accuracy. First, we'll do a forward pass with one batch from the test set.
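As a rough sketch of how accuracy and the top-5 error rate can be computed from one batch of network output (the log_ps and labels tensors below are random stand-ins, not a real forward pass):

import torch

# Stand-ins for one batch: log-probabilities for 10 classes and true labels
log_ps = torch.randn(64, 10).log_softmax(dim=1)
labels = torch.randint(0, 10, (64,))

ps = torch.exp(log_ps)  # convert log-probabilities back to probabilities

# Accuracy: fraction of samples whose most likely class is correct
top_p, top_class = ps.topk(1, dim=1)
equals = top_class == labels.view(*top_class.shape)
accuracy = torch.mean(equals.type(torch.FloatTensor))

# Top-5 error rate: fraction of samples whose true class is not among the 5 most likely
top5_p, top5_class = ps.topk(5, dim=1)
top5_error = 1 - (top5_class == labels.view(-1, 1)).any(dim=1).float().mean()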
1. Overfitting
If we look at the training and validation losses as the network trains, we can see a phenomenon known as overfitting.
[Figure overfitting.png: training loss keeps falling with more epochs while validation loss starts to rise]
The network learns the training set better and better, so the training loss keeps dropping. However, it starts having trouble generalizing to data outside the training set, and the validation loss begins to rise. The ultimate goal of any deep learning model is to make predictions on new data, so we should strive for the lowest possible validation loss. One option is to use the version of the model with the lowest validation loss, here the one around 8-10 training epochs. This strategy is called early stopping. In practice, you'd save the model frequently as you train, then later choose the model with the lowest validation loss, as in the sketch below.
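A minimal sketch of that checkpointing pattern (the model, filename, and placeholder loss below are illustrative assumptions, not part of this lesson's code):

import torch
from torch import nn

model = nn.Linear(784, 10)  # hypothetical model for illustration
epochs = 30
best_val_loss = float('inf')

for e in range(epochs):
    # ... training pass, then compute this epoch's validation loss ...
    val_loss = 0.5  # placeholder; use the real validation loss here
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        # save the weights whenever the validation loss improves
        torch.save(model.state_dict(), 'best_model.pth')

# afterwards, restore the checkpoint with the lowest validation loss
model.load_state_dict(torch.load('best_model.pth'))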
The most common method to reduce overfitting (outside of early stopping) is dropout, where we randomly drop input units. This forces the network to share information between weights, increasing its ability to generalize to new data. Adding dropout in PyTorch is straightforward using the nn.Dropout module, as the short demo below shows.
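As a quick illustration of what nn.Dropout does (the layer and input below are made up for the demo):

import torch
from torch import nn

dropout = nn.Dropout(p=0.2)  # each element is zeroed with probability 0.2
x = torch.ones(1, 10)

dropout.train()    # training mode: dropout is active
print(dropout(x))  # ~20% of elements become 0; survivors are scaled to 1/(1-0.2) = 1.25

dropout.eval()     # evaluation mode: dropout is a no-op
print(dropout(x))  # all ones, unchanged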
During training we want to use dropout to prevent overfitting, but during inference we want to use the entire network. So we need to turn off dropout during validation, testing, and whenever we use the network to make predictions. To do this, call model.eval(). This sets the model to evaluation mode, where the dropout probability becomes 0. Dropout can be turned back on by setting the model to train mode with model.train(). In general, the pattern for a validation loop looks like this: turn off gradients, set the model to evaluation mode, calculate the validation loss and metrics, then set the model back to train mode.
# turn off gradients
with torch.no_grad():
    # set model to evaluation mode
    model.eval()
    # validation pass here
    for images, labels in testloader:
        ...

# set model back to train mode
model.train()
2. Inference
Now that the model is trained, we can use it for inference. We've done this before, but this time we need to remember to set the model to inference mode with model.eval(). We'll also want to turn off autograd with the torch.no_grad() context manager.
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import matplotlib.pyplot as plt
import torch
from torchvision import datasets, transforms
# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
# Download and load the training data
trainset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
# Download and load the test data
testset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=False, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)
from torch import nn, optim
import torch.nn.functional as F
## Define your model with dropout added
class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)

        # Dropout module with 0.2 drop probability
        self.dropout = nn.Dropout(p=0.2)

    def forward(self, x):
        # flatten the input image to a vector
        x = x.view(x.shape[0], -1)

        # hidden layers, with dropout
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.dropout(F.relu(self.fc3(x)))

        # output layer, no dropout
        x = F.log_softmax(self.fc4(x), dim=1)
        return x
## Train your model with dropout, and monitor the training progress with the validation loss and accuracy
model = Classifier()
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.003)

epochs = 30
steps = 0

train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    model.train()
    for images, labels in trainloader:
        optimizer.zero_grad()

        log_ps = model(images)
        loss = criterion(log_ps, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
    else:
        # the else clause runs once per epoch, after the inner loop finishes
        test_loss = 0
        accuracy = 0

        # turn off gradients and dropout for the validation pass
        with torch.no_grad():
            model.eval()
            for images, labels in testloader:
                log_ps = model(images)
                loss = criterion(log_ps, labels)
                test_loss += loss.item()

                ps = torch.exp(log_ps)
                top_p, top_class = ps.topk(1, dim=1)
                equals = top_class == labels.view(*top_class.shape)
                accuracy += torch.mean(equals.type(torch.FloatTensor))

        train_losses.append(running_loss/len(trainloader))
        test_losses.append(test_loss/len(testloader))

        print("Epoch: {}/{}..".format(e+1, epochs),
              "Training Loss: {:.3f}..".format(running_loss/len(trainloader)),
              "Test Loss: {:.3f}..".format(test_loss/len(testloader)),
              "Test Accuracy: {:.3f}".format(accuracy/len(testloader)))
plt.plot(train_losses, label='Training loss')
plt.plot(test_losses, label='Validation loss')
plt.legend(frameon=False)
# Import helper module (should be in the repo)
import helper
# Test out your network!
model.eval()
dataiter = iter(testloader)
images, labels = next(dataiter)
img = images[0]
# Convert 2D image to 1D vector
img = img.view(1, 784)
# Calculate the class probabilities (softmax) for img
with torch.no_grad():
    output = model(img)
ps = torch.exp(output)
# Plot the image and probabilities
helper.view_classify(img.view(1, 28, 28), ps, version='Fashion')