Central to all neural networks in PyTorch is the autograd package. Let’s first briefly visit this, and we
will then go to training our first neural network.
The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-
run framework, which means that your backprop is defined by how your code is run, and that every single
iteration can be different.
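To make the define-by-run idea concrete, here is a minimal sketch. The function name forward and the repeat counts are illustrative and not part of the tutorial. An ordinary Python loop decides how many operations are recorded, so each call builds a different graph and yields a different gradient.

```python
import torch

def forward(x, n_repeats):
    # Plain Python control flow: the recorded graph has n_repeats multiplications.
    y = x
    for _ in range(n_repeats):
        y = y * 2
    return y.sum()

x = torch.ones(3, requires_grad=True)
grads = []
for n_repeats in (1, 3):
    x.grad = None  # clear the gradient accumulated by the previous run
    forward(x, n_repeats).backward()
    grads.append(x.grad.clone())

print(grads[0])  # gradient of sum(2 * x)
print(grads[1])  # gradient of sum(8 * x)
```

The same tensor x receives a gradient of 2 per element in the first run and 8 per element in the second, because the graph is rebuilt from whatever code actually ran.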
1.1 Create a grad tracked tensor
torch.Tensor is the central class of the package. If you set its attribute .requires_grad to True, it starts to track all operations on it. When you finish your computation you can call .backward() and have all the gradients computed automatically. The gradient for this tensor will be accumulated into its .grad attribute.
define a function:
x = [[1,1],[1,1]]
y = x + 2
z = y ^ 2 * 3
out = z.mean()
import torch
# create a tensor with its .requires_grad set to True
x = torch.ones(2, 2, requires_grad=True)
# create a second tensor that autograd does not track
x1 = torch.ones(2, 2, requires_grad=False)
# x1.requires_grad_(True)
print(x)
print(x1)
tensor([[1., 1.],
[1., 1.]], requires_grad=True)
tensor([[1., 1.],
[1., 1.]])
1.2 Do a tensor operation
y = x + 2
y1 = x1 + 2
print(y)
print(y1)
tensor([[3., 3.],
[3., 3.]], grad_fn=<AddBackward0>)
tensor([[3., 3.],
[3., 3.]])
y was created as the result of an operation on a tracked tensor, so it has a grad_fn. y1 was computed from an untracked tensor, so its grad_fn is None.
<AddBackward0 object at 0x1c2a73f908>
1.3 More operations on y
z = y * y * 3
z1 = y1 * y1 * 3
out = z.mean() #calculate z average value
out1 = z1.mean() #calculate z1 average value
print(z, out)
print(z1, out1)
tensor([[27., 27.],
[27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward1>)
tensor([[27., 27.],
[27., 27.]]) tensor(27.)
.requires_grad_() changes an existing Tensor’s requires_grad flag in-place. A newly created Tensor’s flag defaults to False if not given, while the argument of .requires_grad_() itself defaults to True.
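As a small illustration of the flag behaviour, the sketch below toggles requires_grad on a user-created tensor. The variable names flag_before, flag_after and flag_off are only for this example.

```python
import torch

a = torch.randn(2, 2)       # created by the user without requires_grad
flag_before = a.requires_grad
a.requires_grad_()          # no argument: the flag is switched to True in-place
flag_after = a.requires_grad
a.requires_grad_(False)     # the flag can be switched off again on a leaf tensor
flag_off = a.requires_grad
print(flag_before, flag_after, flag_off)
```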
Tensor and Function are interconnected and build up an acyclic graph, that encodes a complete history of
computation. Each tensor has a .grad_fn attribute that references a Function that has created the Tensor
(except for Tensors created by the user - their grad_fn is None).
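a = torch.randn(2, 2)  # a is created by the user, so its .grad_fn is None
a = ((a * 3) / (a - 1))  # still untracked, because the inputs do not require grad
a.requires_grad_(True)  # switch on the .requires_grad flag of a in-place
b = (a * a).sum()  # sum of the squared elements of a; b now has a grad_fn
print(b.grad_fn)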
<SumBackward0 object at 0x1c2a759198>
2 Gradients
2.1 Backprop
Because out contains a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)).
out.backward()
# out.backward(torch.tensor(1.))
# out1.backward()  # would raise an error: out1 does not require grad and has no grad_fn
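You can read the gradients as shown here. Only the leaf tensor x has a populated .grad; y and z are intermediate (non-leaf) tensors, so their .grad is None unless .retain_grad() was called on them before the backward pass:
x_grad = x.grad
y_grad = y.grad  # None (non-leaf tensor)
z_grad = z.grad  # None (non-leaf tensor)
print(x_grad)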
tensor([[4.5000, 4.5000],
[4.5000, 4.5000]])
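As a check on the printed value, the gradient can be derived by hand from the definitions of y, z and out given earlier (out is the mean of the four elements of z):

```latex
\begin{aligned}
out &= \frac{1}{4}\sum_{i} z_i, \qquad z_i = 3\,(x_i + 2)^2 \\
\frac{\partial\, out}{\partial x_i} &= \frac{1}{4}\cdot 6\,(x_i + 2) = \frac{3}{2}\,(x_i + 2) \\
\left.\frac{\partial\, out}{\partial x_i}\right|_{x_i = 1} &= \frac{3}{2}\cdot 3 = 4.5
\end{aligned}
```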
2.2 Jacobian-vector product example
If you want to compute the derivatives, you can call .backward() on a Tensor. If the Tensor is a scalar (i.e. it holds one element of data), you don’t need to specify any arguments to .backward(); however, if it has more elements, you need to specify a gradient argument that is a tensor of matching shape.
define a function:
x = [1, 1, 1]
y = x + [1, 2, 3]
z = y ^ 3
x = torch.ones(3, requires_grad=True)
y = x + torch.tensor([1., 2., 3.])
z = y * y * y
v = torch.tensor([1, 0.1, 0.01])
# z is a vector, so you need to specify a gradient whose size is the same as z
z.backward(v)
print(z)
print(x.grad)
tensor([ 8., 27., 64.], grad_fn=<MulBackward0>)
tensor([12.0000, 2.7000, 0.4800])
2.3 Problem 1
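What is the meaning of the argument passed to the .backward() method? Try different inputs and answer.
The passed argument is the vector of weights applied to the partial derivatives. Specifically, obj.backward(val) computes a vector-Jacobian product, i.e. the gradient of the scalar (obj * val).sum() with respect to each leaf tensor, so val must have the same shape as obj.
Note that normally val is a tensor of ones (torch.ones_like(obj)), and for a scalar obj, obj.backward() is equivalent to obj.backward(torch.tensor(1.)).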
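One way to probe what the argument does is to compare two routes to the same gradient. The sketch below reuses the function from section 2.2; the names offset, grad_from_backward, grad_from_scalar and expected are illustrative. It is a check under these assumptions, not a definitive statement of how autograd is implemented.

```python
import torch

x = torch.ones(3, requires_grad=True)
offset = torch.tensor([1., 2., 3.])
v = torch.tensor([1., 0.1, 0.01])

# Route 1: pass v directly to backward() on the vector z.
z = (x + offset) ** 3
z.backward(v)
grad_from_backward = x.grad.clone()

# Route 2: build the scalar sum(z * v) explicitly and call backward() without arguments.
x.grad = None
z = (x + offset) ** 3
(z * v).sum().backward()
grad_from_scalar = x.grad.clone()

# Hand-derived value: dz_i/dx_i = 3 * (x_i + offset_i)^2, weighted by v_i.
expected = 3 * (x.detach() + offset) ** 2 * v
print(grad_from_backward, grad_from_scalar, expected)
```

All three tensors agree, which suggests reading val as per-element weights on the outputs of obj.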
A typical training procedure for a neural network is as follows:
- Define the neural network that has some learnable parameters (or weights)
- Iterate over a dataset of inputs
- Process input through the network
- Clear (zero) the parameter gradients held by the optimizer
- Compute the loss (how far is the output from being correct)
- Propagate gradients back into the network’s parameters
- Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient
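The steps in this list can be sketched without PyTorch so that each one is visible. The toy model y = w * x, the data, and all variable names here are assumptions made for illustration; the tutorial's own network and optimizer are defined later.

```python
# Toy data for the illustrative model y = w * x (the true weight is 2.0).
inputs = [1.0, 2.0, 3.0, 4.0]
targets = [2.0, 4.0, 6.0, 8.0]

weight = 0.0          # the single learnable parameter
learning_rate = 0.01
losses = []

for epoch in range(200):
    epoch_loss = 0.0
    for x, target in zip(inputs, targets):          # iterate over the dataset
        output = weight * x                         # process input through the "network"
        gradient = 0.0                              # clear the gradient from the previous step
        loss = (output - target) ** 2               # compute the loss
        gradient += 2 * (output - target) * x       # propagate the gradient back to the weight
        weight = weight - learning_rate * gradient  # weight = weight - learning_rate * gradient
        epoch_loss += loss
    losses.append(epoch_loss / len(inputs))

print(weight, losses[0], losses[-1])
```

In PyTorch the gradient reset, the backward pass and the update are handled by the optimizer and autograd rather than written by hand as they are here.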
Let’s define a network to classify points drawn from Gaussian distributions into three classes.
3.1 Show all points
Show all points (containing trainset and testset) you will use.
# show all points, you can skip this cell
def show_original_points():
    # read (x, y, label) rows from the label file
    with open('./labels/label.csv', 'r') as label_csv:
        label_reader = csv.reader(label_csv)
        class1_point = []
        class2_point = []
        class3_point = []
        for item in label_reader:
            if item[2] == '0':
                class1_point.append([item[0], item[1]])
            elif item[2] == '1':
                class2_point.append([item[0], item[1]])
            else:
                class3_point.append([item[0], item[1]])
    data1 = np.array(class1_point, dtype=float)
    data2 = np.array(class2_point, dtype=float)
    data3 = np.array(class3_point, dtype=float)
    # split each class into x and y coordinates
    x1, y1 = data1.T
    x2, y2 = data2.T
    x3, y3 = data3.T
    plt.scatter(x1, y1, c='b', marker='.')
    plt.scatter(x2, y2, c='r', marker='.')
    plt.scatter(x3, y3, c='g', marker='.')
    plt.show()
3.2 Define a network
When you define a network, your class must inherit from nn.Module, and you should override the __init__ method and the forward method.
(hidden): Linear(in_features=2, out_features=5, bias=True)
(sigmoid): Sigmoid()
(predict): Linear(in_features=5, out_features=3, bias=True)
import numpy as np
import matplotlib.pyplot as plt
import torchvision
import torch
import pandas as pd
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
import torch.optim as optim
import time
import csv
import numpy as np
class Network(nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        '''
        Args:
            n_feature(int): size of input tensor
            n_hidden(int): size of hidden layer
            n_output(int): size of output tensor
        '''
        super(Network, self).__init__()
        # define a linear layer
        self.hidden = nn.Linear(n_feature, n_hidden)
        # define sigmoid activation
        self.sigmoid = nn.Sigmoid()
        # define the output (prediction) layer
        self.predict = nn.Linear(n_hidden, n_output)

    def forward(self, x):
        '''
        x(tensor): inputs of the network
        '''
        # hidden layer
        h1 = self.hidden(x)
        # activation function
        h2 = self.sigmoid(h1)
        # output layer
        out = self.predict(h2)
        # A linear classifier is often followed by softmax to output probabilities;
        # however, the CrossEntropy loss function we use already performs this
        # operation, so we don't apply softmax here.
        return out
CrossEntropy written in pytorch:
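As a hedged illustration of what this loss computes for a single sample: given raw outputs (logits) x and a class index c, nn.CrossEntropyLoss is documented as -log(exp(x[c]) / sum_j exp(x[j])), averaged over the batch by default. The pure-Python function cross_entropy and its example inputs are a sketch written for this note, not the PyTorch source.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log(softmax(logits)[target])."""
    # Subtract the maximum logit first for numerical stability.
    shifted = [value - max(logits) for value in logits]
    log_sum_exp = math.log(sum(math.exp(value) for value in shifted))
    return -(shifted[target] - log_sum_exp)

uniform_loss = cross_entropy([0.0, 0.0, 0.0], target=1)     # equals log(3)
confident_loss = cross_entropy([0.0, 10.0, 0.0], target=1)  # close to 0
wrong_loss = cross_entropy([0.0, 10.0, 0.0], target=0)      # large
print(uniform_loss, confident_loss, wrong_loss)
```

Because the softmax is folded into the loss, the network can return raw outputs and leave the normalisation to the criterion.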
3.3 Overload dataset
Please skip the below cell when you are trying to train a model.
class PointDataset(Dataset):
    def __init__(self, csv_file, transform=None):
        '''
        Args:
            csv_file(string): path of label file
            transform (callable, optional): Optional transform to be applied
                on a sample.
        '''
        self.frame = pd.read_csv(csv_file, encoding='utf-8', header=None)
        print('csv_file source ---->', csv_file)
        self.transform = transform

    def __len__(self):
        return len(self.frame)

    def __getitem__(self, idx):
        # columns 0 and 1 hold the point coordinates, column 2 holds the class label
        x = self.frame.iloc[idx, 0]
        y = self.frame.iloc[idx, 1]
        point = np.array([x, y])
        label = int(self.frame.iloc[idx, 2])
        if self.transform is not None:
            point = self.transform(point)
        sample = {'point': point, 'label': label}
        return sample
3.4 Train function
Train a model and show the running_loss curve and the accuracy curve.
def train(classifier_net, trainloader, testloader, device, lr, optimizer):
    '''
    Args:
        classifier_net(nn.Module): model to train
        trainloader(torch.utils.data.DataLoader): train loader
        testloader(torch.utils.data.DataLoader): test loader
        device(torch.device): the device your model is trained on
        lr(float): learning rate
        optimizer(torch.optim.Optimizer): optimizer that updates the model parameters
    '''
    # loss function
    criterion = nn.CrossEntropyLoss().to(device)
    # save the mean value of loss in an epoch
    running_loss = []
    running_accuracy = []
    # accumulate loss in an epoch
    temp_loss = 0.0
    # count the iteration number in an epoch
    iteration = 0
    # epoches is the global epoch count set in the main cell
    for epoch in range(epoches):
        # adjust the learning rate while you are training the model
        # if epoch % 100 == 0 and epoch != 0:
        #     lr = lr * 0.1
        #     for param_group in optimizer.param_groups:
        #         param_group['lr'] = lr
        for i, data in enumerate(trainloader):
            point, label = data['point'], data['label']
            point, label = point.to(device).to(torch.float32), label.to(device)
            outputs = classifier_net(point)
            '''# TODO'''
            loss = criterion(outputs, label)
            '''# TODO END'''
            # accumulate the loss of this iteration
            temp_loss += loss.item()
            iteration += 1
            # print loss value
            # print('[{0:d},{1:5.0f}] loss {2:.5f}'.format(epoch + 1, i, loss.item()))
            # slow down speed of print function
            # time.sleep(0.5)
        running_loss.append(temp_loss / iteration)
        temp_loss = 0
        iteration = 0
        print('test {}:----------------------------------------------------------------'.format(epoch))
        # call test function and return accuracy
        running_accuracy.append(predict(classifier_net, testloader, device))
    # show loss curve
    show_running_loss(running_loss)
    # show accuracy curve
    show_accuracy(running_accuracy)
    return classifier_net
3.5 Test function
Test the performance of your model
# show accuracy curve, you can skip this cell.
def show_accuracy(running_accuracy):
    x = np.array([i for i in range(len(running_accuracy))])
    y = np.array(running_accuracy)
    plt.figure()
    plt.plot(x, y, c='b')
    plt.title('accuracy curve:')
    plt.ylabel('accuracy value')
    plt.show()

# show running loss curve, you can skip this cell.
def show_running_loss(running_loss):
    # generate x value
    x = np.array([i for i in range(len(running_loss))])
    # generate y value
    y = np.array(running_loss)
    # define a graph
    plt.figure()
    # generate curve
    plt.plot(x, y, c='b')
    # define title
    plt.title('loss curve:')
    # define the name of y axis
    plt.ylabel('loss value')
    # show graph
    plt.show()
def predict(classifier_net, testloader, device):
    # correct = [0 for i in range(3)]
    # total = [0 for i in range(3)]
    correct = 0
    total = 0
    # you can stop autograd from tracking history on Tensors with
    # .requires_grad=True by wrapping the code block in `with torch.no_grad():`
    with torch.no_grad():
        for data in testloader:
            point, label = data['point'], data['label']
            point, label = point.to(device).to(torch.float32), label.to(device)
            outputs = classifier_net(point)
            # if you want the probability of the model prediction,
            # you can apply the softmax function to outputs here.
            # take the index of the largest output as the predicted class
            _, predicted = torch.max(outputs, 1)
            print('model prediction: ', predicted)
            print('ground truth:', label, '\n')
            correct += (predicted == label).sum()
            total += label.size(0)
            print('current correct is:', correct.item())
            print('current total is:', total)
    print('the accuracy of the model is {0:5f}'.format(correct.item()/total))
    return correct.item() / total
3.6 Main function
if __name__ == '__main__':
    # change the number of training epochs here
    epoches = 100
    # change the learning rate here
    # 1e-3 = 10^-3
    lr = 1e-3
    # change the batch size here
    batch_size = 16
    # define a transform to preprocess data
    transform = torch.tensor
    # define a device (the CPU here)
    device = torch.device('cpu')
    # define a trainset
    trainset = PointDataset('./labels/train.csv', transform=transform)
    # define a trainloader
    trainloader = DataLoader(dataset=trainset, batch_size=batch_size, shuffle=True)
    # define a testset
    testset = PointDataset('./labels/test.csv', transform=transform)
    # define a testloader
    testloader = DataLoader(dataset=testset, batch_size=batch_size, shuffle=False)
    # define a network
    classifier_net = Network(2, 5, 3).to(device)
    # change the optimizer here
    optimizer = optim.SGD(classifier_net.parameters(), lr=lr, momentum=0.9)
    # optimizer = optim.Adam(classifier_net.parameters(), lr=lr)
    # optimizer = optim.Rprop(classifier_net.parameters(), lr=lr)
    # optimizer = optim.ASGD(classifier_net.parameters(), lr=lr)
    # optimizer = optim.Adamax(classifier_net.parameters(), lr=lr)
    # optimizer = optim.RMSprop(classifier_net.parameters(), lr=lr)
    # get trained model
    classifier_net = train(classifier_net, trainloader, testloader, device, lr, optimizer)
3.7 Problem 2
Correct the order and fill in the # TODO in the train cell with:
# update parameters in optimizer (update weights)
# calculate loss value
loss = criterion(outputs, label)
# empty parameters in optimizer
# back propagation
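Correct order is:
# calculate loss value
loss = criterion(outputs, label)
# empty (zero) the gradients in optimizer
optimizer.zero_grad()
# back propagation
loss.backward()
# update parameters in optimizer (update weights)
optimizer.step()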
3.8 Problem 3
Adjust the learning rate and observe the loss and accuracy curves. Explain the influence of the learning rate on the loss and accuracy values, and its causes.
- learning rate = 1e-1 image
- learning rate = 1e-2 image
- learning rate = 1e-3 image
- learning rate = 1e-4 image
- learning rate = 5 * 1e-5 image
- learning rate = 1e-5 image
Influence: When the learning rate is relatively large, training converges in a short time and the loss function fluctuates little; when it is small, the opposite holds.
Causes: With a large learning rate, the prediction accuracy can be affected by a few extreme samples. When the learning rate is very small, each update barely changes the weights, so the neural network can barely learn.
Lessons: We need an appropriate learning rate to ensure the validity of the experiment.
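The effect of the step size can also be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w**2; the function final_loss, the quadratic objective, and the three learning rates are assumptions chosen for illustration and do not reproduce the curves from this experiment.

```python
def final_loss(learning_rate, steps=50, start=1.0):
    """Run plain gradient descent on f(w) = w**2 and return the final loss."""
    w = start
    for _ in range(steps):
        gradient = 2 * w
        w = w - learning_rate * gradient
    return w ** 2

tiny = final_loss(1e-5)       # barely moves away from the starting loss of 1.0
moderate = final_loss(1e-1)   # shrinks quickly towards 0
too_large = final_loss(1.1)   # every step overshoots, so the loss grows
print(tiny, moderate, too_large)
```

On this toy objective a very small rate hardly learns, a moderate rate converges quickly, and a rate that is too large diverges.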
3.9 Problem 4
Adjust batch_size (batch_size=1, batch_size=210, and values in between, batch_size=1~210). Explain the influence of the batch_size on the loss and accuracy values, and its causes.
- batch_size = 1 image
- batch_size = 16 image
- batch_size = 32 image
- batch_size = 120 image
- batch_size = 210 image
Influence: When the batch_size is relatively small, the accuracy is relatively higher, at the expense of longer computing time.
Causes: A smaller batch_size means more iterations per epoch, so the weights are updated more times. But more iterations need more computing time.
Lessons: We need an appropriate batch_size to ensure the validity of the experiment. Sometimes we need a big batch_size to accelerate convergence. But too big a batch_size may cause memory overflow and decreased accuracy.
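The link between batch_size and the number of weight updates can be made explicit. The sketch assumes a training set of 210 samples (inferred from the largest batch_size tried, not stated in the text) and a DataLoader that keeps the last partial batch, which is the default drop_last=False behaviour; the helper updates_per_epoch is hypothetical.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of optimizer steps in one epoch when the last partial batch is kept."""
    return math.ceil(n_samples / batch_size)

n_samples = 210  # assumed training-set size
for batch_size in (1, 16, 32, 120, 210):
    print(batch_size, updates_per_epoch(n_samples, batch_size))
```

With batch_size = 1 the weights are updated 210 times per epoch, while batch_size = 210 gives a single update per epoch.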
3.10 Problem 5
Use the SGD optimizer and try adjusting momentum from 0 to 0.9. Explain the influence of the momentum on the loss and accuracy values.
- momentum = 0 image
- momentum = 0.9 image
Momentum indicates the degree to which the original direction of update should be preserved. When updating, the previous update direction is kept to a certain extent, and the final update direction is fine-tuned using the current batch gradient. In this way, stability can be increased to a certain extent, so the model learns faster and has some ability to escape local optima.
Influence: When momentum is relatively large, the update inertia is bigger. When momentum = 0, the accuracy is very low, maybe because the last batch is an exception. The result has high accuracy when momentum = 0.9.
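A minimal sketch of the momentum update, written in plain Python on the toy objective f(w) = 0.5 * w**2. The rule velocity = momentum * velocity + gradient followed by w = w - learning_rate * velocity follows the form documented for torch.optim.SGD with its default dampening; the function sgd, the objective, and the step count are assumptions for illustration.

```python
def sgd(momentum, learning_rate=0.01, steps=100, start=1.0):
    """Minimise f(w) = 0.5 * w**2 with SGD plus momentum; return the final w."""
    w = start
    velocity = 0.0
    for _ in range(steps):
        gradient = w  # derivative of 0.5 * w**2
        velocity = momentum * velocity + gradient
        w = w - learning_rate * velocity
    return w

w_plain = sgd(momentum=0.0)
w_momentum = sgd(momentum=0.9)
print(abs(w_plain), abs(w_momentum))
```

After the same number of steps, the run with momentum = 0.9 ends much closer to the minimum at 0 than the run without momentum, because past gradients keep pushing in the same direction.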
3.11 Problem 6
Try different optimizers, such as Adam and Rprop, to conduct the experiment. Explain the influence of the optimizers on the loss and accuracy values.
- Stochastic Gradient Descent image
- Adam image
- Rprop image
Influence: All three optimizers converge to the same accuracy. Rprop is clearly the fastest of the three, but its curves fluctuate after converging. Adam and SGD converge more slowly, but their loss curves are smoother and show no fluctuation when converging. I consulted the reference below and tried some other optimizers as well.
- ASGD image
- Adamax image
- RMSprop image
3.12 Problem 7
Try adjusting the above parameters at the same time, find what you think is the most suitable setting (the one where the model converges the fastest), and discuss it.
After many tries, we finally chose the Adam optimizer with a learning rate of 0.01 and a batch size of 16. A high learning rate makes the weights converge fast, but it also increases the risk of disturbance from exceptional samples. Batches that are too large add a burden on memory; too many iterations increase computing time. So we chose a relatively small batch_size = 16 as a trade-off. Finally, we chose the Adam optimizer, considering its convergence speed and its stability.
(Further content, read it when you are free)
A lot of effort in solving any machine learning problem goes into preparing the data. PyTorch provides many tools to make data loading easy and, hopefully, to make your code more readable. In this tutorial, we will see how to load and preprocess/augment data from a non-trivial dataset.
4.1 Packages installation
scikit-image: For image io and transforms
sudo apt-get install python-numpy
sudo apt-get install python-scipy
sudo apt-get install python-matplotlib
sudo pip install scikit-image
pandas: For easier csv parsing
sudo apt-get install python-pandas
import os
import torch
import pandas as pd
from skimage import io, transform
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils
plt.ion() # interactive mode
4.2 Annotations in array
# read a csv file by pandas
landmarks_frame = pd.read_csv('data/faces/face_landmarks.csv')
n = 0
# read image name, image name was saved in column 1.
img_name = landmarks_frame.iloc[n, 0]
# points were saved in columns from 2 to the end
landmarks = landmarks_frame.iloc[n, 1:].values
# reshape the points into (x, y) pairs
landmarks = landmarks.astype('float').reshape(-1, 2)
print('Image name: {}'.format(img_name))
print('Landmarks shape: {}'.format(landmarks.shape))
print('First 4 Landmarks: {}'.format(landmarks[:4]))
Image name: 0805personali01.jpg
Landmarks shape: (68, 2)
First 4 Landmarks: [[ 27. 83.]
[ 27. 98.]
[ 29. 113.]
[ 33. 127.]]
def show_landmarks(image, landmarks):
    """Show image with landmarks"""
    plt.imshow(image)
    plt.scatter(landmarks[:, 0], landmarks[:, 1], s=10, marker='.', c='r')
    plt.pause(0.001)  # pause a bit so that plots are updated

plt.figure()
show_landmarks(io.imread(os.path.join('data/faces/', img_name)),
               landmarks)
plt.show()
class FaceLandmarksDataset(Dataset):
    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.landmarks_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.landmarks_frame)

    def __getitem__(self, idx):
        # combine the relative path of images
        img_name = os.path.join(self.root_dir,
                                self.landmarks_frame.iloc[idx, 0])
        image = io.imread(img_name)
        landmarks = self.landmarks_frame.iloc[idx, 1:].values
        landmarks = landmarks.astype('float').reshape(-1, 2)
        # save all data we may need during training a network in a dict
        sample = {'image': image, 'landmarks': landmarks}
        if self.transform:
            sample = self.transform(sample)
        return sample
Important note:
To define a dataset, we must first inherit from the class torch.utils.data.Dataset. When we write our own dataset, it is necessary to override the __init__ method, the __len__ method, and the __getitem__ method. Of course, you can define other methods as you like.
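The three methods can be seen in isolation in a minimal sketch. SquaresDataset, its keys 'input' and 'target', and the sizes used are hypothetical and exist only for this illustration; they keep everything in memory so that no files are needed.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """Hypothetical in-memory dataset: sample i is the pair (i, i**2)."""

    def __init__(self, n_samples):
        self.values = torch.arange(n_samples, dtype=torch.float32)

    def __len__(self):
        return len(self.values)

    def __getitem__(self, idx):
        value = self.values[idx]
        return {'input': value, 'target': value ** 2}

dataset = SquaresDataset(10)
loader = DataLoader(dataset, batch_size=4, shuffle=False)
batches = list(loader)
print(len(dataset), len(batches), batches[0]['input'])
```

DataLoader relies on __len__ to know how many samples exist and on __getitem__ to fetch each one, then stacks the dict entries of a batch into tensors.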
4.3 Instantiation and Iteration
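# root_dir is assumed to be the image folder used earlier ('data/faces/')
face_dataset = FaceLandmarksDataset(csv_file='data/faces/face_landmarks.csv',
                                    root_dir='data/faces/')
fig = plt.figure()
for i in range(len(face_dataset)):
    sample = face_dataset[i]
    print(i, sample['image'].shape, sample['landmarks'].shape)
    # create a subplot for this sample
    ax = plt.subplot(1, 4, i + 1)
    ax.set_title('Sample #{}'.format(i))
    show_landmarks(**sample)
    # stop after the first four samples
    if i == 3:
        plt.show()
        break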
0 (324, 215, 3) (68, 2)
1 (500, 333, 3) (68, 2)
2 (250, 258, 3) (68, 2)
3 (434, 290, 3) (68, 2)