PyTorch Deep Learning (II) - A S

Author: ElliotG | Published 2023-01-13 16:32

1. Background

A simple and familiar problem: a linear regression with a single feature.

Simple linear regression model:


y = b + w * x + epsilon   (b: intercept, w: slope, epsilon: Gaussian noise)

 

2. Import libraries and make preparations

  • Libraries we need for the demo
import numpy as np
from sklearn.linear_model import LinearRegression

import torch
import torch.optim as optim
import torch.nn as nn
from torchviz import make_dot
import matplotlib.pyplot as plt
  • Preparation: a custom helper function to plot the data
# Helper function to plot the train and validation data
def figure1(x_train, y_train, x_val, y_val):
    fig, ax = plt.subplots(1, 2, figsize=(12, 6))
    
    ax[0].scatter(x_train, y_train)
    ax[0].set_xlabel('x')
    ax[0].set_ylabel('y')
    ax[0].set_ylim([0, 3.1])
    ax[0].set_title('Generated Data - Train')

    ax[1].scatter(x_val, y_val, c='r')
    ax[1].set_xlabel('x')
    ax[1].set_ylabel('y')
    ax[1].set_ylim([0, 3.1])
    ax[1].set_title('Generated Data - Validation')
    fig.tight_layout()
    
    return fig, ax

 

3. Data Generation

  • 3-1) Let’s start by generating some synthetic data
    We start with a vector of N = 100 points for our feature (x) and create our labels (y) using b = 1, w = 2,
    and some Gaussian noise (epsilon).
# Synthetic Data Generation
true_b = 1
true_w = 2
N = 100

# Data Generation
np.random.seed(42)
x = np.random.rand(N, 1)
epsilon = (.1 * np.random.randn(N, 1))
y = true_b + true_w * x + epsilon

  • 3-2) Split the data into train and validation sets
    Next, let’s split our synthetic data into train and validation sets, shuffling the array of indices and using the first 80 shuffled points for training.
# Shuffles the indices
idx = np.arange(N)
np.random.shuffle(idx)

# Uses first 80 random indices for train
train_idx = idx[:int(N*.8)]
# Uses the remaining indices for validation
val_idx = idx[int(N*.8):]

# Generates train and validation sets
x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]

# Plot the train and validation data
figure1(x_train, y_train, x_val, y_val)

Result: two scatter plots of the generated data, "Generated Data - Train" and "Generated Data - Validation".
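
As a quick, optional sanity check, we can fit scikit-learn's LinearRegression (already imported above) on the training split; its intercept and slope should come out close to the true values b = 1 and w = 2. The variable name linr below is just illustrative.

# Optional sanity check: fit a closed-form linear regression on the
# training split and compare against the true values
linr = LinearRegression()
linr.fit(x_train, y_train)

# Should be close to true_b = 1 and true_w = 2
print(linr.intercept_, linr.coef_[0])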

 

4. Gradient Descent

  • 4-1) Random Initialization

To train a model, you need to randomly initialize the parameters / weights (in this example, we have only two: b and w).

# Step 0 - Initializes parameters "b" and "w" randomly
np.random.seed(42)
b = np.random.randn(1)
w = np.random.randn(1)

print(b, w)

Output:
[0.49671415] [-0.1382643]


  • 4-2) Compute Model’s Predictions

This is the forward pass; it simply computes the model’s predictions using the current values of the parameters / weights. At the very beginning, we will be producing really bad predictions, as we started with random values.

# Step 1 - Computes our model's predicted output - forward pass
yhat = b + w * x_train

  • 4-3) Compute the Loss

For a regression problem, the loss is given by the mean squared error (MSE): the average of all squared differences between the labels (y) and the predictions (b + wx).
In the code below, we are using all data points of the training set to compute the loss, so n = N = 80, meaning we are performing batch gradient descent.

# Step 2 - Computing the loss
# We are using ALL data points, so this is BATCH gradient
# descent. How wrong is our model? That's the error!
error = (yhat - y_train)

# It is a regression, so it computes mean squared error (MSE)
loss = (error ** 2).mean()

print(loss)

Output:
2.720278897826747


  • 4-4) Compute the Gradients

A gradient is a partial derivative.
A derivative tells you how much a given quantity changes when you slightly vary some other quantity.

Gradient = how much the loss changes if ONE parameter changes a little bit
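
For reference, these are the partial derivatives of the MSE loss that the code below computes (a standard derivation, spelled out here for clarity):

\mathrm{MSE}(b, w) = \frac{1}{n}\sum_{i=1}^{n}\bigl(b + w x_i - y_i\bigr)^2

\frac{\partial\,\mathrm{MSE}}{\partial b} = \frac{2}{n}\sum_{i=1}^{n}\bigl(b + w x_i - y_i\bigr)
\qquad
\frac{\partial\,\mathrm{MSE}}{\partial w} = \frac{2}{n}\sum_{i=1}^{n} x_i \bigl(b + w x_i - y_i\bigr)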

# Step 3 - Computes gradients for both "b" and "w" parameters
b_grad = 2 * error.mean()
w_grad = 2 * (x_train * error).mean()
print(b_grad, w_grad)

Output:
-3.044811379650508 -1.8337537171510832
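
As an optional numerical check (the helper name mse and the step size eps below are illustrative), we can approximate each gradient with a central finite difference and confirm it matches the analytical values above.

# Optional numerical check: approximate each gradient with a central
# finite difference; both values should be very close to b_grad and w_grad
def mse(b_, w_):
    return (((b_ + w_ * x_train) - y_train) ** 2).mean()

eps = 1e-6
b_grad_num = (mse(b + eps, w) - mse(b - eps, w)) / (2 * eps)
w_grad_num = (mse(b, w + eps) - mse(b, w - eps)) / (2 * eps)
print(b_grad_num, w_grad_num)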


  • 4-5) Update the Parameters

In the final step, we use the gradients to update the parameters.
Since we are trying to minimize the loss, we reverse the sign of the gradient for the update.

# Sets the learning rate - this is "eta", the "n"-like Greek letter
lr = 0.1
print(b, w)

# Step 4 - Updates parameters using gradients and 
# the learning rate
b = b - lr * b_grad
w = w - lr * w_grad

print(b, w)

Output:
[0.49671415] [-0.1382643]
[0.80119529] [0.04511107]

Note: we start with a value of 0.1 for the learning rate (which is a relatively high value, as far as learning rates are concerned!).
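
To see the effect of this single update, we can recompute the predictions and the loss with the new parameters (reusing the variables from the steps above); the result should be lower than the initial loss of roughly 2.72.

# Recompute the loss with the updated parameters; it should have decreased
yhat = b + w * x_train
error = (yhat - y_train)
loss = (error ** 2).mean()
print(loss)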


  • 4-6) Rinse and Repeat

We use the updated parameters to go back to Step 1 and restart the process.
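
Putting the five steps together, a minimal sketch of the full gradient descent loop could look like the code below (the epoch count of 1000 is illustrative).

# Minimal sketch of the full loop (the epoch count is illustrative)
# Step 0 - random initialization
np.random.seed(42)
b = np.random.randn(1)
w = np.random.randn(1)

lr = 0.1
n_epochs = 1000

for epoch in range(n_epochs):
    # Step 1 - forward pass
    yhat = b + w * x_train
    # Step 2 - loss (MSE)
    error = (yhat - y_train)
    loss = (error ** 2).mean()
    # Step 3 - gradients
    b_grad = 2 * error.mean()
    w_grad = 2 * (x_train * error).mean()
    # Step 4 - update the parameters
    b = b - lr * b_grad
    w = w - lr * w_grad

# b and w should end up close to the true values (1 and 2)
print(b, w)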
