PyTorch Deep Learning (II) - A S

Author: ElliotG | Published 2023-01-13 16:32

1. Background

A simple and familiar problem: a linear regression with a single feature.

Simple linear regression model:


y = b + w * x + epsilon   (b: intercept, w: slope, epsilon: Gaussian noise)

 

2. Import libraries and make preparations

  • Libraries we need for the demo
import numpy as np
from sklearn.linear_model import LinearRegression

import torch
import torch.optim as optim
import torch.nn as nn
from torchviz import make_dot
import matplotlib.pyplot as plt
  • Preparation: a custom helper function to plot the data
# Helper function to plot the train and validation data
def figure1(x_train, y_train, x_val, y_val):
    fig, ax = plt.subplots(1, 2, figsize=(12, 6))
    
    ax[0].scatter(x_train, y_train)
    ax[0].set_xlabel('x')
    ax[0].set_ylabel('y')
    ax[0].set_ylim([0, 3.1])
    ax[0].set_title('Generated Data - Train')

    ax[1].scatter(x_val, y_val, c='r')
    ax[1].set_xlabel('x')
    ax[1].set_ylabel('y')
    ax[1].set_ylim([0, 3.1])
    ax[1].set_title('Generated Data - Validation')
    fig.tight_layout()
    
    return fig, ax

 

3. Data Generation

  • 3-1) Let’s start by generating some synthetic data
    We start with a vector of N = 100 points for our feature (x) and create our labels (y) using b = 1, w = 2,
    and some Gaussian noise (epsilon).
# Synthetic Data Generation
true_b = 1
true_w = 2
N = 100

# Data Generation
np.random.seed(42)
x = np.random.rand(N, 1)
epsilon = (.1 * np.random.randn(N, 1))
y = true_b + true_w * x + epsilon

  • 3-2) Split the data into train and validation sets
    Next, let’s split our synthetic data into train and validation sets, shuffling the array of indices and using the first 80 shuffled points for training.
# Shuffles the indices
idx = np.arange(N)
np.random.shuffle(idx)

# Uses first 80 random indices for train
train_idx = idx[:int(N*.8)]
# Uses the remaining indices for validation
val_idx = idx[int(N*.8):]

# Generates train and validation sets
x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]

# Plot the train and validation data
figure1(x_train, y_train, x_val, y_val)

Result: two scatter plots of the generated data, "Generated Data - Train" and "Generated Data - Validation".
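
As a quick, optional sanity check, we can fit scikit-learn's LinearRegression (already imported above) on the training split; its intercept and slope should come out close to the true values b = 1 and w = 2. The variable name linr below is just illustrative.

# Optional sanity check: fit a closed-form linear regression on the
# training split and compare against the true values
linr = LinearRegression()
linr.fit(x_train, y_train)

# Should be close to true_b = 1 and true_w = 2
print(linr.intercept_, linr.coef_[0])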

 

4. Gradient Descent

  • 4-1) Random Initialization

To train a model, you need to randomly initialize the parameters / weights (in this example, we have only two: b and w).

# Step 0 - Initializes parameters "b" and "w" randomly
np.random.seed(42)
b = np.random.randn(1)
w = np.random.randn(1)

print(b, w)

Output:
[0.49671415] [-0.1382643]


  • 4-2) Compute Model’s Predictions

This is the forward pass; it simply computes the model’s predictions using the current values of the parameters / weights. At the very beginning, we will be producing really bad predictions, as we started with random values.

# Step 1 - Computes our model's predicted output - forward pass
yhat = b + w * x_train

  • 4-3) Compute the Loss

For a regression problem, the loss is given by the mean squared error (MSE): the average of all squared differences between the labels (y) and the predictions (b + wx).
In the code below, we are using all data points of the training set to compute the loss, so n = N = 80, meaning we are performing batch gradient descent.

# Step 2 - Computing the loss
# We are using ALL data points, so this is BATCH gradient
# descent. How wrong is our model? That's the error!
error = (yhat - y_train)

# It is a regression, so it computes mean squared error (MSE)
loss = (error ** 2).mean()

print(loss)

Output:
2.720278897826747


  • 4-4) Compute the Gradients

A gradient is a partial derivative.
A derivative tells you how much a given quantity changes when you slightly vary some other quantity.

Gradient = how much the loss changes if ONE parameter changes a little bit
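
For reference, these are the partial derivatives of the MSE loss that the code below computes (a standard derivation, spelled out here for clarity):

\mathrm{MSE}(b, w) = \frac{1}{n}\sum_{i=1}^{n}\bigl(b + w x_i - y_i\bigr)^2

\frac{\partial\,\mathrm{MSE}}{\partial b} = \frac{2}{n}\sum_{i=1}^{n}\bigl(b + w x_i - y_i\bigr)
\qquad
\frac{\partial\,\mathrm{MSE}}{\partial w} = \frac{2}{n}\sum_{i=1}^{n} x_i \bigl(b + w x_i - y_i\bigr)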

# Step 3 - Computes gradients for both "b" and "w" parameters
b_grad = 2 * error.mean()
w_grad = 2 * (x_train * error).mean()
print(b_grad, w_grad)

Output:
-3.044811379650508 -1.8337537171510832
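
As an optional numerical check (the helper name mse and the step size eps below are illustrative), we can approximate each gradient with a central finite difference and confirm it matches the analytical values above.

# Optional numerical check: approximate each gradient with a central
# finite difference; both values should be very close to b_grad and w_grad
def mse(b_, w_):
    return (((b_ + w_ * x_train) - y_train) ** 2).mean()

eps = 1e-6
b_grad_num = (mse(b + eps, w) - mse(b - eps, w)) / (2 * eps)
w_grad_num = (mse(b, w + eps) - mse(b, w - eps)) / (2 * eps)
print(b_grad_num, w_grad_num)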


  • 4-5) Update the Parameters

In the final step, we use the gradients to update the parameters.
Since we are trying to minimize the loss, we reverse the sign of the gradient for the update.

# Sets the learning rate - this is "eta", the "n"-like Greek letter
lr = 0.1
print(b, w)

# Step 4 - Updates parameters using gradients and 
# the learning rate
b = b - lr * b_grad
w = w - lr * w_grad

print(b, w)

Output:
[0.49671415] [-0.1382643]
[0.80119529] [0.04511107]

Note: we start with a value of 0.1 for the learning rate (which is a relatively high value, as far as learning rates are concerned!).
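
To see the effect of this single update, we can recompute the predictions and the loss with the new parameters (reusing the variables from the steps above); the result should be lower than the initial loss of roughly 2.72.

# Recompute the loss with the updated parameters; it should have decreased
yhat = b + w * x_train
error = (yhat - y_train)
loss = (error ** 2).mean()
print(loss)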


  • 4-6) Rinse and Repeat

We use the updated parameters to go back to Step 1 and restart the process.
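
Putting the five steps together, a minimal sketch of the full gradient descent loop could look like the code below (the epoch count of 1000 is illustrative).

# Minimal sketch of the full loop (the epoch count is illustrative)
# Step 0 - random initialization
np.random.seed(42)
b = np.random.randn(1)
w = np.random.randn(1)

lr = 0.1
n_epochs = 1000

for epoch in range(n_epochs):
    # Step 1 - forward pass
    yhat = b + w * x_train
    # Step 2 - loss (MSE)
    error = (yhat - y_train)
    loss = (error ** 2).mean()
    # Step 3 - gradients
    b_grad = 2 * error.mean()
    w_grad = 2 * (x_train * error).mean()
    # Step 4 - update the parameters
    b = b - lr * b_grad
    w = w - lr * w_grad

# b and w should end up close to the true values (1 and 2)
print(b, w)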
