Learning Rate Adjustment Methods in PyTorch

Author: zqyadam | Published 2020-11-21 23:26

Introduction

PyTorch provides six learning rate adjustment methods:

  • StepLR
  • MultiStepLR
  • ExponentialLR
  • CosineAnnealingLR
  • ReduceLROnPlateau
  • LambdaLR

They are used to modify the learning rate as training iterates. All six inherit from the base class _LRScheduler, which has three main attributes and two main methods.

The three main attributes are:

  • optimizer: the associated optimizer
  • last_epoch: the epoch counter
  • base_lrs: the initial learning rate(s)

The two main methods, illustrated by the sketch after this list, are:

  • step(): update the learning rate for the next epoch
  • get_last_lr(): return the learning rate(s) from the most recent update
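
A minimal sketch of these attributes and methods, using StepLR on a throwaway one-parameter SGD optimizer (purely illustrative, not part of the appendix code):

import torch
from torch.optim.lr_scheduler import StepLR

# a throwaway parameter so the optimizer has something to manage
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1)
scheduler = StepLR(optimizer, step_size=5, gamma=0.5)

print(scheduler.optimizer is optimizer)  # True: the associated optimizer
print(scheduler.last_epoch)              # 0: the epoch counter
print(scheduler.base_lrs)                # [0.1]: the initial learning rate(s)

scheduler.step()                         # advance to the next epoch's learning rate
print(scheduler.get_last_lr())           # [0.1]: no step_size boundary crossed yet

(PyTorch may warn that scheduler.step() is called before optimizer.step(); in a real loop the optimizer is stepped inside the batch loop first.)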

Let's go through the six schedulers provided by PyTorch one by one.

Note

The scheduler should be stepped in the epoch loop, not in the batch loop; stepping it once per batch would make the learning rate drop far too quickly. The attribute name last_epoch also hints that learning rate adjustment works at the epoch level.

StepLR

Purpose: adjust the learning rate at equal intervals, i.e. every fixed number of epochs. The resulting curve is a staircase that steps down (gamma less than 1) or up (gamma greater than 1, though presumably nobody sets it that way).

Main parameters:

  • step_size: the number of epochs between adjustments
  • gamma: the multiplicative factor; at every adjustment the current learning rate is multiplied by it, i.e. lr = lr * gamma (see the short sketch below)
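
To see the staircase without training anything, you can step the scheduler on a dummy optimizer and print the learning rate; this sketch (not the appendix script) is equivalent to lr = base_lr * gamma ** (epoch // step_size):

import torch
from torch.optim.lr_scheduler import StepLR

optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
scheduler = StepLR(optimizer, step_size=5, gamma=0.5)

for epoch in range(15):
    # lr used during this epoch: 0.1 for epochs 0-4, 0.05 for 5-9, 0.025 for 10-14
    print(epoch, optimizer.param_groups[0]['lr'])
    scheduler.step()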

Let's look at how StepLR evolves, here with gamma = 0.5 over 50 epochs:

[Figure: StepLR learning rate curve]

See the appendix for the full code.

MultiStepLR

Purpose: adjust the learning rate at user-specified epochs.

Main parameters:

  • milestones: a list of integers, each one an epoch at which the learning rate should be adjusted; for example, [50, 125, 180] adjusts at epochs 50, 125 and 180
  • gamma: the multiplicative factor, with the same meaning as in StepLR (see the short sketch after this list)
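
A tiny sketch (again a dummy optimizer, not the appendix script) that prints the learning rate just before and just after each milestone:

import torch
from torch.optim.lr_scheduler import MultiStepLR

optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
scheduler = MultiStepLR(optimizer, milestones=[20, 25, 35], gamma=0.5)

for epoch in range(40):
    if epoch in (19, 20, 24, 25, 34, 35):
        # 0.1 up to epoch 19, then 0.05 from epoch 20, 0.025 from 25, 0.0125 from 35
        print(epoch, optimizer.param_groups[0]['lr'])
    scheduler.step()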

The figure below shows the MultiStepLR curve with milestones = [20, 25, 35]; the learning rate changes at epochs 20, 25 and 35.

[Figure: MultiStepLR learning rate curve]

ExponentialLR

Purpose: decay the learning rate exponentially.

Main parameters:

  • gamma: the base of the exponential, usually set close to 1 (e.g. 0.9); each epoch the learning rate is multiplied by gamma, so lr = base_lr * gamma ** epoch (see the quick check below)
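
A quick numeric check of that rule (illustrative dummy optimizer, not the appendix script):

import torch
from torch.optim.lr_scheduler import ExponentialLR

optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
scheduler = ExponentialLR(optimizer, gamma=0.9)

for epoch in range(5):
    # prints roughly 0.1, 0.09, 0.081, 0.0729, 0.06561, i.e. 0.1 * 0.9 ** epoch
    print(epoch, optimizer.param_groups[0]['lr'])
    scheduler.step()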

The figure below shows the ExponentialLR curve with gamma = 0.9; the learning rate decays exponentially.

[Figure: ExponentialLR learning rate curve]

CosineAnnealingLR

Purpose: adjust the learning rate along a cosine curve; note that this scheduler can also increase the learning rate.

Main parameters:

  • T_max: the length of the descending phase in epochs, i.e. half of a full cosine period
  • eta_min: the lower bound on the learning rate

Update rule (the standard cosine annealing formula):

$\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)$

where $\eta_{max}$ is the initial learning rate, $\eta_{min}$ is eta_min, and $T_{cur}$ counts the epochs since the last restart.

The figure below shows the CosineAnnealingLR curve with T_max set to 10 and eta_min left unset (it defaults to 0). The learning rate varies periodically, and the period of the cosine is twice T_max, i.e. 20 epochs. The appendix has no CosineAnnealingLR script, so a short sketch follows the figure.

[Figure: CosineAnnealingLR learning rate curve]
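
A minimal sketch in the same spirit as the appendix scripts, using a dummy optimizer with T_max = 10 and eta_min left at its default of 0:

import matplotlib.pyplot as plt
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

lr = 0.1

optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=lr)
scheduler = CosineAnnealingLR(optimizer=optimizer, T_max=10)  # eta_min defaults to 0

lr_list = []

for i in range(50):
    lr_list.append(optimizer.param_groups[0]['lr'])
    # ... training on batches would go here ...
    scheduler.step()

# Plot the lr curve: a cosine oscillating between 0.1 and 0 with period 2 * T_max = 20 epochs
plt.plot(lr_list)
plt.legend(labels=['CosineAnnealingLR: T_max=10'])
plt.show()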

ReduceLROnPlateau

Purpose: monitor a metric and reduce the learning rate once it stops improving. Very practical; you can monitor the loss or the accuracy, for example.

Main parameters:

  • mode: "min" or "max"; in "min" mode the learning rate is reduced when the monitored metric stops decreasing, in "max" mode when it stops increasing
  • factor: the multiplicative factor, playing the role of gamma in StepLR
  • patience: how many consecutive epochs without improvement to tolerate before reducing
  • cooldown: how many epochs to stop monitoring after a reduction
  • verbose: whether to print a message whenever the learning rate is reduced
  • min_lr: the lower bound on the learning rate
  • eps: the minimum decay; if the difference between the old and new learning rate is smaller than eps, the update is skipped (a short usage sketch follows this list)
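
In real training the metric passed to step() is typically a validation loss; here is a minimal sketch of the usual pattern, where validate() is a hypothetical placeholder for your own evaluation code:

import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

# dummy model and optimizer just to make the snippet self-contained
model = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.3, patience=5,
                              cooldown=3, min_lr=1e-4, verbose=True)

def validate():
    # placeholder: compute and return the real validation loss here
    return 0.5

for epoch in range(50):
    # ... train for one epoch ...
    val_loss = validate()
    scheduler.step(val_loss)  # unlike the other schedulers, step() takes the monitored metric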

The figure below shows the ReduceLROnPlateau curve, with the following settings:

lr = 0.1

factor = 0.3
mode = "min"
patience = 5
cooldown = 3
min_lr = 1e-4
verbose = True

Initially a fixed loss_value = 0.5 is used to simulate a loss that never improves; at the 4th epoch loss_value is set to 0.4. The resulting curve is shown below:

[Figure: ReduceLROnPlateau learning rate curve]

The terminal prints the following:

Epoch    10: reducing learning rate of group 0 to 3.0000e-02.
Epoch    19: reducing learning rate of group 0 to 9.0000e-03.
Epoch    28: reducing learning rate of group 0 to 2.7000e-03.
Epoch    37: reducing learning rate of group 0 to 8.1000e-04.
Epoch    46: reducing learning rate of group 0 to 2.4300e-04.

Analysis

The terminal shows the learning rate being adjusted at epoch 10. During epochs 0, 1, 2 and 3 (the first 4 epochs) no adjustment happens; at epoch 3 (the 4th) loss_value is manually lowered to 0.4, simulating a drop in the loss, so ReduceLROnPlateau's patience counter restarts at epoch 4 (the 5th). By epoch 8 (the 9th) patience has reached its limit (patience = 5), so at epoch 9 (the 10th) the learning rate is multiplied by 0.3 and becomes 0.03.

After that, ReduceLROnPlateau enters its cooldown state and ignores the loss for 3 epochs (cooldown = 3), until epoch 12 (the 13th). It then watches the loss again for 5 epochs; at epoch 17 (the 18th) patience reaches its limit once more (patience = 5), and at epoch 18 (the 19th) the learning rate is multiplied by 0.3 again, becoming 0.009.

The same pattern repeats, with further reductions at the 28th, 37th and 46th epochs.

LambdaLR

Purpose: a fully custom adjustment strategy.

Main parameters:

  • lr_lambda: a function or a list of functions (one per parameter group); each function receives last_epoch as its argument, and the value it returns multiplies the corresponding base learning rate (see the sketch after this list)
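
When the optimizer has several parameter groups, lr_lambda can be a list with one function per group; a small sketch (the two-group optimizer here is made up for illustration):

import torch
from torch.optim.lr_scheduler import LambdaLR

w1 = torch.nn.Parameter(torch.zeros(1))
w2 = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([{'params': [w1]},
                             {'params': [w2], 'lr': 0.01}], lr=0.1)
scheduler = LambdaLR(optimizer, lr_lambda=[lambda e: 0.95 ** e,  # group 0: exponential decay
                                           lambda e: 1.0])       # group 1: constant lr

for epoch in range(3):
    print(scheduler.get_last_lr())  # roughly [0.1, 0.01], then [0.095, 0.01], then [0.09025, 0.01]
    scheduler.step()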

Below, LambdaLR is used to mimic ExponentialLR with gamma set to 0.95:

lambda epoch: 0.95**epoch

The resulting curve is shown below:

[Figure: LambdaLR learning rate curve]

Appendix

Net in the code below is a dummy network with no real purpose.

StepLR code

import matplotlib.pyplot as plt
import torch
from torch.nn import Linear, Sequential
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader, TensorDataset

lr = 0.1


# Generate some fake data
x_data = torch.linspace(0, 50, 100)
x_data = torch.unsqueeze(x_data, 0)
y_data = x_data ** 2 + torch.randn(100) * 20
x_data = x_data.permute(1, 0)
y_data = y_data.permute(1, 0)
print(x_data.shape)
print(y_data.shape)
# plt.plot(x_data, y_data)
# plt.show()


train_dataset = TensorDataset(x_data, y_data)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)

train_data = iter(train_loader)
train_x, train_y = next(train_data)
print(train_x.shape, train_y.shape)


class Net(torch.nn.Module):
    def __init__(self, hidden_num=10):
        super(Net, self).__init__()
        self.layer = Sequential(
            Linear(1, hidden_num),
            Linear(hidden_num, 1),
        )

    def forward(self, x):
        x = self.layer(x)
        return x


net = Net()

criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)
scheduler = StepLR(optimizer=optimizer, step_size=5, gamma=0.5)

lr_list = []

for i in range(50):
    for x, y in train_loader:
        y_pred = net(x)
        loss = criterion(y_pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    lr_list.append(optimizer.param_groups[0]['lr'])  # record the lr used during this epoch
    scheduler.step()


# Plot the lr curve
plt.plot(lr_list)
plt.legend(labels=['StepLR:gamma=0.5'])
plt.show()
print(scheduler)

MultiStepLR code

import matplotlib.pyplot as plt
import torch
from torch.nn import Linear, Sequential
from torch.optim.lr_scheduler import MultiStepLR
from torch.utils.data import DataLoader, TensorDataset

lr = 0.1

# Generate some fake data
x_data = torch.linspace(0, 50, 100)
x_data = torch.unsqueeze(x_data, 0)
y_data = x_data ** 2 + torch.randn(100) * 20
x_data = x_data.permute(1, 0)
y_data = y_data.permute(1, 0)
print(x_data.shape)
print(y_data.shape)
# plt.plot(x_data, y_data)
# plt.show()


train_dataset = TensorDataset(x_data, y_data)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)

train_data = iter(train_loader)
train_x, train_y = next(train_data)
print(train_x.shape, train_y.shape)


class Net(torch.nn.Module):
    def __init__(self, hidden_num=10):
        super(Net, self).__init__()
        self.layer = Sequential(
            Linear(1, hidden_num),
            Linear(hidden_num, 1),
        )

    def forward(self, x):
        x = self.layer(x)
        return x


net = Net()

criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)
scheduler = MultiStepLR(optimizer=optimizer, milestones=[20, 25, 35], gamma=0.5)

lr_list = []

for i in range(50):
    for x, y in train_loader:
        y_pred = net(x)
        loss = criterion(y_pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    lr_list.append(optimizer.param_groups[0]['lr'])
    scheduler.step()

# Plot the lr curve
plt.plot(lr_list)
plt.legend(labels=['MultiStepLR:gamma=0.5'])
plt.show()
print(scheduler)

ExponentialLR code

import matplotlib.pyplot as plt
import torch
from torch.nn import Linear, Sequential
from torch.optim.lr_scheduler import ExponentialLR
from torch.utils.data import DataLoader, TensorDataset

lr = 0.1
gamma = 0.9

# Generate some fake data
x_data = torch.linspace(0, 50, 100)
x_data = torch.unsqueeze(x_data, 0)
y_data = x_data ** 2 + torch.randn(100) * 20
x_data = x_data.permute(1, 0)
y_data = y_data.permute(1, 0)
print(x_data.shape)
print(y_data.shape)
# plt.plot(x_data, y_data)
# plt.show()


train_dataset = TensorDataset(x_data, y_data)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)

train_data = iter(train_loader)
train_x, train_y = next(train_data)
print(train_x.shape, train_y.shape)


class Net(torch.nn.Module):
    def __init__(self, hidden_num=10):
        super(Net, self).__init__()
        self.layer = Sequential(
            Linear(1, hidden_num),
            Linear(hidden_num, 1),
        )

    def forward(self, x):
        x = self.layer(x)
        return x


net = Net()

criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)
scheduler = ExponentialLR(optimizer=optimizer, gamma=gamma)

lr_list = []

for i in range(50):
    for x, y in train_loader:
        y_pred = net(x)
        loss = criterion(y_pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    lr_list.append(optimizer.param_groups[0]['lr'])
    scheduler.step()

# Plot the lr curve
plt.plot(lr_list)
plt.legend(labels=['ExponentialLR: gamma={}'.format(gamma)])
plt.show()
print(scheduler)

ReduceLROnPlateau code

import matplotlib.pyplot as plt
import torch
from torch.nn import Linear, Sequential
from torch.optim.lr_scheduler import ReduceLROnPlateau
from torch.utils.data import DataLoader, TensorDataset

lr = 0.1

loss_value = 0.5

factor = 0.3
mode = "min"
patience = 5
cooldown = 3
min_lr = 1e-4
verbose = True

# Generate some fake data
x_data = torch.linspace(0, 50, 100)
x_data = torch.unsqueeze(x_data, 0)
y_data = x_data ** 2 + torch.randn(100) * 20
x_data = x_data.permute(1, 0)
y_data = y_data.permute(1, 0)
print(x_data.shape)
print(y_data.shape)
# plt.plot(x_data, y_data)
# plt.show()


train_dataset = TensorDataset(x_data, y_data)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)

train_data = iter(train_loader)
train_x, train_y = next(train_data)
print(train_x.shape, train_y.shape)


class Net(torch.nn.Module):
    def __init__(self, hidden_num=10):
        super(Net, self).__init__()
        self.layer = Sequential(
            Linear(1, hidden_num),
            Linear(hidden_num, 1),
        )

    def forward(self, x):
        x = self.layer(x)
        return x


net = Net()

criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)
scheduler = ReduceLROnPlateau(optimizer=optimizer, mode=mode, factor=factor, patience=patience,
                              verbose=verbose, cooldown=cooldown, min_lr=min_lr)

lr_list = []

for i in range(50):
    lr_list.append(optimizer.param_groups[0]['lr'])
    for x, y in train_loader:
        y_pred = net(x)
        loss = criterion(y_pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Manually simulate the loss dropping (this is what eventually drives the lr reductions)
    if i == 3:
        loss_value = 0.4

    scheduler.step(loss_value)

# Plot the lr curve
plt.plot(lr_list)
plt.legend(labels=['ReduceLROnPlateau'])
plt.show()
print(scheduler)

LambdaLR code

import matplotlib.pyplot as plt
import torch
from torch.nn import Linear, Sequential
from torch.optim.lr_scheduler import LambdaLR
from torch.utils.data import DataLoader, TensorDataset

lr = 0.1


# Generate some fake data
x_data = torch.linspace(0, 50, 100)
x_data = torch.unsqueeze(x_data, 0)
y_data = x_data ** 2 + torch.randn(100) * 20
x_data = x_data.permute(1, 0)
y_data = y_data.permute(1, 0)
print(x_data.shape)
print(y_data.shape)
# plt.plot(x_data, y_data)
# plt.show()


train_dataset = TensorDataset(x_data, y_data)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)

train_data = iter(train_loader)
train_x, train_y = next(train_data)


class Net(torch.nn.Module):
    def __init__(self, hidden_num=10):
        super(Net, self).__init__()
        self.layer = Sequential(
            Linear(1, hidden_num),
            Linear(hidden_num, 1),
        )

    def forward(self, x):
        x = self.layer(x)
        return x


net = Net()

criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)
scheduler = LambdaLR(optimizer=optimizer, lr_lambda=lambda epoch: 0.95**epoch)  # mimic ExponentialLR


lr_list = []

for i in range(50):
    lr_list.append(scheduler.get_last_lr())
    for x, y in train_loader:
        y_pred = net(x)
        loss = criterion(y_pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    scheduler.step()

# Plot the lr curve
plt.plot(lr_list)
plt.legend(labels=['LambdaLR'])
plt.show()
print(scheduler)
