1. Why adjust the learning rate?
The learning rate can start large and then shrink: following some learning-rate scheduling strategy, the lr is changed at appropriate points so that training converges faster and reaches a better result.
All learning-rate scheduling classes inherit from the base class _LRScheduler.
Main attributes:
optimizer: the associated optimizer
last_epoch: records the epoch count
base_lrs: records the initial learning rate of each parameter group
Main methods:
step(): updates the learning rate for the next epoch (call step() once per epoch; within a single epoch the learning rate should not change)
get_lr(): a virtual method (to be overridden by subclasses) that computes the learning rate for the next epoch
>>> scheduler = ...
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()
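To make the attributes and methods above concrete, here is a minimal sketch (it uses StepLR from section 2.1 purely as an example scheduler, and assumes a single parameter group with lr=0.1):

import torch
import torch.optim as optim

weights = torch.randn(1, requires_grad=True)
optimizer = optim.SGD([weights], lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)

print(scheduler.optimizer is optimizer)  # True: the associated optimizer
print(scheduler.base_lrs)                # [0.1]: initial lr of each parameter group
print(scheduler.last_epoch)              # 0 right after construction

optimizer.step()                         # a (toy) training epoch would go here
scheduler.step()                         # advance the schedule by one epoch
print(scheduler.last_epoch)              # 1
print(scheduler.get_last_lr())           # lr of each parameter group for the new epoch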
2. Six learning-rate scheduling strategies in PyTorch
2.1 StepLR
torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1, verbose=False)
Function: adjusts the learning rate at a fixed interval; every step_size epochs, the learning rate of each parameter group is decayed by gamma. Note that this decay can happen simultaneously with other changes to the learning rate coming from outside this scheduler. When last_epoch=-1, the initial lr is set to lr.
Main parameters:
step_size: the adjustment interval (number of epochs between decays)
gamma: the decay factor
last_epoch: when equal to -1, lr is set to the initial lr; otherwise lr is set to the value it had at the corresponding epoch. This parameter is useful for resuming an interrupted run from where it stopped (see the resume sketch below).
Update rule: lr = lr * gamma
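As a minimal sketch of resuming an interrupted run (the checkpoint file name and dictionary keys here are illustrative assumptions, not part of the original example), the scheduler state can be saved and restored with state_dict()/load_state_dict():

import torch
import torch.optim as optim

weights = torch.randn(1, requires_grad=True)
optimizer = optim.SGD([weights], lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)

# ... train for some epochs, then checkpoint everything needed to resume ...
torch.save({'optimizer': optimizer.state_dict(),
            'scheduler': scheduler.state_dict()}, 'checkpoint.pth')

# ... later: rebuild the objects and restore their state before continuing ...
optimizer_resumed = optim.SGD([weights], lr=0.1)
scheduler_resumed = optim.lr_scheduler.StepLR(optimizer_resumed, step_size=50, gamma=0.1)
checkpoint = torch.load('checkpoint.pth')
optimizer_resumed.load_state_dict(checkpoint['optimizer'])
scheduler_resumed.load_state_dict(checkpoint['scheduler'])
# scheduler_resumed.last_epoch now matches the position where training stopped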
Experiment:
import torch
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

torch.manual_seed(1)

LR = 0.1
iteration = 10
max_epoch = 200

weights = torch.randn((1), requires_grad=True)
target = torch.zeros((1))

optimizer = optim.SGD([weights], lr=LR, momentum=0.9)

# ------------------------------ Step LR ------------------------------
scheduler_lr = optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)

lr_list, epoch_list = list(), list()
for epoch in range(max_epoch):
    lr_list.append(scheduler_lr.get_last_lr())
    epoch_list.append(epoch)

    for i in range(iteration):
        loss = torch.pow((weights - target), 2)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    scheduler_lr.step()

plt.plot(epoch_list, lr_list, label="Step LR Scheduler")
plt.xlabel("Epoch")
plt.ylabel("Learning rate")
plt.legend()
plt.show()
Every step_size=50 epochs, the learning rate drops to gamma=0.1 times its previous value.
2.2 MultiStepLR
torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1, verbose=False)
Function: adjusts the learning rate at given milestones; when the epoch count reaches one of the milestones, the learning rate of each parameter group is decayed by gamma. Note that this decay can happen simultaneously with other changes to the learning rate coming from outside this scheduler. When last_epoch=-1, the initial lr is set to lr.
Main parameters:
milestones: the list of epoch indices at which to adjust
gamma: the decay factor
Update rule: lr = lr * gamma
Experiment:
import torch
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

torch.manual_seed(1)

LR = 0.1
iteration = 10
max_epoch = 200

# same setup as in the StepLR example above
weights = torch.randn((1), requires_grad=True)
target = torch.zeros((1))
optimizer = optim.SGD([weights], lr=LR, momentum=0.9)

milestones = [50, 125, 160]
scheduler_lr = optim.lr_scheduler.MultiStepLR(optimizer, milestones=milestones, gamma=0.1)

lr_list, epoch_list = list(), list()
for epoch in range(max_epoch):
    lr_list.append(scheduler_lr.get_last_lr())
    epoch_list.append(epoch)

    for i in range(iteration):
        loss = torch.pow((weights - target), 2)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    scheduler_lr.step()

plt.plot(epoch_list, lr_list, label="Multi Step LR Scheduler\nmilestones:{}".format(milestones))
plt.xlabel("Epoch")
plt.ylabel("Learning rate")
plt.legend()
plt.show()
Whenever the epoch count reaches one of the milestones, the learning rate drops to gamma=0.1 times its previous value.
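As a quick sanity check (a sketch assuming the same LR=0.1, gamma=0.1 and milestones as above), the schedule is piecewise constant and can be written in closed form: the lr at a given epoch is the initial lr times gamma raised to the number of milestones already passed.

from bisect import bisect_right

LR, gamma = 0.1, 0.1
milestones = [50, 125, 160]

def multistep_lr(epoch):
    # gamma is applied once for every milestone that has already been reached
    return LR * gamma ** bisect_right(milestones, epoch)

for epoch in [0, 49, 50, 124, 125, 159, 160, 199]:
    print(epoch, multistep_lr(epoch))
# the lr drops by a factor of 10 at epochs 50, 125 and 160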
2.3 ExponentialLR
torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1, verbose=False)
Function: decays the learning rate exponentially
Main parameters:
gamma: the base of the exponential decay (the per-epoch decay factor)
Update rule: lr = lr * gamma ^ epoch
Experiment:
import torch
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

torch.manual_seed(1)

LR = 0.1
iteration = 10
max_epoch = 200

# same setup as in the StepLR example above
weights = torch.randn((1), requires_grad=True)
target = torch.zeros((1))
optimizer = optim.SGD([weights], lr=LR, momentum=0.9)

gamma = 0.95
scheduler_lr = optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)

lr_list, epoch_list = list(), list()
for epoch in range(max_epoch):
    lr_list.append(scheduler_lr.get_last_lr())
    epoch_list.append(epoch)

    for i in range(iteration):
        loss = torch.pow((weights - target), 2)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    scheduler_lr.step()

plt.plot(epoch_list, lr_list, label="Exponential LR Scheduler\ngamma:{}".format(gamma))
plt.xlabel("Epoch")
plt.ylabel("Learning rate")
plt.legend()
plt.show()
The learning rate is adjusted at every epoch.
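To get a feel for the decay speed, the rule can be evaluated by hand (a sketch assuming the same LR=0.1 and gamma=0.95 as above):

# lr after a given number of epochs under lr = LR * gamma ** epoch
LR, gamma = 0.1, 0.95
for epoch in (1, 10, 50, 100, 200):
    print(epoch, LR * gamma ** epoch)
# roughly 0.095, 0.060, 0.0077, 0.00059 and 3.5e-06 respectively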
2.4 CosineAnnealingLR
torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1, verbose=False)
Function: adjusts the learning rate periodically following a cosine curve
Main parameters:
T_max: number of epochs over which the lr anneals from its initial value down to eta_min (half a cosine period)
eta_min: the minimum learning rate
Update rule: $\eta_t = \eta_{\min} + \frac{1}{2}\left(\eta_{\max} - \eta_{\min}\right)\left(1 + \cos\left(\frac{T_{cur}}{T_{\max}}\pi\right)\right)$
Experiment:
import torch
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

torch.manual_seed(1)

LR = 0.1
iteration = 10
max_epoch = 200

# same setup as in the StepLR example above
weights = torch.randn((1), requires_grad=True)
target = torch.zeros((1))
optimizer = optim.SGD([weights], lr=LR, momentum=0.9)

t_max = 50
scheduler_lr = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=t_max, eta_min=0.)

lr_list, epoch_list = list(), list()
for epoch in range(max_epoch):
    lr_list.append(scheduler_lr.get_last_lr())
    epoch_list.append(epoch)

    for i in range(iteration):
        loss = torch.pow((weights - target), 2)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    scheduler_lr.step()

plt.plot(epoch_list, lr_list, label="CosineAnnealingLR Scheduler\nT_max:{}".format(t_max))
plt.xlabel("Epoch")
plt.ylabel("Learning rate")
plt.legend()
plt.show()
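The learning rate falls from LR to eta_min over the first T_max=50 epochs and then rises back, repeating with a period of 2*T_max. The update rule can also be checked directly (a sketch assuming eta_max = LR = 0.1, eta_min = 0 and T_max = 50 as in the experiment above):

import math

eta_max, eta_min, T_max = 0.1, 0.0, 50
for T_cur in (0, 25, 50):
    eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * T_cur / T_max))
    print(T_cur, eta_t)
# 0 -> 0.1 (start of the cycle), 25 -> 0.05 (halfway), 50 -> ~0.0 (eta_min)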
2.5 ReduceLROnPlateau
torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08, verbose=False)
Function: monitors a metric and adjusts the learning rate when that metric (for example a loss or an accuracy) stops improving
Main parameters:
mode: one of min/max; in min mode, lr is reduced when the monitored quantity stops decreasing; in max mode, lr is reduced when it stops increasing. Default: "min".
factor: the decay factor
patience: how many epochs without improvement to tolerate before reducing the lr
cooldown: how many epochs to stop monitoring after an adjustment before resuming
verbose: whether to print a message on each adjustment
min_lr: lower bound on the learning rate
eps: minimum amount of decay; if the difference between the new and old lr is no larger than eps, the update is ignored.
Experiment:
import torch
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

torch.manual_seed(1)

LR = 0.1
iteration = 10
max_epoch = 200

# same setup as in the StepLR example above
weights = torch.randn((1), requires_grad=True)
optimizer = optim.SGD([weights], lr=LR, momentum=0.9)

loss_value = 0.5
accuracy = 0.9

factor = 0.1
mode = "min"
patience = 10
cooldown = 10
min_lr = 1e-4
verbose = True

scheduler_lr = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=factor, mode=mode, patience=patience,
                                                    cooldown=cooldown, min_lr=min_lr, verbose=verbose)

for epoch in range(max_epoch):
    for i in range(iteration):
        optimizer.step()
        optimizer.zero_grad()

    if epoch == 5:
        loss_value = 0.4

    scheduler_lr.step(loss_value)
Output:
Epoch 17: reducing learning rate of group 0 to 1.0000e-02.
Epoch 38: reducing learning rate of group 0 to 1.0000e-03.
Epoch 59: reducing learning rate of group 0 to 1.0000e-04.
Starting from EPOCH == 6 (the EPOCH printed in the scheduler's log is 1-based while the epoch variable in the code is 0-based, so changing the metric at epoch = 5 counts as changing it at EPOCH = 6), loss_value drops to 0.4. With patience = 10, the metric then fails to improve for EPOCHs 7 through 16, so at EPOCH 17 the lr is reduced to 0.1 * 0.1 = 0.01. From EPOCH 17 onwards the metric still does not change, but because cooldown = 10, the scheduler effectively waits cooldown + patience = 20 epochs before the next reduction: after EPOCHs 18 through 37, the lr is reduced again at EPOCH 38.
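When the monitored quantity is an accuracy rather than a loss, pass mode="max" and feed the accuracy to step(). A minimal sketch (the accuracy curve below is simulated only to drive the scheduler; it is not part of the experiment above):

import torch
import torch.optim as optim

weights = torch.randn(1, requires_grad=True)
optimizer = optim.SGD([weights], lr=0.1)
scheduler_lr = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max",
                                                    factor=0.1, patience=10)

for epoch in range(50):
    optimizer.step()                                # stand-in for a real training epoch
    val_accuracy = min(0.90, 0.5 + 0.02 * epoch)    # simulated metric that plateaus at 0.9
    scheduler_lr.step(val_accuracy)                 # lr is reduced once accuracy stops rising
    print(epoch, optimizer.param_groups[0]['lr'])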
2.6 LambdaLR
lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1)
Function: custom scheduling; sets the learning rate of each parameter group to the initial lr multiplied by a given function. When last_epoch=-1, lr is set to the initial lr.
Main parameters:
lr_lambda: a function or a list of functions; each computes a multiplicative factor for the learning rate
Experiment:
import torch
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

torch.manual_seed(1)

LR = 0.1
iteration = 10
max_epoch = 200

lr_init = 0.1

weights_1 = torch.randn((6, 3, 5, 5))
weights_2 = torch.ones((5, 5))

optimizer = optim.SGD([
    {'params': [weights_1]},
    {'params': [weights_2]}], lr=lr_init)

lambda1 = lambda epoch: 0.1 ** (epoch // 20)
lambda2 = lambda epoch: 0.95 ** epoch

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])

lr_list, epoch_list = list(), list()
for epoch in range(max_epoch):
    lr_list.append(scheduler.get_lr())
    epoch_list.append(epoch)

    for i in range(iteration):
        optimizer.step()
        optimizer.zero_grad()

    scheduler.step()

    print('epoch:{:5d}, lr:{}'.format(epoch, scheduler.get_lr()))

plt.plot(epoch_list, [i[0] for i in lr_list], label="lambda 1")
plt.plot(epoch_list, [i[1] for i in lr_list], label="lambda 2")
plt.xlabel("Epoch")
plt.ylabel("Learning Rate")
plt.title("LambdaLR")
plt.legend()
plt.show()
The plot shows that the two parameter groups follow different learning-rate update rules.
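Another common use of LambdaLR is a linear warmup (a sketch, not part of the experiment above; warmup_epochs and warmup_lambda are assumed names): the multiplier grows from 1/warmup_epochs up to 1 over the first few epochs and then stays at 1.

import torch
import torch.optim as optim

warmup_epochs = 5
weights = torch.randn(1, requires_grad=True)
optimizer = optim.SGD([weights], lr=0.1)

def warmup_lambda(epoch):
    # multiplier: 1/5, 2/5, ..., 1.0, then constant at 1.0
    return min(1.0, (epoch + 1) / warmup_epochs)

scheduler = optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_lambda)

for epoch in range(10):
    optimizer.step()
    scheduler.step()
    print(epoch, optimizer.param_groups[0]['lr'])  # ramps up to 0.1, then stays there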
3. Summary of learning-rate scheduling
Ordered schedules: Step, MultiStep, Exponential and CosineAnnealing
Adaptive schedule: ReduceLROnPlateau
Custom schedule: Lambda
Learning-rate initialization:
Use a small value: 0.01, 0.001, 0.0001
Search for the maximum learning rate: see "Cyclical Learning Rates for Training Neural Networks"
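As a rough sketch of the idea behind that search (an illustrative learning-rate range test on the toy problem used throughout this post, not the paper's exact procedure): increase the lr step by step over a short run, record the loss, and take the largest lr at which the loss is still decreasing as a candidate maximum.

import torch
import torch.optim as optim

torch.manual_seed(1)
weights = torch.randn(1, requires_grad=True)
target = torch.zeros(1)
optimizer = optim.SGD([weights], lr=1e-4)

lrs, losses = [], []
lr = 1e-4
while lr < 10.0:
    for group in optimizer.param_groups:
        group['lr'] = lr                 # raise the lr geometrically each step
    loss = torch.pow(weights - target, 2)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    lrs.append(lr)
    losses.append(loss.item())
    lr *= 1.2
# on this toy quadratic the loss starts growing once lr exceeds about 1.0;
# the largest lr before that point is the candidate maximum learning rate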