1. Why adjust the learning rate?
The learning rate can start large and then shrink: following some learning-rate scheduling strategy, the lr is changed at appropriate points so that training converges faster and reaches a better result.
All learning-rate scheduling classes inherit from the base class _LRScheduler.
Main attributes:
optimizer: the associated optimizer
last_epoch: records the epoch count
base_lrs: records the initial learning rate of each parameter group
Main methods:
step(): updates the learning rate for the next epoch (call step() once per epoch; within a single epoch the learning rate should not change)
get_lr(): a virtual method (to be overridden by subclasses) that computes the learning rate for the next epoch
>>> scheduler = ...
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()
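To make the attributes and methods above concrete, here is a minimal sketch (it uses StepLR from section 2.1 purely as an example scheduler, and assumes a single parameter group with lr=0.1):

import torch
import torch.optim as optim

weights = torch.randn(1, requires_grad=True)
optimizer = optim.SGD([weights], lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)

print(scheduler.optimizer is optimizer)  # True: the associated optimizer
print(scheduler.base_lrs)                # [0.1]: initial lr of each parameter group
print(scheduler.last_epoch)              # 0 right after construction

optimizer.step()                         # a (toy) training epoch would go here
scheduler.step()                         # advance the schedule by one epoch
print(scheduler.last_epoch)              # 1
print(scheduler.get_last_lr())           # lr of each parameter group for the new epoch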
2. Six learning-rate scheduling strategies in PyTorch
2.1 StepLR
torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1, verbose=False)
Function: adjusts the learning rate at a fixed interval; every step_size epochs, the learning rate of each parameter group is decayed by gamma. Note that this decay can happen simultaneously with other changes to the learning rate coming from outside this scheduler. When last_epoch=-1, the initial lr is set to lr.
Main parameters:
step_size: the adjustment interval (number of epochs between decays)
gamma: the decay factor
last_epoch: when equal to -1, lr is set to the initial lr; otherwise lr is set to the value it had at the corresponding epoch. This parameter is useful for resuming an interrupted run from where it stopped (see the resume sketch below).
Update rule: lr = lr * gamma
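As a minimal sketch of resuming an interrupted run (the checkpoint file name and dictionary keys here are illustrative assumptions, not part of the original example), the scheduler state can be saved and restored with state_dict()/load_state_dict():

import torch
import torch.optim as optim

weights = torch.randn(1, requires_grad=True)
optimizer = optim.SGD([weights], lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)

# ... train for some epochs, then checkpoint everything needed to resume ...
torch.save({'optimizer': optimizer.state_dict(),
            'scheduler': scheduler.state_dict()}, 'checkpoint.pth')

# ... later: rebuild the objects and restore their state before continuing ...
optimizer_resumed = optim.SGD([weights], lr=0.1)
scheduler_resumed = optim.lr_scheduler.StepLR(optimizer_resumed, step_size=50, gamma=0.1)
checkpoint = torch.load('checkpoint.pth')
optimizer_resumed.load_state_dict(checkpoint['optimizer'])
scheduler_resumed.load_state_dict(checkpoint['scheduler'])
# scheduler_resumed.last_epoch now matches the position where training stopped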
Experiment:
import torch
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

torch.manual_seed(1)

LR = 0.1
iteration = 10
max_epoch = 200

weights = torch.randn((1), requires_grad=True)
target = torch.zeros((1))

optimizer = optim.SGD([weights], lr=LR, momentum=0.9)

# ------------------------------ Step LR ------------------------------
scheduler_lr = optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)

lr_list, epoch_list = list(), list()
for epoch in range(max_epoch):
    lr_list.append(scheduler_lr.get_last_lr())
    epoch_list.append(epoch)

    for i in range(iteration):
        loss = torch.pow((weights - target), 2)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    scheduler_lr.step()

plt.plot(epoch_list, lr_list, label="Step LR Scheduler")
plt.xlabel("Epoch")
plt.ylabel("Learning rate")
plt.legend()
plt.show()
Every step_size=50 epochs, the learning rate drops to gamma=0.1 times its previous value.
2.2 MultiStepLR
torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1, verbose=False)
Function: adjusts the learning rate at given milestones; when the epoch count reaches one of the milestones, the learning rate of each parameter group is decayed by gamma. Note that this decay can happen simultaneously with other changes to the learning rate coming from outside this scheduler. When last_epoch=-1, the initial lr is set to lr.
Main parameters:
milestones: the list of epoch indices at which to adjust
gamma: the decay factor
Update rule: lr = lr * gamma
Experiment:
import torch
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

torch.manual_seed(1)

LR = 0.1
iteration = 10
max_epoch = 200

# same setup as in the StepLR example above
weights = torch.randn((1), requires_grad=True)
target = torch.zeros((1))
optimizer = optim.SGD([weights], lr=LR, momentum=0.9)

milestones = [50, 125, 160]
scheduler_lr = optim.lr_scheduler.MultiStepLR(optimizer, milestones=milestones, gamma=0.1)

lr_list, epoch_list = list(), list()
for epoch in range(max_epoch):
    lr_list.append(scheduler_lr.get_last_lr())
    epoch_list.append(epoch)

    for i in range(iteration):
        loss = torch.pow((weights - target), 2)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    scheduler_lr.step()

plt.plot(epoch_list, lr_list, label="Multi Step LR Scheduler\nmilestones:{}".format(milestones))
plt.xlabel("Epoch")
plt.ylabel("Learning rate")
plt.legend()
plt.show()
Whenever the epoch count reaches one of the milestones, the learning rate drops to gamma=0.1 times its previous value.
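As a quick sanity check (a sketch assuming the same LR=0.1, gamma=0.1 and milestones as above), the schedule is piecewise constant and can be written in closed form: the lr at a given epoch is the initial lr times gamma raised to the number of milestones already passed.

from bisect import bisect_right

LR, gamma = 0.1, 0.1
milestones = [50, 125, 160]

def multistep_lr(epoch):
    # gamma is applied once for every milestone that has already been reached
    return LR * gamma ** bisect_right(milestones, epoch)

for epoch in [0, 49, 50, 124, 125, 159, 160, 199]:
    print(epoch, multistep_lr(epoch))
# the lr drops by a factor of 10 at epochs 50, 125 and 160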
2.3 ExponentialLR
torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1, verbose=False)
Function: decays the learning rate exponentially
Main parameters:
gamma: the base of the exponential decay (the per-epoch decay factor)
Update rule: lr = lr * gamma ^ epoch
Experiment:
import torch
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

torch.manual_seed(1)

LR = 0.1
iteration = 10
max_epoch = 200

# same setup as in the StepLR example above
weights = torch.randn((1), requires_grad=True)
target = torch.zeros((1))
optimizer = optim.SGD([weights], lr=LR, momentum=0.9)

gamma = 0.95
scheduler_lr = optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)

lr_list, epoch_list = list(), list()
for epoch in range(max_epoch):
    lr_list.append(scheduler_lr.get_last_lr())
    epoch_list.append(epoch)

    for i in range(iteration):
        loss = torch.pow((weights - target), 2)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    scheduler_lr.step()

plt.plot(epoch_list, lr_list, label="Exponential LR Scheduler\ngamma:{}".format(gamma))
plt.xlabel("Epoch")
plt.ylabel("Learning rate")
plt.legend()
plt.show()
The learning rate is adjusted at every epoch.
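To get a feel for the decay speed, the rule can be evaluated by hand (a sketch assuming the same LR=0.1 and gamma=0.95 as above):

# lr after a given number of epochs under lr = LR * gamma ** epoch
LR, gamma = 0.1, 0.95
for epoch in (1, 10, 50, 100, 200):
    print(epoch, LR * gamma ** epoch)
# roughly 0.095, 0.060, 0.0077, 0.00059 and 3.5e-06 respectively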
2.4 CosineAnnealingLR
torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1, verbose=False)
Function: adjusts the learning rate periodically following a cosine curve
Main parameters:
T_max: number of epochs over which the lr anneals from its initial value down to eta_min (half a cosine period)
eta_min: the minimum learning rate
Update rule: $\eta_t = \eta_{\min} + \frac{1}{2}\left(\eta_{\max} - \eta_{\min}\right)\left(1 + \cos\left(\frac{T_{cur}}{T_{\max}}\pi\right)\right)$
Experiment:
import torch
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

torch.manual_seed(1)

LR = 0.1
iteration = 10
max_epoch = 200

# same setup as in the StepLR example above
weights = torch.randn((1), requires_grad=True)
target = torch.zeros((1))
optimizer = optim.SGD([weights], lr=LR, momentum=0.9)

t_max = 50
scheduler_lr = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=t_max, eta_min=0.)

lr_list, epoch_list = list(), list()
for epoch in range(max_epoch):
    lr_list.append(scheduler_lr.get_last_lr())
    epoch_list.append(epoch)

    for i in range(iteration):
        loss = torch.pow((weights - target), 2)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    scheduler_lr.step()

plt.plot(epoch_list, lr_list, label="CosineAnnealingLR Scheduler\nT_max:{}".format(t_max))
plt.xlabel("Epoch")
plt.ylabel("Learning rate")
plt.legend()
plt.show()
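The learning rate falls from LR to eta_min over the first T_max=50 epochs and then rises back, repeating with a period of 2*T_max. The update rule can also be checked directly (a sketch assuming eta_max = LR = 0.1, eta_min = 0 and T_max = 50 as in the experiment above):

import math

eta_max, eta_min, T_max = 0.1, 0.0, 50
for T_cur in (0, 25, 50):
    eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * T_cur / T_max))
    print(T_cur, eta_t)
# 0 -> 0.1 (start of the cycle), 25 -> 0.05 (halfway), 50 -> ~0.0 (eta_min)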
2.5 ReduceLROnPlateau
torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08, verbose=False)
Function: monitors a metric and adjusts the learning rate when that metric (for example a loss or an accuracy) stops improving
Main parameters:
mode: one of min/max; in min mode, lr is reduced when the monitored quantity stops decreasing; in max mode, lr is reduced when it stops increasing. Default: "min".
factor: the decay factor
patience: how many epochs without improvement to tolerate before reducing the lr
cooldown: how many epochs to stop monitoring after an adjustment before resuming
verbose: whether to print a message on each adjustment
min_lr: lower bound on the learning rate
eps: minimum amount of decay; if the difference between the new and old lr is no larger than eps, the update is ignored.
Experiment:
import torch
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

torch.manual_seed(1)

LR = 0.1
iteration = 10
max_epoch = 200

# same setup as in the StepLR example above
weights = torch.randn((1), requires_grad=True)
optimizer = optim.SGD([weights], lr=LR, momentum=0.9)

loss_value = 0.5
accuracy = 0.9

factor = 0.1
mode = "min"
patience = 10
cooldown = 10
min_lr = 1e-4
verbose = True

scheduler_lr = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=factor, mode=mode, patience=patience,
                                                    cooldown=cooldown, min_lr=min_lr, verbose=verbose)

for epoch in range(max_epoch):
    for i in range(iteration):
        optimizer.step()
        optimizer.zero_grad()

    if epoch == 5:
        loss_value = 0.4

    scheduler_lr.step(loss_value)
Output:
Epoch 17: reducing learning rate of group 0 to 1.0000e-02.
Epoch 38: reducing learning rate of group 0 to 1.0000e-03.
Epoch 59: reducing learning rate of group 0 to 1.0000e-04.
Starting from EPOCH == 6 (the EPOCH printed in the scheduler's log is 1-based while the epoch variable in the code is 0-based, so changing the metric at epoch = 5 counts as changing it at EPOCH = 6), loss_value drops to 0.4. With patience = 10, the metric then fails to improve for EPOCHs 7 through 16, so at EPOCH 17 the lr is reduced to 0.1 * 0.1 = 0.01. From EPOCH 17 onwards the metric still does not change, but because cooldown = 10, the scheduler effectively waits cooldown + patience = 20 epochs before the next reduction: after EPOCHs 18 through 37, the lr is reduced again at EPOCH 38.
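When the monitored quantity is an accuracy rather than a loss, pass mode="max" and feed the accuracy to step(). A minimal sketch (the accuracy curve below is simulated only to drive the scheduler; it is not part of the experiment above):

import torch
import torch.optim as optim

weights = torch.randn(1, requires_grad=True)
optimizer = optim.SGD([weights], lr=0.1)
scheduler_lr = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max",
                                                    factor=0.1, patience=10)

for epoch in range(50):
    optimizer.step()                                # stand-in for a real training epoch
    val_accuracy = min(0.90, 0.5 + 0.02 * epoch)    # simulated metric that plateaus at 0.9
    scheduler_lr.step(val_accuracy)                 # lr is reduced once accuracy stops rising
    print(epoch, optimizer.param_groups[0]['lr'])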
2.6 LambdaLR
lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1)
Function: custom scheduling; sets the learning rate of each parameter group to the initial lr multiplied by a given function. When last_epoch=-1, lr is set to the initial lr.
Main parameters:
lr_lambda: a function or a list of functions; each computes a multiplicative factor for the learning rate
Experiment:
import torch
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

torch.manual_seed(1)

LR = 0.1
iteration = 10
max_epoch = 200

lr_init = 0.1

weights_1 = torch.randn((6, 3, 5, 5))
weights_2 = torch.ones((5, 5))

optimizer = optim.SGD([
    {'params': [weights_1]},
    {'params': [weights_2]}], lr=lr_init)

lambda1 = lambda epoch: 0.1 ** (epoch // 20)
lambda2 = lambda epoch: 0.95 ** epoch

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])

lr_list, epoch_list = list(), list()
for epoch in range(max_epoch):
    lr_list.append(scheduler.get_lr())
    epoch_list.append(epoch)

    for i in range(iteration):
        optimizer.step()
        optimizer.zero_grad()

    scheduler.step()

    print('epoch:{:5d}, lr:{}'.format(epoch, scheduler.get_lr()))

plt.plot(epoch_list, [i[0] for i in lr_list], label="lambda 1")
plt.plot(epoch_list, [i[1] for i in lr_list], label="lambda 2")
plt.xlabel("Epoch")
plt.ylabel("Learning Rate")
plt.title("LambdaLR")
plt.legend()
plt.show()
The plot shows that the two parameter groups follow different learning-rate update rules.
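Another common use of LambdaLR is a linear warmup (a sketch, not part of the experiment above; warmup_epochs and warmup_lambda are assumed names): the multiplier grows from 1/warmup_epochs up to 1 over the first few epochs and then stays at 1.

import torch
import torch.optim as optim

warmup_epochs = 5
weights = torch.randn(1, requires_grad=True)
optimizer = optim.SGD([weights], lr=0.1)

def warmup_lambda(epoch):
    # multiplier: 1/5, 2/5, ..., 1.0, then constant at 1.0
    return min(1.0, (epoch + 1) / warmup_epochs)

scheduler = optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_lambda)

for epoch in range(10):
    optimizer.step()
    scheduler.step()
    print(epoch, optimizer.param_groups[0]['lr'])  # ramps up to 0.1, then stays there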
3. Summary of learning-rate scheduling
Ordered schedules: Step, MultiStep, Exponential and CosineAnnealing
Adaptive schedule: ReduceLROnPlateau
Custom schedule: Lambda
Learning-rate initialization:
Use a small value: 0.01, 0.001, 0.0001
Search for the maximum learning rate: see "Cyclical Learning Rates for Training Neural Networks"
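As a rough sketch of the idea behind that search (an illustrative learning-rate range test on the toy problem used throughout this post, not the paper's exact procedure): increase the lr step by step over a short run, record the loss, and take the largest lr at which the loss is still decreasing as a candidate maximum.

import torch
import torch.optim as optim

torch.manual_seed(1)
weights = torch.randn(1, requires_grad=True)
target = torch.zeros(1)
optimizer = optim.SGD([weights], lr=1e-4)

lrs, losses = [], []
lr = 1e-4
while lr < 10.0:
    for group in optimizer.param_groups:
        group['lr'] = lr                 # raise the lr geometrically each step
    loss = torch.pow(weights - target, 2)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    lrs.append(lr)
    losses.append(loss.item())
    lr *= 1.2
# on this toy quadratic the loss starts growing once lr exceeds about 1.0;
# the largest lr before that point is the candidate maximum learning rate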