1. The Concept of Loss Functions
Loss function: measures the difference between the model output and the ground-truth label.
Loss Function:
$Loss = f\left( \hat{y}, y \right)$
Cost Function:
$Cost = \frac{1}{N}\sum_{i=1}^{N} f\left( \hat{y}_i, y_i \right)$
Objective Function:
$Obj = Cost + Regularization$
2. Loss Functions Provided by PyTorch
2.1 nn.CrossEntropyLoss
nn.CrossEntropyLoss(weight=None,
                    size_average=None,
                    ignore_index=-100,
                    reduce=None,
                    reduction='mean')
Purpose: combines nn.LogSoftmax() and nn.NLLLoss() to compute the cross-entropy.
Main parameters:
weight: per-class weights for the loss; with reduction='mean' the result is a weighted average (with equal weights, e.g. weight=[1,…,1], it is a plain average)
ignore_index: ignore a given class
reduction: reduction mode, one of none/sum/mean
none - compute the loss element-wise
sum - sum over all elements, returns a scalar
mean - weighted average, returns a scalar
size_average and reduce should be left unset; they are deprecated parameters (to be removed in a future version)
The loss is computed as follows:
$\operatorname{loss}(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_{j} \exp(x[j])}\right) = -x[class] + \log\left(\sum_{j} \exp(x[j])\right)$
(Since p in the cross-entropy is the label distribution — 1 for the correct class and 0 for the others — the cross-entropy reduces to the form above. In effect it takes the softmax probability of the correct class and measures its information content: the closer the probability is to 1, the smaller the information content.)
If weight is given, the loss becomes:
$\operatorname{loss}(x, class) = weight[class]\left(-x[class] + \log\left(\sum_{j} \exp(x[j])\right)\right)$
Cross-entropy = entropy + relative entropy (KL divergence)
Cross-entropy: $H(P, Q) = -\sum_{i=1}^{N} P(x_i) \log Q(x_i)$
Self-information: $I(x) = -\log[p(x)]$
Entropy: $H(P) = E_{x \sim p}[I(x)] = -\sum_{i}^{N} P(x_i) \log P(x_i)$
Relative entropy (KL divergence):
$$\begin{aligned} D_{KL}(P, Q) &= E_{x \sim p}\left[\log \frac{P(x)}{Q(x)}\right] \\ &= E_{x \sim p}[\log P(x) - \log Q(x)] \\ &= \sum_{i=1}^{N} P\left(x_{i}\right)\left[\log P\left(x_{i}\right) - \log Q\left(x_{i}\right)\right] \\ &= \sum_{i=1}^{N} P\left(x_{i}\right) \log P\left(x_{i}\right) - \sum_{i=1}^{N} P\left(x_{i}\right) \log Q\left(x_{i}\right) \\ &= H(P, Q) - H(P) \end{aligned}$$
Cross-entropy: $H(P, Q) = D_{KL}(P, Q) + H(P)$
Experiment:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

loss_f_none = nn.CrossEntropyLoss(weight=None, reduction='none')
loss_f_sum = nn.CrossEntropyLoss(weight=None, reduction='sum')
loss_f_mean = nn.CrossEntropyLoss(weight=None, reduction='mean')

loss_none = loss_f_none(inputs, target)
loss_sum = loss_f_sum(inputs, target)
loss_mean = loss_f_mean(inputs, target)

print("Cross Entropy Loss:\n ", loss_none, loss_sum, loss_mean)
Cross Entropy Loss:
 tensor([1.3133, 0.1269, 0.1269]) tensor(1.5671) tensor(0.5224)
Let's verify the results by hand using the formula:
loss_none:
For [1, 2]:
First convert to softmax values: $\left[ \frac{e}{e+e^2}, \frac{e^2}{e+e^2} \right]$
Then take the information content: $\left[ -\log_e\left( \frac{e}{e+e^2} \right), -\log_e\left( \frac{e^2}{e+e^2} \right) \right] = [1.31326, 0.313262]$
Since the class is 0, the output is 1.31326
For [1, 3]: the output is $-\log_e\left( \frac{e^3}{e+e^3} \right) = 0.126928$
loss_sum: $1.31326 + 0.126928 + 0.126928 = 1.56712$
loss_mean: $\frac{1.31326 + 0.126928 + 0.126928}{3} = 0.522372$
Now let's look at what the weight argument does:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

weights = torch.tensor([1, 2], dtype=torch.float)

loss_f_none_w = nn.CrossEntropyLoss(weight=weights, reduction='none')
loss_f_sum = nn.CrossEntropyLoss(weight=weights, reduction='sum')
loss_f_mean = nn.CrossEntropyLoss(weight=weights, reduction='mean')

loss_none_w = loss_f_none_w(inputs, target)
loss_sum = loss_f_sum(inputs, target)
loss_mean = loss_f_mean(inputs, target)

print("\nweights: ", weights)
print(loss_none_w, loss_sum, loss_mean)
weights:  tensor([1., 2.])
tensor([1.3133, 0.2539, 0.2539]) tensor(1.8210) tensor(0.3642)
We can see that the loss of class 1 is multiplied by 2. Note that with reduction='mean' the sum is divided by the total weight of the targets (1+2+2=5), i.e. 1.8210/5=0.3642, not by the sample count 3.
2.2 nn.NLLLoss
nn.NLLLoss(weight=None,
           size_average=None,
           ignore_index=-100,
           reduce=None,
           reduction='mean')
Purpose: implements only the negative sign of the negative log-likelihood (a bit of a bait-and-switch: NLL = negative log likelihood, but only the "negative" part is here — no log, no likelihood)
Main parameters:
weight: per-class weights for the loss
ignore_index: ignore a given class
reduction: reduction mode, one of none/sum/mean
none - compute the loss element-wise
sum - sum over all elements, returns a scalar
mean - weighted average, returns a scalar
If the last layer is nn.LogSoftmax (which takes the logarithm after converting the scores to probabilities with softmax), then appending nn.NLLLoss reproduces the effect of nn.CrossEntropyLoss. Of course, if you don't want a LogSoftmax layer at the end, you can use nn.CrossEntropyLoss directly.
Experiment:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

loss_f_none = nn.NLLLoss(reduction='none')
loss_f_sum = nn.NLLLoss(reduction='sum')
loss_f_mean = nn.NLLLoss(reduction='mean')

loss_none = loss_f_none(inputs, target)
loss_sum = loss_f_sum(inputs, target)
loss_mean = loss_f_mean(inputs, target)

print("NLL Loss", loss_none, loss_sum, loss_mean)
NLL Loss tensor([-1., -3., -3.]) tensor(-7.) tensor(-2.3333)
It simply takes the input at the target class and negates it.
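As a quick sanity check (a minimal sketch reusing the inputs from section 2.1), chaining nn.LogSoftmax and nn.NLLLoss should reproduce the nn.CrossEntropyLoss values:

import torch
import torch.nn as nn

inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

# LogSoftmax followed by NLLLoss ...
log_softmax = nn.LogSoftmax(dim=1)
loss_nll = nn.NLLLoss(reduction='none')(log_softmax(inputs), target)

# ... should equal CrossEntropyLoss applied to the raw scores
loss_ce = nn.CrossEntropyLoss(reduction='none')(inputs, target)

print(loss_nll)   # expected: tensor([1.3133, 0.1269, 0.1269])
print(loss_ce)    # expected: tensor([1.3133, 0.1269, 0.1269])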
2.3 nn.BCELoss
nn.BCELoss(weight=None,
           size_average=None,
           reduce=None,
           reduction='mean')
Purpose: binary cross-entropy (Binary Cross Entropy)
Note: the input values must lie in [0, 1]
Main parameters:
weight: per-class weights for the loss
reduction: reduction mode, one of none/sum/mean
none - compute the loss element-wise
sum - sum over all elements, returns a scalar
mean - weighted average, returns a scalar
The formula (with weight) is:
$l_n = -w_n\left[y_n \cdot \log x_n + (1 - y_n) \cdot \log(1 - x_n)\right]$
Also note that the forward function of BCELoss expects labels in a different format: here the labels are one-hot style and of dtype torch.float.
Experiment:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

inputs = torch.tensor([[0.1, 0.2], [0.2, 0.2], [0.3, 0.4], [0.4, 0.5]], dtype=torch.float)
target_bce = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

loss_f_none = nn.BCELoss(reduction='none')
loss_f_sum = nn.BCELoss(reduction='sum')
loss_f_mean = nn.BCELoss(reduction='mean')

loss_none = loss_f_none(inputs, target_bce)
loss_sum = loss_f_sum(inputs, target_bce)
loss_mean = loss_f_mean(inputs, target_bce)

print("BCE Loss", loss_none, loss_sum, loss_mean)
BCE Loss tensor([[2.3026, 0.2231],
        [1.6094, 0.2231],
        [0.3567, 0.9163],
        [0.5108, 0.6931]]) tensor(6.8352) tensor(0.8544)
Verification:
[0.1, 0.2]: $[-\log_e(0.1) = 2.30259,\ -\log_e(0.8) = 0.223144]$
[0.2, 0.2]: $[-\log_e(0.2) = 1.60944,\ -\log_e(0.8) = 0.223144]$
[0.3, 0.4]: $[-\log_e(0.7) = 0.356675,\ -\log_e(0.4) = 0.916291]$
[0.4, 0.5]: $[-\log_e(0.6) = 0.510826,\ -\log_e(0.5) = 0.693147]$
In binary classification only one class is correct, so the label is [0, 1] (or [1, 0]); the closer the input gets to [0, 1] (or [1, 0]), the closer loss=[loss1, loss2] gets to [0, 0]. We can therefore add the elements up to get loss_sum, or average them to get loss_mean.
nn.BCELoss tries to push the score of the correct class towards 1 and the score of the wrong class towards 0.
Compare with nn.CrossEntropyLoss: because of the softmax, it behaves the same way. Applying CrossEntropyLoss to [0.1, 0.2] with class 0 gives $-\log_e\left( \frac{e^{0.1}}{e^{0.1}+e^{0.2}} \right) = 0.744397$; to make this small, $\frac{e^{0.1}}{e^{0.1}+e^{0.2}}$ must approach 1, i.e. 0.1 must grow and 0.2 must shrink — again boosting the correct class and suppressing the wrong one.
2.4 nn.BCEWithLogitsLoss
nn.BCEWithLogitsLoss(weight=None,
                     size_average=None,
                     reduce=None,
                     reduction='mean',
                     pos_weight=None)
Purpose: combines a Sigmoid with the binary cross-entropy
Main parameters:
pos_weight: weight of the positive samples
weight: per-class weights for the loss
reduction: reduction mode, one of none/sum/mean
none - compute the loss element-wise
sum - sum over all elements, returns a scalar
mean - weighted average, returns a scalar
Formula:
$l_n = -w_n\left[y_n \cdot \log \sigma(x_n) + (1 - y_n) \cdot \log(1 - \sigma(x_n))\right]$
It is simply nn.BCELoss with an extra sigmoid activation; it also adds the pos_weight parameter, which can compensate for an imbalance between positive and negative samples (for example, with 100 positive and 300 negative samples you can set pos_weight=3).
Experiment:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

# 1. sigmoid followed by BCELoss
inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)
target_bce = target

inputs = torch.sigmoid(inputs)

weights = torch.tensor([1, 1], dtype=torch.float)

loss_f_none_w = nn.BCELoss(weight=weights, reduction='none')
loss_f_sum = nn.BCELoss(weight=weights, reduction='sum')
loss_f_mean = nn.BCELoss(weight=weights, reduction='mean')

loss_none_w = loss_f_none_w(inputs, target_bce)
loss_sum = loss_f_sum(inputs, target_bce)
loss_mean = loss_f_mean(inputs, target_bce)

print("\nweights: ", weights)
print("BCE Loss", loss_none_w, loss_sum, loss_mean)

# 2. BCEWithLogitsLoss applied directly to the raw scores (sigmoid is built in)
inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target_bce = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

weights = torch.tensor([1, 1], dtype=torch.float)

loss_f_none_w = nn.BCEWithLogitsLoss(weight=weights, reduction='none')
loss_f_sum = nn.BCEWithLogitsLoss(weight=weights, reduction='sum')
loss_f_mean = nn.BCEWithLogitsLoss(weight=weights, reduction='mean')

loss_none_w = loss_f_none_w(inputs, target_bce)
loss_sum = loss_f_sum(inputs, target_bce)
loss_mean = loss_f_mean(inputs, target_bce)

print("\nweights: ", weights)
print(loss_none_w, loss_sum, loss_mean)

# 3. effect of pos_weight
inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target_bce = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

weights = torch.tensor([1], dtype=torch.float)
pos_w = torch.tensor([3], dtype=torch.float)

loss_f_none_w = nn.BCEWithLogitsLoss(weight=weights, reduction='none', pos_weight=pos_w)
loss_f_sum = nn.BCEWithLogitsLoss(weight=weights, reduction='sum', pos_weight=pos_w)
loss_f_mean = nn.BCEWithLogitsLoss(weight=weights, reduction='mean', pos_weight=pos_w)

loss_none_w = loss_f_none_w(inputs, target_bce)
loss_sum = loss_f_sum(inputs, target_bce)
loss_mean = loss_f_mean(inputs, target_bce)

print("\npos_weights: ", pos_w)
print(loss_none_w, loss_sum, loss_mean)
weights:  tensor([1., 1.])
BCE Loss tensor([[0.3133, 2.1269],
        [0.1269, 2.1269],
        [3.0486, 0.0181],
        [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)

weights:  tensor([1., 1.])
tensor([[0.3133, 2.1269],
        [0.1269, 2.1269],
        [3.0486, 0.0181],
        [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)

pos_weights:  tensor([3.])
tensor([[0.9398, 2.1269],
        [0.3808, 2.1269],
        [3.0486, 0.0544],
        [4.0181, 0.0201]]) tensor(12.7158) tensor(1.5895)
2.5 nn.L1Loss
nn.L1Loss(size_average=None,
          reduce=None,
          reduction='mean')
Purpose: computes the absolute difference between inputs and target
Main parameters:
reduction:
none - compute the loss element-wise
sum - sum over all elements, returns a scalar
mean - weighted average, returns a scalar
Formula: $l_n = |x_n - y_n|$
Experiment:
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
from tools.common_tools import set_seed

set_seed(1)

inputs = torch.ones((2, 2))
target = torch.ones((2, 2)) * 3

loss_f = nn.L1Loss(reduction='none')
loss = loss_f(inputs, target)

print("input:{}\ntarget:{}\nL1 loss:{}".format(inputs, target, loss))
input:tensor([[1., 1.],
        [1., 1.]])
target:tensor([[3., 3.],
        [3., 3.]])
L1 loss:tensor([[2., 2.],
        [2., 2.]])
2.6 nn.MSELoss
nn.MSELoss(size_average=None,
           reduce=None,
           reduction='mean')
Purpose: computes the squared difference between inputs and target
Main parameters:
reduction:
none - compute the loss element-wise
sum - sum over all elements, returns a scalar
mean - weighted average, returns a scalar
Formula: $l_n = (x_n - y_n)^2$
Experiment:
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
from tools.common_tools import set_seed

set_seed(1)

inputs = torch.ones((2, 2))
target = torch.ones((2, 2)) * 3

loss_f_mse = nn.MSELoss(reduction='none')
loss_mse = loss_f_mse(inputs, target)

print("input:{}\ntarget:{}\nMSE loss:{}".format(inputs, target, loss_mse))
input:tensor([[1., 1.],
        [1., 1.]])
target:tensor([[3., 3.],
        [3., 3.]])
MSE loss:tensor([[4., 4.],
        [4., 4.]])
2.7 nn.SmoothL1Loss
nn.SmoothL1Loss(size_average=None,
                reduce=None,
                reduction='mean')
Purpose: smoothed L1 loss
$\operatorname{loss}(x, y) = \frac{1}{n} \sum_{i} z_{i}$
$z_{i} = \begin{cases} 0.5\left(x_{i}-y_{i}\right)^{2}, & \text{if } \left|x_{i}-y_{i}\right| < 1 \\ \left|x_{i}-y_{i}\right| - 0.5, & \text{otherwise} \end{cases}$
The default is reduction='mean'.
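A minimal sketch (inputs chosen arbitrarily here) comparing SmoothL1Loss with L1Loss; the element-wise values should follow the piecewise formula above:

import torch
import torch.nn as nn

inputs = torch.linspace(-3, 3, steps=7)   # [-3, -2, -1, 0, 1, 2, 3]
target = torch.zeros_like(inputs)

loss_smooth = nn.SmoothL1Loss(reduction='none')(inputs, target)
loss_l1 = nn.L1Loss(reduction='none')(inputs, target)

print(loss_smooth)   # 0.5*(x-y)^2 where |x-y| < 1, otherwise |x-y| - 0.5
print(loss_l1)       # always |x-y|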
2.8 nn.PoissonNLLLoss
nn.PoissonNLLLoss(log_input=True,
                  full=False,
                  size_average=None,
                  eps=1e-08,
                  reduce=None,
                  reduction='mean')
Purpose: negative log-likelihood loss for a Poisson distribution
Main parameters:
log_input: whether the input is already in log form; this determines which formula is used
when log_input = True:
loss(input, target) = exp(input) - target * input
when log_input = False:
loss(input, target) = input - target * log(input + eps)
full: whether to compute the full loss (including the Stirling approximation term), default False
eps: small correction term to avoid log(input) being nan
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
from tools.common_tools import set_seed

set_seed(1)

inputs = torch.randn((2, 2))
target = torch.randn((2, 2))

loss_f = nn.PoissonNLLLoss(log_input=True, full=False, reduction='none')
loss = loss_f(inputs, target)
print("input:{}\ntarget:{}\nPoisson NLL loss:{}".format(inputs, target, loss))

loss_1 = torch.exp(inputs) - target * inputs
print("Formula result:", loss_1)
input:tensor([[0.6614, 0.2669],
        [0.0617, 0.6213]])
target:tensor([[-0.4519, -0.1661],
        [-1.5228,  0.3817]])
Poisson NLL loss:tensor([[2.2363, 1.3503],
        [1.1575, 1.6242]])
Formula result: tensor([[2.2363, 1.3503],
        [1.1575, 1.6242]])
As we can see, the results match.
2.9 nn.KLDivLoss
nn.KLDivLoss(size_average=None,
             reduce=None,
             reduction='mean')
Purpose: computes the KLD (Kullback-Leibler divergence), i.e. the KL divergence or relative entropy
Note: the input must be given as log-probabilities beforehand, e.g. via nn.LogSoftmax()
Main parameters:
reduction: none/sum/mean/batchmean
batchmean - average over the batch dimension
none - compute the loss element-wise
sum - sum over all elements, returns a scalar
mean - weighted average, returns a scalar
The KL divergence formula:
$$\begin{aligned} D_{KL}(P \| Q) = E_{x \sim p}\left[\log \frac{P(x)}{Q(x)}\right] &= E_{x \sim p}[\log P(x) - \log Q(x)] \\ &= \sum_{i=1}^{N} P\left(x_{i}\right)\left(\log P\left(x_{i}\right) - \log Q\left(x_{i}\right)\right) \end{aligned}$$
where $P(x_i)$ is the label (the ground-truth distribution).
The formula the function actually implements is:
$l_{n} = y_{n} \cdot \left(\log y_{n} - x_{n}\right)$
When $y_n = 0$, $l_n = 0$; when $y_n = 1$, $l_n = -x_n$.
Note that the input $x_n$ is not passed through a log here: the function assumes the input is already log-probabilities (for example the output of nn.LogSoftmax), so the log is omitted from the formula.
With reduction='batchmean' the KL divergence formula is realised (additionally divided by the batch size N), i.e. with an averaging over samples built in.
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
from tools.common_tools import set_seed

set_seed(1)

inputs = torch.tensor([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
inputs_log = torch.log(inputs)
target = torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]], dtype=torch.float)

loss_f_none = nn.KLDivLoss(reduction='none')
loss_f_mean = nn.KLDivLoss(reduction='mean')
loss_f_bs_mean = nn.KLDivLoss(reduction='batchmean')

loss_none = loss_f_none(inputs, target)
loss_mean = loss_f_mean(inputs, target)
loss_bs_mean = loss_f_bs_mean(inputs, target)

print("loss_none:\n{}\nloss_mean:\n{}\nloss_bs_mean:\n{}".format(loss_none, loss_mean, loss_bs_mean))

loss_1 = target * (torch.log(target) - inputs)
print("Formula result:", loss_1)
loss_none:
tensor([[-0.5448, -0.1648, -0.1598],
        [-0.2503, -0.4597, -0.4219]])
loss_mean:
-0.3335360586643219
loss_bs_mean:
-1.000608205795288
Formula result: tensor([[-0.5448, -0.1648, -0.1598],
        [-0.2503, -0.4597, -0.4219]])
As we can see, reduction='mean' divides by 6 (the number of all elements), whereas reduction='batchmean' divides by 2 (the batch size).
2.10 nn.MarginRankingLoss
nn.MarginRankingLoss(margin=0.0,
                     size_average=None,
                     reduce=None,
                     reduction='mean')
Purpose: computes the similarity between two groups of inputs; used for ranking tasks
Note: this method compares two groups of data and returns an n*n loss matrix
Main parameters:
margin: boundary value, the required gap between x1 and x2
reduction: reduction mode, one of none/sum/mean
y = 1: we want x1 to be larger than x2; no loss is produced when x1 > x2
y = -1: we want x2 to be larger than x1; no loss is produced when x2 > x1
Formula:
$\operatorname{loss}(x, y) = \max(0, -y \cdot (x_1 - x_2) + \text{margin})$
Experiment:
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
from tools.common_tools import set_seed

set_seed(1)

x1 = torch.tensor([[1], [2], [3]], dtype=torch.float)
x2 = torch.tensor([[3], [2], [2]], dtype=torch.float)

target = torch.tensor([1], dtype=torch.float)

loss_f_none = nn.MarginRankingLoss(margin=0, reduction='none')
loss = loss_f_none(x1, x2, target)
print(loss)
tensor([[2.],
        [0.],
        [0.]])
With y=1, the loss is computed as $\max(0, x_2 - x_1 + margin)$.
x1 = torch.tensor([[1], [2], [3]], dtype=torch.float)
x2 = torch.tensor([[3], [2], [2]], dtype=torch.float)

target = torch.tensor([1, 1, -1], dtype=torch.float)

loss_f_none = nn.MarginRankingLoss(margin=0, reduction='none')
loss = loss_f_none(x1, x2, target)
print(loss)
tensor([[2., 2., 0.],
        [0., 0., 0.],
        [0., 0., 1.]])
The first and second columns are computed as $\max(0, x_2 - x_1 + margin)$.
The third column is computed as $\max(0, x_1 - x_2 + margin)$.
2.11 nn.MultiLabelMarginLoss
nn.MultiLabelMarginLoss(size_average=None,
                        reduce=None,
                        reduction='mean')
Purpose: multi-label margin loss (one input can belong to several classes)
Example: in a 4-class task, sample x belongs to class 0 and class 3,
label: [0, 3, -1, -1] (meaning the input belongs to classes 0 and 3; the -1 entries are padding
so that the label has the same size as the input)
Main parameters:
reduction: reduction mode, one of none/sum/mean
Formula: $\operatorname{loss}(x, y) = \sum_{ij} \frac{\max(0, 1 - (x[y[j]] - x[i]))}{x.\operatorname{size}(0)}$
$x[y[j]] - x[i]$ means: the input value of a class the sample belongs to minus the input value of a class it does not belong to
Experiment:
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
from tools.common_tools import set_seed

set_seed(1)

x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
y = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)

loss_f = nn.MultiLabelMarginLoss(reduction='none')
loss = loss_f(x, y)
print(loss)
In the example above, $x[y[j]] - x[i] = [0.1-0.2,\ 0.1-0.4,\ 0.8-0.2,\ 0.8-0.4] = [-0.1, -0.3, 0.6, 0.4]$
so $loss = \frac{1.1 + 1.3 + 0.4 + 0.6}{4} = 0.85$
2.12 nn.SoftMarginLoss
nn.SoftMarginLoss(size_average=None,
                  reduce=None,
                  reduction='mean')
Purpose: computes the logistic loss for binary classification
Main parameters:
reduction: reduction mode, one of none/sum/mean
Formula: $\operatorname{loss}(x, y) = \sum_{i} \frac{\log(1 + \exp(-y[i] \cdot x[i]))}{\text{x.nelement}()}$
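A minimal sketch (inputs chosen arbitrarily, labels must be -1 or 1); the manual element-wise computation should match the function output:

import torch
import torch.nn as nn

x = torch.tensor([[0.3, 0.7], [0.5, 0.5]])
y = torch.tensor([[-1, 1], [1, -1]], dtype=torch.float)   # labels are -1 or 1

loss_f = nn.SoftMarginLoss(reduction='none')
print(loss_f(x, y))

# manual element-wise check: log(1 + exp(-y*x))
print(torch.log(1 + torch.exp(-y * x)))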
2.13 nn.MultiLabelSoftMarginLoss
nn.MultiLabelSoftMarginLoss(weight=None,
                            size_average=None,
                            reduce=None,
                            reduction='mean')
Purpose: the multi-label version of SoftMarginLoss
Main parameters:
weight: per-class weights for the loss
reduction: reduction mode, one of none/sum/mean
Formula:
$\operatorname{loss}(x, y) = -\frac{1}{C} \sum_{i} y[i] \cdot \log\left((1 + \exp(-x[i]))^{-1}\right) + (1 - y[i]) \cdot \log\left(\frac{\exp(-x[i])}{1 + \exp(-x[i])}\right)$
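A minimal sketch (inputs chosen arbitrarily): for each sample the C per-class binary cross-entropies are averaged, so the manual computation should match the function output:

import torch
import torch.nn as nn

x = torch.tensor([[0.3, 0.7, 0.8]])
y = torch.tensor([[0, 1, 1]], dtype=torch.float)   # the sample belongs to classes 1 and 2

loss_f = nn.MultiLabelSoftMarginLoss(reduction='none')
print(loss_f(x, y))

# manual check: per-class binary cross-entropy on sigmoid(x), averaged over the C classes
sigmoid_x = torch.sigmoid(x)
print(-(y * torch.log(sigmoid_x) + (1 - y) * torch.log(1 - sigmoid_x)).mean(dim=1))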
2.14 nn.MultiMarginLoss
nn.MultiMarginLoss(p=1,
                   margin=1.0,
                   weight=None,
                   size_average=None,
                   reduce=None,
                   reduction='mean')
Purpose: computes the hinge loss for multi-class classification
Main parameters:
p: may be 1 or 2
weight: per-class weights for the loss
margin: boundary value
reduction: reduction mode, one of none/sum/mean
Formula:
$\operatorname{loss}(x, y) = \frac{\sum_{i} \max(0, \text{margin} - x[y] + x[i])^{p}}{x.\operatorname{size}(0)}$
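A minimal sketch (inputs chosen arbitrarily, p and margin left at their defaults of 1 and 1.0):

import torch
import torch.nn as nn

x = torch.tensor([[0.1, 0.2, 0.7], [0.2, 0.5, 0.3]])
y = torch.tensor([1, 2], dtype=torch.long)   # each sample belongs to exactly one class

loss_f = nn.MultiMarginLoss(reduction='none')
print(loss_f(x, y))
# manual check for the first sample: x[y] = 0.2,
# loss = [max(0, 1-0.2+0.1) + max(0, 1-0.2+0.7)] / 3 = (0.9 + 1.5) / 3 = 0.8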
2.15 nn.TripletMarginLoss
nn.TripletMarginLoss(margin=1.0,
                     p=2.0,
                     eps=1e-06,
                     swap=False,
                     size_average=None,
                     reduce=None,
                     reduction='mean')
Purpose: computes the triplet loss, commonly used in face verification
Main parameters:
p: order of the norm, default 2
margin: boundary value
reduction: reduction mode, one of none/sum/mean
Formula:
$L(a, p, n) = \max\left\{d(a_i, p_i) - d(a_i, n_i) + \text{margin},\ 0\right\}$
$d(x_i, y_i) = \left\| \mathbf{x}_i - \mathbf{y}_i \right\|_p$
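A minimal sketch (anchor/positive/negative chosen as simple 2-d points so the distances are easy to check by hand):

import torch
import torch.nn as nn

anchor = torch.tensor([[1., 0.]])
positive = torch.tensor([[1.2, 0.]])   # distance to anchor: 0.2
negative = torch.tensor([[2., 0.]])    # distance to anchor: 1.0

loss_f = nn.TripletMarginLoss(margin=1.0, p=2)
print(loss_f(anchor, positive, negative))   # max(0.2 - 1.0 + 1.0, 0) = 0.2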
2.16 nn.HingeEmbeddingLoss
nn.HingeEmbeddingLoss(margin=1.0,
                      size_average=None,
                      reduce=None,
                      reduction='mean')
Purpose: computes the similarity between two inputs, commonly used for non-linear embedding and semi-supervised learning
Important: the input x should be the absolute difference of the two inputs
Main parameters:
margin: boundary value
reduction: reduction mode, one of none/sum/mean
Formula:
$l_{n} = \begin{cases} x_{n}, & \text{if } y_{n} = 1 \\ \max\{0, \Delta - x_{n}\}, & \text{if } y_{n} = -1 \end{cases}$
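A minimal sketch (inputs chosen arbitrarily; as noted above, x is assumed to already be the absolute difference of the two inputs):

import torch
import torch.nn as nn

x = torch.tensor([[1., 0.8, 0.5]])
y = torch.tensor([[1, 1, -1]], dtype=torch.float)

loss_f = nn.HingeEmbeddingLoss(margin=1.0, reduction='none')
print(loss_f(x, y))
# for y=1 the loss is x itself, i.e. [1.0, 0.8];
# for y=-1 it is max(0, margin - x) = max(0, 1 - 0.5) = 0.5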
2.17 nn.CosineEmbeddingLoss
nn.CosineEmbeddingLoss(margin=0.0,
                       size_average=None,
                       reduce=None,
                       reduction='mean')
Purpose: measures the similarity of two inputs via cosine similarity
Main parameters:
margin: may take values in [-1, 1]; [0, 0.5] is recommended
reduction: reduction mode, one of none/sum/mean
Formula:
$\operatorname{loss}(x, y) = \begin{cases} 1 - \cos(x_1, x_2), & \text{if } y = 1 \\ \max(0, \cos(x_1, x_2) - \text{margin}), & \text{if } y = -1 \end{cases}$
$\cos(\theta) = \frac{A \cdot B}{\|A\|\|B\|} = \frac{\sum_{i=1}^{n} A_i \times B_i}{\sqrt{\sum_{i=1}^{n} (A_i)^2} \times \sqrt{\sum_{i=1}^{n} (B_i)^2}}$
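A minimal sketch (the same pair of vectors is used twice, once with y=1 and once with y=-1):

import torch
import torch.nn as nn

x1 = torch.tensor([[0.3, 0.5, 0.7], [0.3, 0.5, 0.7]])
x2 = torch.tensor([[0.1, 0.3, 0.5], [0.1, 0.3, 0.5]])
target = torch.tensor([1, -1], dtype=torch.float)

loss_f = nn.CosineEmbeddingLoss(margin=0., reduction='none')
print(loss_f(x1, x2, target))
# y=1:  loss = 1 - cos(x1, x2)
# y=-1: loss = max(0, cos(x1, x2) - margin)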
2.18 nn.CTCLoss
torch.nn.CTCLoss(blank=0,
                 reduction='mean',
                 zero_infinity=False)
Purpose: computes the CTC (Connectionist Temporal Classification) loss, used for classifying sequential (time-series) data
Main parameters:
blank: the blank label (its class index)
zero_infinity: whether to zero out infinite losses and the associated gradients
reduction: reduction mode, one of none/sum/mean
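A minimal usage sketch (shapes and sizes chosen arbitrarily: T time steps, C classes including blank, batch size N, maximum target length S; the input must be log-probabilities of shape (T, N, C)):

import torch
import torch.nn as nn

T, C, N, S = 50, 20, 4, 10

# input must be log-probabilities of shape (T, N, C)
log_probs = torch.randn(T, N, C).log_softmax(dim=2).detach().requires_grad_()
# target labels are drawn from 1..C-1 (index 0 is reserved for blank)
targets = torch.randint(1, C, (N, S), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(5, S + 1, (N,), dtype=torch.long)

ctc_loss = nn.CTCLoss(blank=0, reduction='mean')
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
print(loss)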