Loss Functions

1. The Concept of a Loss Function

A loss function measures the discrepancy between the model's output and the ground-truth label.


Loss Function:

$$Loss=f\left( \hat{y},y \right)$$

Cost Function:

$$Cost=\frac{1}{N}\sum_{i=1}^N{f\left( \hat{y}_i,y_i \right)}$$

Objective Function:

$$Obj = Cost + Regularization$$
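
A minimal sketch of the three quantities (the tensors and the L2 regularization weight below are illustrative, not from the original):

import torch

y_hat = torch.tensor([2.5, 0.0, 2.0])        # model outputs
y     = torch.tensor([3.0, -0.5, 2.0])       # ground-truth labels
w     = torch.tensor([0.5, -1.0])            # hypothetical model weights

loss_per_sample = (y_hat - y) ** 2           # Loss: one value per sample
cost = loss_per_sample.mean()                # Cost: average over the whole dataset
obj  = cost + 0.01 * (w ** 2).sum()          # Objective: cost plus an L2 regularization term

print(loss_per_sample, cost, obj)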

2. Loss Functions Provided by PyTorch

2.1 nn.CrossEntropyLoss

nn.CrossEntropyLoss(weight=None, 
size_average=None,
ignore_index=-100,
reduce=None,
reduction='mean')

Function: combines nn.LogSoftmax() and nn.NLLLoss() to compute the cross-entropy loss.

Main parameters:

  • weight: a per-class weight applied to each sample's loss (with reduction='mean', an all-ones weight such as weight=[1,…,1] reduces to the plain average)
  • ignore_index: a class index to ignore (it contributes nothing to the loss)
  • reduction: reduction mode, one of none/sum/mean
    none - compute the loss element by element
    sum - sum all elements and return a scalar
    mean - weighted average, returns a scalar
  • size_average and reduce should be left unset; they are deprecated and will be removed in an upcoming release

The loss is computed as follows:

$$\operatorname{loss}(x, \text{class})=-\log \left(\frac{\exp (x[\text{class}])}{\sum_{j} \exp (x[j])}\right)=-x[\text{class}]+\log \left(\sum_{j} \exp (x[j])\right)$$

(Since the p in the cross entropy is the label, i.e. 1 for the correct class and 0 for all other classes, the cross entropy reduces to the expression above. In other words, we take the softmax probability of the correct class's score and compute its information content; the closer that probability is to 1, the smaller the information content.)

If weight is specified, the loss is computed as follows:

$$\operatorname{loss}(x, \text{class})=\text{weight}[\text{class}]\left(-x[\text{class}]+\log \left(\sum_{j} \exp (x[j])\right)\right)$$

Cross entropy = entropy + relative entropy (KL divergence)

Cross entropy: $H(P, Q)=-\sum_{i=1}^{N} P\left(x_{i}\right) \log Q\left(x_{i}\right)$

Self-information: $I(x)=-\log p(x)$

Entropy: $H(P)=E_{x \sim P}[I(x)]=-\sum_{i}^{N} P\left(x_{i}\right) \log P\left(x_{i}\right)$

Relative entropy (KL divergence):

$$\begin{aligned} D_{KL}(P, Q) &=E_{x \sim P}\left[\log \frac{P(x)}{Q(x)}\right] \\ &=E_{x \sim P}[\log P(x)-\log Q(x)] \\ &=\sum_{i=1}^{N} P\left(x_{i}\right)\left[\log P\left(x_{i}\right)-\log Q\left(x_{i}\right)\right] \\ &=\sum_{i=1}^{N} P\left(x_{i}\right) \log P\left(x_{i}\right)-\sum_{i=1}^{N} P\left(x_{i}\right) \log Q\left(x_{i}\right) \\ &=H(P, Q)-H(P) \end{aligned}$$

Cross entropy: $H(P, Q)=D_{KL}(P, Q)+H(P)$. Since the label distribution $P$ is fixed, minimizing the cross entropy is equivalent to minimizing the KL divergence.
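
A quick numerical check of this identity, using two arbitrary example distributions:

import torch

P = torch.tensor([0.7, 0.2, 0.1])    # "label" distribution
Q = torch.tensor([0.5, 0.3, 0.2])    # predicted distribution

H_PQ = -(P * Q.log()).sum()          # cross entropy H(P, Q)
H_P  = -(P * P.log()).sum()          # entropy H(P)
D_KL = (P * (P / Q).log()).sum()     # KL divergence D_KL(P || Q)

print(H_PQ, D_KL + H_P)              # the two values are equal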

Experiment:

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

# fake data
inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

# ----------------------------------- CrossEntropy loss: reduction -----------------------------------
# def loss function
loss_f_none = nn.CrossEntropyLoss(weight=None, reduction='none')
loss_f_sum = nn.CrossEntropyLoss(weight=None, reduction='sum')
loss_f_mean = nn.CrossEntropyLoss(weight=None, reduction='mean')

# forward
loss_none = loss_f_none(inputs, target)
loss_sum = loss_f_sum(inputs, target)
loss_mean = loss_f_mean(inputs, target)

# view
print("Cross Entropy Loss:\n ", loss_none, loss_sum, loss_mean)
Cross Entropy Loss:
tensor([1.3133, 0.1269, 0.1269]) tensor(1.5671) tensor(0.5224)

Let us compute this by hand using the formula:

  • loss_none:

    For [1, 2]:

    1. First convert to softmax values: $\left[ \frac{e}{e+e^2}, \frac{e^2}{e+e^2} \right]$
    2. Compute the information content: $\left[ -\log _e\left( \frac{e}{e+e^2} \right) , -\log _e\left( \frac{e^2}{e+e^2} \right) \right] = \left[ 1.31326, 0.313262 \right]$
    3. Since the target class is 0, the output is 1.31326

    For [1, 3]: the output is $-\log _e\left( \frac{e^3}{e+e^3} \right) = 0.126928$

  • loss_sum: $1.31326+0.126928+0.126928=1.56712$

  • loss_mean: $\frac{1.31326+0.126928+0.126928}{3}=0.522372$

Now let us look at the weight argument:

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

# fake data
inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

# def loss function
weights = torch.tensor([1, 2], dtype=torch.float)

loss_f_none_w = nn.CrossEntropyLoss(weight=weights, reduction='none')
loss_f_sum = nn.CrossEntropyLoss(weight=weights, reduction='sum')
loss_f_mean = nn.CrossEntropyLoss(weight=weights, reduction='mean')

# forward
loss_none_w = loss_f_none_w(inputs, target)
loss_sum = loss_f_sum(inputs, target)
loss_mean = loss_f_mean(inputs, target)

# view
print("\nweights: ", weights)
print(loss_none_w, loss_sum, loss_mean)
weights:  tensor([1., 2.])
tensor([1.3133, 0.2539, 0.2539]) tensor(1.8210) tensor(0.3642)

We can see that the losses for class-1 samples are multiplied by 2. Also note that with weights, reduction='mean' divides by the total weight of the samples (here 1 + 2 + 2 = 5, so 1.8210 / 5 = 0.3642) rather than by the number of samples.

2.2 nn.NLLLoss

nn.NLLLoss(weight=None,
size_average=None,
ignore_index=-100,
reduce=None,
reduction='mean')

Function: implements only the negation step of the negative log-likelihood (despite the name NLL = negative log likelihood, it performs only the "negative" part; the log-likelihood must already be contained in the input).

Main parameters:

  • weight: per-class weights applied to the loss
  • ignore_index: a class index to ignore
  • reduction: reduction mode, one of none/sum/mean
    none - compute the loss element by element
    sum - sum all elements and return a scalar
    mean - weighted average, returns a scalar

If the network's final layer is nn.LogSoftmax (which takes the log of the probabilities produced by softmax), then combining it with nn.NLLLoss gives exactly the effect of nn.CrossEntropyLoss, as the sketch below shows. If you prefer not to end the network with a LogSoftmax layer, simply use nn.CrossEntropyLoss directly on the raw logits.
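
A minimal sketch of this equivalence (the tensors are the same toy data used in the experiments below):

import torch
import torch.nn as nn

logits = torch.tensor([[1., 2.], [1., 3.], [1., 3.]])
target = torch.tensor([0, 1, 1])

# LogSoftmax followed by NLLLoss ...
log_probs = nn.LogSoftmax(dim=1)(logits)
loss_nll = nn.NLLLoss()(log_probs, target)

# ... gives the same result as CrossEntropyLoss on the raw logits
loss_ce = nn.CrossEntropyLoss()(logits, target)

print(loss_nll, loss_ce)   # both are tensor(0.5224)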

Experiment:

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

# fake data
inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

loss_f_none = nn.NLLLoss(reduction='none')
loss_f_sum = nn.NLLLoss(reduction='sum')
loss_f_mean = nn.NLLLoss(reduction='mean')

# forward
loss_none = loss_f_none(inputs, target)
loss_sum = loss_f_sum(inputs, target)
loss_mean = loss_f_mean(inputs, target)

# view
print("NLL Loss", loss_none, loss_sum, loss_mean)
NLL Loss tensor([-1., -3., -3.]) tensor(-7.) tensor(-2.3333)

In effect, it just takes the input value at the target class and negates it.

2.3 nn.BCELoss

nn.BCELoss(weight=None, 
size_average=None,
reduce=None,
reduction='mean')

Function: binary cross entropy (BCE).

Note: the input values must lie in [0, 1] (e.g. the output of a sigmoid).

Main parameters:

  • weight: per-class weights applied to the loss
  • reduction: reduction mode, one of none/sum/mean
    none - compute the loss element by element
    sum - sum all elements and return a scalar
    mean - weighted average, returns a scalar

The formula (including weight) is:

$$l_{n}=-w_{n}\left[y_{n} \cdot \log x_{n}+\left(1-y_{n}\right) \cdot \log \left(1-x_{n}\right)\right]$$

Also, BCELoss's forward expects targets in a different format: it takes one-hot style targets, and their dtype must be torch.float.

Experiment:

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

inputs = torch.tensor([[0.1, 0.2], [0.2, 0.2], [0.3, 0.4], [0.4, 0.5]], dtype=torch.float)
target_bce = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

loss_f_none = nn.BCELoss(reduction='none')
loss_f_sum = nn.BCELoss(reduction='sum')
loss_f_mean = nn.BCELoss(reduction='mean')

# forward
loss_none = loss_f_none(inputs, target_bce)
loss_sum = loss_f_sum(inputs, target_bce)
loss_mean = loss_f_mean(inputs, target_bce)

# view
print("BCE Loss", loss_none, loss_sum, loss_mean)
BCE Loss tensor([[2.3026, 0.2231],
[1.6094, 0.2231],
[0.3567, 0.9163],
[0.5108, 0.6931]]) tensor(6.8352) tensor(0.8544)

Verification:

  1. [0.1, 0.2]: $\left[ -\log _e\left( 0.1 \right) =2.30259,\ -\log _e\left( 0.8 \right) =0.223144 \right]$
  2. [0.2, 0.2]: $\left[ -\log _e\left( 0.2 \right) =1.60944,\ -\log _e\left( 0.8 \right) =0.223144 \right]$
  3. [0.3, 0.4]: $\left[ -\log _e\left( 0.7 \right) =0.356675,\ -\log _e\left( 0.4 \right) =0.916291 \right]$
  4. [0.4, 0.5]: $\left[ -\log _e\left( 0.6 \right) =0.510826,\ -\log _e\left( 0.5 \right) =0.693147 \right]$

In binary classification only one of the two entries is correct, so the target is [0, 1] (or [1, 0]); the closer the input is to [0, 1] (or [1, 0]), the closer loss = [loss1, loss2] gets to [0, 0]. We can therefore sum the two entries to get loss_sum, or average them to get loss_mean.

nn.BCELoss tries to push the correct score toward 1 and the wrong score toward 0.

Compared with nn.CrossEntropyLoss: because of the softmax, it behaves in the same spirit. Applying CrossEntropyLoss to [0.1, 0.2] with target class 0 gives $-\log _e\left( \frac{e^{0.1}}{e^{0.1}+e^{0.2}} \right) =0.744397$; to make this output small, $\frac{e^{0.1}}{e^{0.1}+e^{0.2}}$ must approach 1, i.e. 0.1 must grow and 0.2 must shrink, which again boosts the correct class and suppresses the wrong one.

2.4 nn.BCEWithLogitsLoss

nn.BCEWithLogitsLoss(weight=None, 
size_average=None,
reduce=None,
reduction='mean',
pos_weight=None)

Function: combines a sigmoid with binary cross entropy.

Main parameters:

  • pos_weight: weight applied to the positive samples
  • weight: per-class weights applied to the loss
  • reduction: reduction mode, one of none/sum/mean
    none - compute the loss element by element
    sum - sum all elements and return a scalar
    mean - weighted average, returns a scalar

Formula:

$$l_{n}=-w_{n}\left[y_{n} \cdot \log \sigma\left(x_{n}\right)+\left(1-y_{n}\right) \cdot \log \left(1-\sigma\left(x_{n}\right)\right)\right]$$

It is simply nn.BCELoss preceded by a sigmoid activation, with one extra parameter, pos_weight, which can offset an imbalance between positive and negative samples (for example, with 100 positive samples and 300 negative samples, setting pos_weight = 3 rebalances their contributions).

Experiment:

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

target_bce = target
inputs = torch.sigmoid(inputs)

weights = torch.tensor([1, 1], dtype=torch.float)

loss_f_none_w = nn.BCELoss(weight=weights, reduction='none')
loss_f_sum = nn.BCELoss(weight=weights, reduction='sum')
loss_f_mean = nn.BCELoss(weight=weights, reduction='mean')

# forward
loss_none_w = loss_f_none_w(inputs, target_bce)
loss_sum = loss_f_sum(inputs, target_bce)
loss_mean = loss_f_mean(inputs, target_bce)

# view
print("\nweights: ", weights)
print("BCE Loss", loss_none_w, loss_sum, loss_mean)

# ======================= divider =====================

inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target_bce = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

weights = torch.tensor([1, 1], dtype=torch.float)

loss_f_none_w = nn.BCEWithLogitsLoss(weight=weights, reduction='none')
loss_f_sum = nn.BCEWithLogitsLoss(weight=weights, reduction='sum')
loss_f_mean = nn.BCEWithLogitsLoss(weight=weights, reduction='mean')

# forward
loss_none_w = loss_f_none_w(inputs, target_bce)
loss_sum = loss_f_sum(inputs, target_bce)
loss_mean = loss_f_mean(inputs, target_bce)

# view
print("\nweights: ", weights)
print(loss_none_w, loss_sum, loss_mean)

# ======================= divider =====================

inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target_bce = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

weights = torch.tensor([1], dtype=torch.float)
pos_w = torch.tensor([3], dtype=torch.float) # 3

loss_f_none_w = nn.BCEWithLogitsLoss(weight=weights, reduction='none', pos_weight=pos_w)
loss_f_sum = nn.BCEWithLogitsLoss(weight=weights, reduction='sum', pos_weight=pos_w)
loss_f_mean = nn.BCEWithLogitsLoss(weight=weights, reduction='mean', pos_weight=pos_w)

# forward
loss_none_w = loss_f_none_w(inputs, target_bce)
loss_sum = loss_f_sum(inputs, target_bce)
loss_mean = loss_f_mean(inputs, target_bce)

# view
print("\npos_weights: ", pos_w)
print(loss_none_w, loss_sum, loss_mean)
weights:  tensor([1., 1.])
BCE Loss tensor([[0.3133, 2.1269],
[0.1269, 2.1269],
[3.0486, 0.0181],
[4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)

weights: tensor([1., 1.])
tensor([[0.3133, 2.1269],
[0.1269, 2.1269],
[3.0486, 0.0181],
[4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)

pos_weights: tensor([3.])
tensor([[0.9398, 2.1269],
[0.3808, 2.1269],
[3.0486, 0.0544],
[4.0181, 0.0201]]) tensor(12.7158) tensor(1.5895)
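
Note how pos_weight = 3 scales only the entries whose target is 1: 0.3133 × 3 ≈ 0.9398, 0.1269 × 3 ≈ 0.3808, 0.0181 × 3 ≈ 0.0544 and 0.0067 × 3 ≈ 0.0201, while the entries with target 0 (the 2.1269 values) are left unchanged.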

2.5 nn.L1Loss

nn.L1Loss(size_average=None, 
reduce=None,
reduction='mean')

Function: computes the absolute difference between inputs and target.

Main parameters:

  • reduction: reduction mode, one of none/sum/mean
    none - compute the loss element by element
    sum - sum all elements and return a scalar
    mean - weighted average, returns a scalar

Formula: $l_{n}=\left|x_{n}-y_{n}\right|$

Experiment:

import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
from tools.common_tools import set_seed

set_seed(1)  # set the random seed

inputs = torch.ones((2, 2))
target = torch.ones((2, 2)) * 3

loss_f = nn.L1Loss(reduction='none')
loss = loss_f(inputs, target)

print("input:{}\ntarget:{}\nL1 loss:{}".format(inputs, target, loss))
input:tensor([[1., 1.],
[1., 1.]])
target:tensor([[3., 3.],
[3., 3.]])
L1 loss:tensor([[2., 2.],
[2., 2.]])

2.6 nn.MSELoss

nn.MSELoss(size_average=None, 
reduce=None,
reduction='mean')

Function: computes the squared difference between inputs and target.

Main parameters:

  • reduction: reduction mode, one of none/sum/mean
    none - compute the loss element by element
    sum - sum all elements and return a scalar
    mean - weighted average, returns a scalar

Formula: $l_{n}=\left(x_{n}-y_{n}\right)^{2}$

Experiment:

import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
from tools.common_tools import set_seed

set_seed(1)  # set the random seed

inputs = torch.ones((2, 2))
target = torch.ones((2, 2)) * 3

loss_f_mse = nn.MSELoss(reduction='none')
loss_mse = loss_f_mse(inputs, target)

print("input:{}\ntarget:{}\nMSE loss:{}".format(inputs, target, loss_mse))
input:tensor([[1., 1.],
[1., 1.]])
target:tensor([[3., 3.],
[3., 3.]])
MSE loss:tensor([[4., 4.],
[4., 4.]])

2.7 nn.SmoothL1Loss

nn.SmoothL1Loss(size_average=None, 
reduce=None,
reduction='mean')

Function: a smoothed L1 loss.


$$\operatorname{loss}(x, y)=\frac{1}{n} \sum_{i} z_{i}$$

$$z_{i}=\begin{cases}0.5\left(x_{i}-y_{i}\right)^{2}, & \text{if } \left|x_{i}-y_{i}\right|<1 \\ \left|x_{i}-y_{i}\right|-0.5, & \text{otherwise}\end{cases}$$

The default is reduction='mean'.
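
A minimal sketch comparing SmoothL1Loss with plain L1Loss over a few differences (the tensors are illustrative):

import torch
import torch.nn as nn

target = torch.zeros(5)
inputs = torch.tensor([0.1, 0.5, 1.0, 2.0, 5.0])

loss_smooth = nn.SmoothL1Loss(reduction='none')(inputs, target)
loss_l1 = nn.L1Loss(reduction='none')(inputs, target)

# for |x - y| < 1 the loss is quadratic (0.5 * diff^2), otherwise linear (|diff| - 0.5),
# which makes it less sensitive to outliers than a plain L1/L2 loss
print(loss_smooth)   # tensor([0.0050, 0.1250, 0.5000, 1.5000, 4.5000])
print(loss_l1)       # tensor([0.1000, 0.5000, 1.0000, 2.0000, 5.0000])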

2.8 nn.PoissonNLLLoss

nn.PoissonNLLLoss(log_input=True, 
full=False,
size_average=None,
eps=1e-08,
reduce=None,
reduction='mean')

Function: negative log-likelihood loss for a Poisson distribution.

Main parameters:

  • log_input: whether the input is already in log space; this determines which formula is used
    • when log_input = True:
      loss(input, target) = exp(input) - target * input
    • when log_input = False:
      loss(input, target) = input - target * log(input + eps)
  • full: whether to compute the full loss, including the Stirling approximation term; default False
  • eps: small correction term that prevents log(input) from evaluating to nan when input is 0
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
from tools.common_tools import set_seed

set_seed(1)  # set the random seed

inputs = torch.randn((2, 2))
target = torch.randn((2, 2))

loss_f = nn.PoissonNLLLoss(log_input=True, full=False, reduction='none')
loss = loss_f(inputs, target)
print("input:{}\ntarget:{}\nPoisson NLL loss:{}".format(inputs, target, loss))

# --------------------------------- compute by hand

loss_1 = torch.exp(inputs) - target*inputs

print("manual result:", loss_1)
input:tensor([[0.6614, 0.2669],
[0.0617, 0.6213]])
target:tensor([[-0.4519, -0.1661],
[-1.5228, 0.3817]])
Poisson NLL loss:tensor([[2.2363, 1.3503],
[1.1575, 1.6242]])
manual result: tensor([[2.2363, 1.3503],
[1.1575, 1.6242]])

As we can see, the results are identical.

2.9 nn.KLDivLoss

nn.KLDivLoss(size_average=None, 
reduce=None,
reduction='mean')

Function: computes the KL divergence (KLD, relative entropy).

Note: the input must already be given as log-probabilities, e.g. computed via nn.LogSoftmax().

Main parameters:

  • reduction: none/sum/mean/batchmean
    batchmean - average over the batch dimension (sum divided by batch size)
    none - compute the loss element by element
    sum - sum all elements and return a scalar
    mean - weighted average, returns a scalar

The KL divergence formula:

$$\begin{aligned} D_{KL}(P \| Q)=E_{x \sim P}\left[\log \frac{P(x)}{Q(x)}\right] &=E_{x \sim P}[\log P(x)-\log Q(x)] \\ &=\sum_{i=1}^{N} P\left(x_{i}\right)\left(\log P\left(x_{i}\right)-\log Q\left(x_{i}\right)\right) \end{aligned}$$

where $P(x_i)$ is the label (target) distribution.

The formula the function actually implements:

$$l_{n}=y_{n} \cdot\left(\log y_{n}-x_{n}\right)$$

yn=0y_n=0时,ln=0l_n=0;当yn=1y_n=1时,ln=xnl_n=-x_n

Also note that the input $x_n$ is not passed through a log inside the loss; this is because some layers already output log-probabilities (e.g. nn.LogSoftmax), so the log is omitted here for convenience.

With reduction='batchmean' the function implements the KL-divergence formula with an extra division by N, the batch size, i.e. it adds an averaging step.

import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
from tools.common_tools import set_seed

set_seed(1)  # set the random seed

inputs = torch.tensor([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
inputs_log = torch.log(inputs)
target = torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]], dtype=torch.float)

loss_f_none = nn.KLDivLoss(reduction='none')
loss_f_mean = nn.KLDivLoss(reduction='mean')
loss_f_bs_mean = nn.KLDivLoss(reduction='batchmean')

loss_none = loss_f_none(inputs, target)
loss_mean = loss_f_mean(inputs, target)
loss_bs_mean = loss_f_bs_mean(inputs, target)

print("loss_none:\n{}\nloss_mean:\n{}\nloss_bs_mean:\n{}".format(loss_none, loss_mean, loss_bs_mean))

loss_1 = target * (torch.log(target) - inputs)
print("manual result:", loss_1)
loss_none:
tensor([[-0.5448, -0.1648, -0.1598],
[-0.2503, -0.4597, -0.4219]])
loss_mean:
-0.3335360586643219
loss_bs_mean:
-1.000608205795288

manual result: tensor([[-0.5448, -0.1648, -0.1598],
[-0.2503, -0.4597, -0.4219]])

As we can see, reduction='mean' divides by 6, the total number of elements, while reduction='batchmean' divides by 2, the batch size.
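
Note that the experiment above feeds the raw probabilities in directly (inputs_log is computed but never used), which is why the manual formula uses inputs rather than log-probabilities. In normal use the input should be log-probabilities; a minimal sketch of the intended usage, assuming raw logits from a network:

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])    # hypothetical network outputs
target = torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]])  # target probability distributions

log_probs = F.log_softmax(logits, dim=1)   # KLDivLoss expects log-probabilities as input
loss = nn.KLDivLoss(reduction='batchmean')(log_probs, target)
print(loss)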

2.10 nn.MarginRankingLoss

nn.MarginRankingLoss(margin=0.0, 
size_average=None,
reduce=None,
reduction='mean')

Function: compares two sets of inputs (their relative ordering); used for ranking tasks.

Note: the method computes the pairwise differences between the two sets of inputs and returns an n×n loss matrix.

Main parameters:

  • margin: the margin, i.e. the required gap between x1 and x2
  • reduction: reduction mode, one of none/sum/mean

When y = 1, we want x1 to be larger than x2; when x1 > x2, no loss is produced.
When y = -1, we want x2 to be larger than x1; when x2 > x1, no loss is produced.

Formula:

$$\operatorname{loss}(x, y)=\max \left(0,-y \cdot\left(x_{1}-x_{2}\right)+\text{margin}\right)$$

Experiment:

import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
from tools.common_tools import set_seed

set_seed(1)  # set the random seed

x1 = torch.tensor([[1], [2], [3]], dtype=torch.float)
x2 = torch.tensor([[3], [2], [2]], dtype=torch.float)

target = torch.tensor([1], dtype=torch.float)

loss_f_none = nn.MarginRankingLoss(margin=0, reduction='none')

loss = loss_f_none(x1, x2, target)

print(loss)
tensor([[2.],
[0.],
[0.]])

With y = 1, the loss for each pair is computed as $\max(0, x_2 - x_1 + \text{margin})$.

x1 = torch.tensor([[1], [2], [3]], dtype=torch.float)
x2 = torch.tensor([[3], [2], [2]], dtype=torch.float)

target = torch.tensor([1, 1, -1], dtype=torch.float)

loss_f_none = nn.MarginRankingLoss(margin=0, reduction='none')

loss = loss_f_none(x1, x2, target)

print(loss)
tensor([[2., 2., 0.],
[0., 0., 0.],
[0., 0., 1.]])

The first and second columns (y = 1) are computed as $\max(0, x_2 - x_1 + \text{margin})$,

while the third column (y = -1) is computed as $\max(0, x_1 - x_2 + \text{margin})$.

2.11 nn.MultiLabelMarginLoss

nn.MultiLabelMarginLoss(size_average=None, 
reduce=None,
reduction='mean')

Function: multi-label margin loss (one input may belong to multiple classes).

Example: in a four-class task where sample x belongs to classes 0 and 3, the target is [0, 3, -1, -1] (meaning the input belongs to classes 0 and 3; the -1 entries are padding that keeps the target the same size as the input).

Main parameters:

  • reduction: reduction mode, one of none/sum/mean

Formula: $\operatorname{loss}(x, y)=\sum_{ij} \frac{\max (0,1-(x[y[j]]-x[i]))}{x.\operatorname{size}(0)}$

$x[y[j]]-x[i]$ means: the input value at a class the sample does belong to minus the input value at a class it does not belong to.

Experiment:

import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
from tools.common_tools import set_seed

set_seed(1)  # set the random seed

x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
y = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)

loss_f = nn.MultiLabelMarginLoss(reduction='none')

loss = loss_f(x, y)

print(loss)
tensor([0.8500])

In this example $x\left[ y\left[ j \right] \right] -x\left[ i \right] =\left[ 0.1-0.2,\ 0.1-0.4,\ 0.8-0.2,\ 0.8-0.4 \right] =\left[ -0.1,\ -0.3,\ 0.6,\ 0.4 \right]$; applying $\max(0, 1-\cdot)$ to each entry gives $[1.1, 1.3, 0.4, 0.6]$,

so $loss=\frac{1.1+1.3+0.4+0.6}{4}=0.85$.

2.12 nn.SoftMarginLoss

nn.SoftMarginLoss(size_average=None, 
reduce=None,
reduction='mean')

Function: computes the logistic loss for binary classification.

Main parameters:

  • reduction: reduction mode, one of none/sum/mean

Formula: $\operatorname{loss}(x, y)=\sum_{i} \frac{\log (1+\exp (-y[i] \cdot x[i]))}{x.\operatorname{nelement}()}$
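
A minimal sketch, with arbitrary inputs and ±1 targets:

import torch
import torch.nn as nn

inputs = torch.tensor([[0.3, 0.7], [0.5, 0.5]])
target = torch.tensor([[-1., 1.], [1., -1.]])   # targets are -1 or +1

loss = nn.SoftMarginLoss(reduction='none')(inputs, target)
# each element is log(1 + exp(-y * x))
print(loss)   # tensor([[0.8544, 0.4032], [0.4741, 0.9741]])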

2.13 nn.MultiLabelSoftMarginLoss

nn.MultiLabelSoftMarginLoss(weight=None, 
size_average=None,
reduce=None,
reduction='mean')

Function: the multi-label version of SoftMarginLoss.

Main parameters:

  • weight: per-class weights applied to the loss
  • reduction: reduction mode, one of none/sum/mean

Formula:

$$\operatorname{loss}(x, y)=-\frac{1}{C} \sum_{i} y[i] \cdot \log \left(\left(1+\exp (-x[i])\right)^{-1}\right)+(1-y[i]) \cdot \log \left(\frac{\exp (-x[i])}{1+\exp (-x[i])}\right)$$
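
A minimal sketch for a single sample in a three-class multi-label setting (the numbers are illustrative):

import torch
import torch.nn as nn

inputs = torch.tensor([[0.3, 0.7, 0.8]])
target = torch.tensor([[0., 1., 1.]])   # the sample belongs to classes 1 and 2

loss = nn.MultiLabelSoftMarginLoss(reduction='none')(inputs, target)
# equals the per-class sigmoid + binary cross entropy, averaged over the C classes
print(loss)   # tensor([0.5429])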

2.14 nn.MultiMarginLoss

nn.MultiMarginLoss(p=1, margin=1.0, 
weight=None,
size_average=None,
reduce=None,
reduction='mean')

Function: computes the hinge (margin) loss for multi-class classification.

Main parameters:

  • p: the exponent, either 1 or 2
  • weight: per-class weights applied to the loss
  • margin: the margin value
  • reduction: reduction mode, one of none/sum/mean

Formula:

$$\operatorname{loss}(x, y)=\frac{\sum_{i} \max (0, \text{margin}-x[y]+x[i])^{p}}{x.\operatorname{size}(0)}$$
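
A minimal sketch with a single sample over three classes (the numbers are illustrative):

import torch
import torch.nn as nn

x = torch.tensor([[0.1, 0.2, 0.7]])
y = torch.tensor([1], dtype=torch.long)   # the correct class is 1

loss = nn.MultiMarginLoss(reduction='none')(x, y)
# (max(0, 1 - 0.2 + 0.1) + max(0, 1 - 0.2 + 0.7)) / 3 = (0.9 + 1.5) / 3 = 0.8
print(loss)   # tensor([0.8000])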

2.15 nn.TripletMarginLoss

nn.TripletMarginLoss(margin=1.0, 
p=2.0,
eps=1e-06,
swap=False,
size_average=None,
reduce=None,
reduction='mean')

Function: computes the triplet loss, commonly used in face verification.

Main parameters:

  • p: the norm degree, default 2
  • margin: the margin value
  • reduction: reduction mode, one of none/sum/mean

Formula:

$$L(a, p, n)=\max \left\{d\left(a_{i}, p_{i}\right)-d\left(a_{i}, n_{i}\right)+\text{margin},\ 0\right\}$$

$$d\left(x_{i}, y_{i}\right)=\left\|\mathbf{x}_{i}-\mathbf{y}_{i}\right\|_{p}$$
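
A minimal sketch with made-up anchor / positive / negative embeddings:

import torch
import torch.nn as nn

anchor   = torch.tensor([[1.0, 1.0]])
positive = torch.tensor([[1.0, 1.5]])   # close to the anchor
negative = torch.tensor([[3.0, 3.0]])   # far from the anchor

loss = nn.TripletMarginLoss(margin=1.0, p=2)(anchor, positive, negative)
# d(a, p) = 0.5, d(a, n) ≈ 2.83, so max(0.5 - 2.83 + 1, 0) = 0
print(loss)   # tensor(0.)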

2.16 nn.HingeEmbeddingLoss

nn.HingeEmbeddingLoss(margin=1.0, 
size_average=None,
reduce=None,
reduction='mean')

Function: measures whether two inputs are similar or dissimilar; commonly used for non-linear embeddings and semi-supervised learning.

Important: the input x should be the absolute difference between the two inputs being compared (i.e. a distance).

Main parameters:

  • margin: the margin value
  • reduction: reduction mode, one of none/sum/mean

Formula:

$$l_{n}=\begin{cases}x_{n}, & \text{if } y_{n}=1 \\ \max \left\{0, \Delta-x_{n}\right\}, & \text{if } y_{n}=-1\end{cases}$$

where $\Delta$ is the margin.
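
A minimal sketch, where x is assumed to be a pre-computed distance between two samples:

import torch
import torch.nn as nn

x = torch.tensor([0.2, 0.2, 1.5])   # pre-computed distances |a - b|
y = torch.tensor([1., -1., -1.])    # 1 = similar pair, -1 = dissimilar pair

loss = nn.HingeEmbeddingLoss(margin=1.0, reduction='none')(x, y)
# y = 1: loss = x;  y = -1: loss = max(0, margin - x)
print(loss)   # tensor([0.2000, 0.8000, 0.0000])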

2.17 nn.CosineEmbeddingLoss

nn.CosineEmbeddingLoss(margin=0.0, 
size_average=None,
reduce=None,
reduction='mean')

Function: measures the similarity of two inputs using cosine similarity.

Main parameters:

  • margin: a value in [-1, 1]; values in [0, 0.5] are recommended
  • reduction: reduction mode, one of none/sum/mean

Formula:

$$\operatorname{loss}(x, y)=\begin{cases}1-\cos \left(x_{1}, x_{2}\right), & \text{if } y=1 \\ \max \left(0, \cos \left(x_{1}, x_{2}\right)-\text{margin}\right), & \text{if } y=-1\end{cases}$$

$$\cos (\theta)=\frac{A \cdot B}{\|A\|\|B\|}=\frac{\sum_{i=1}^{n} A_{i} B_{i}}{\sqrt{\sum_{i=1}^{n} A_{i}^{2}} \sqrt{\sum_{i=1}^{n} B_{i}^{2}}}$$
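
A minimal sketch with two pairs of made-up vectors:

import torch
import torch.nn as nn

x1 = torch.tensor([[1.0, 0.0], [1.0, 0.0]])
x2 = torch.tensor([[1.0, 1.0], [1.0, 1.0]])
y  = torch.tensor([1., -1.])   # label the first pair as similar, the second as dissimilar

loss = nn.CosineEmbeddingLoss(margin=0.0, reduction='none')(x1, x2, y)
# cos(x1, x2) = 1/sqrt(2) ≈ 0.7071 for both pairs
# y = 1:  1 - 0.7071 = 0.2929    y = -1: max(0, 0.7071 - margin) = 0.7071
print(loss)   # tensor([0.2929, 0.7071])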

2.18 nn.CTCLoss

torch.nn.CTCLoss(blank=0, 
reduction='mean',
zero_infinity=False)

Function: computes the CTC (Connectionist Temporal Classification) loss, used for classifying sequential data whose alignment with the target is unknown.

Main parameters:

  • blank: index of the blank label
  • zero_infinity: zero out infinite losses and their gradients
  • reduction: reduction mode, one of none/sum/mean
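
A minimal sketch of the expected shapes (all sizes below are made up): log-probabilities of shape (T, N, C), one target sequence per batch element, and the corresponding input/target lengths.

import torch
import torch.nn as nn

T, N, C = 50, 2, 20          # input sequence length, batch size, number of classes (incl. blank)
S = 10                       # target sequence length

log_probs = torch.randn(T, N, C).log_softmax(dim=2)        # (T, N, C) log-probabilities
targets = torch.randint(1, C, (N, S), dtype=torch.long)    # class 0 is reserved for the blank
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

loss = nn.CTCLoss(blank=0, reduction='mean')(log_probs, targets, input_lengths, target_lengths)
print(loss)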