1. The Concept of Loss Functions
Loss function: measures the difference between the model output and the ground-truth label.
Loss Function:
$Loss = f\left( \hat{y}, y \right)$
Cost Function:
$Cost = \frac{1}{N}\sum_{i=1}^{N} f\left( \hat{y}_i, y_i \right)$
Objective Function:
$Obj = Cost + Regularization$
2. Loss Functions Provided by PyTorch
2.1 nn.CrossEntropyLoss
nn.CrossEntropyLoss(weight=None,
                    size_average=None,
                    ignore_index=-100,
                    reduce=None,
                    reduction='mean')
Purpose: combines nn.LogSoftmax() and nn.NLLLoss() to compute the cross-entropy.
Main parameters:
weight: per-class weights for the loss; with reduction='mean' the result is a weighted average (with equal weights, e.g. weight=[1,…,1], it is a plain average)
ignore_index: ignore a given class
reduction: reduction mode, one of none/sum/mean
none - compute the loss element-wise
sum - sum over all elements, returns a scalar
mean - weighted average, returns a scalar
size_average and reduce should be left unset; they are deprecated parameters (to be removed in a future version)
The loss is computed as follows:
$\operatorname{loss}(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_{j} \exp(x[j])}\right) = -x[class] + \log\left(\sum_{j} \exp(x[j])\right)$
(Since p in the cross-entropy is the label distribution — 1 for the correct class and 0 for the others — the cross-entropy reduces to the form above. In effect it takes the softmax probability of the correct class and measures its information content: the closer the probability is to 1, the smaller the information content.)
If weight is given, the loss becomes:
$\operatorname{loss}(x, class) = weight[class]\left(-x[class] + \log\left(\sum_{j} \exp(x[j])\right)\right)$
Cross-entropy = entropy + relative entropy (KL divergence)
Cross-entropy: $H(P, Q) = -\sum_{i=1}^{N} P(x_i) \log Q(x_i)$
Self-information: $I(x) = -\log[p(x)]$
Entropy: $H(P) = E_{x \sim p}[I(x)] = -\sum_{i}^{N} P(x_i) \log P(x_i)$
Relative entropy (KL divergence):
$$\begin{aligned} D_{KL}(P, Q) &= E_{x \sim p}\left[\log \frac{P(x)}{Q(x)}\right] \\ &= E_{x \sim p}[\log P(x) - \log Q(x)] \\ &= \sum_{i=1}^{N} P\left(x_{i}\right)\left[\log P\left(x_{i}\right) - \log Q\left(x_{i}\right)\right] \\ &= \sum_{i=1}^{N} P\left(x_{i}\right) \log P\left(x_{i}\right) - \sum_{i=1}^{N} P\left(x_{i}\right) \log Q\left(x_{i}\right) \\ &= H(P, Q) - H(P) \end{aligned}$$
Cross-entropy: $H(P, Q) = D_{KL}(P, Q) + H(P)$
Experiment:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

loss_f_none = nn.CrossEntropyLoss(weight=None, reduction='none')
loss_f_sum = nn.CrossEntropyLoss(weight=None, reduction='sum')
loss_f_mean = nn.CrossEntropyLoss(weight=None, reduction='mean')

loss_none = loss_f_none(inputs, target)
loss_sum = loss_f_sum(inputs, target)
loss_mean = loss_f_mean(inputs, target)

print("Cross Entropy Loss:\n ", loss_none, loss_sum, loss_mean)
Cross Entropy Loss:
 tensor([1.3133, 0.1269, 0.1269]) tensor(1.5671) tensor(0.5224)
Let's verify the results by hand using the formula:
loss_none:
For [1, 2]:
First convert to softmax values: $\left[ \frac{e}{e+e^2}, \frac{e^2}{e+e^2} \right]$
Then take the information content: $\left[ -\log_e\left( \frac{e}{e+e^2} \right), -\log_e\left( \frac{e^2}{e+e^2} \right) \right] = [1.31326, 0.313262]$
Since the class is 0, the output is 1.31326
For [1, 3]: the output is $-\log_e\left( \frac{e^3}{e+e^3} \right) = 0.126928$
loss_sum: $1.31326 + 0.126928 + 0.126928 = 1.56712$
loss_mean: $\frac{1.31326 + 0.126928 + 0.126928}{3} = 0.522372$
Now let's look at what the weight argument does:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

weights = torch.tensor([1, 2], dtype=torch.float)

loss_f_none_w = nn.CrossEntropyLoss(weight=weights, reduction='none')
loss_f_sum = nn.CrossEntropyLoss(weight=weights, reduction='sum')
loss_f_mean = nn.CrossEntropyLoss(weight=weights, reduction='mean')

loss_none_w = loss_f_none_w(inputs, target)
loss_sum = loss_f_sum(inputs, target)
loss_mean = loss_f_mean(inputs, target)

print("\nweights: ", weights)
print(loss_none_w, loss_sum, loss_mean)
weights:  tensor([1., 2.])
tensor([1.3133, 0.2539, 0.2539]) tensor(1.8210) tensor(0.3642)
We can see that the loss of class 1 is multiplied by 2. Note that with reduction='mean' the sum is divided by the total weight of the targets (1+2+2=5), i.e. 1.8210/5=0.3642, not by the sample count 3.
2.2 nn.NLLLoss
nn.NLLLoss(weight=None,
           size_average=None,
           ignore_index=-100,
           reduce=None,
           reduction='mean')
Purpose: implements only the negative sign of the negative log-likelihood (a bit of a bait-and-switch: NLL = negative log likelihood, but only the "negative" part is here — no log, no likelihood)
Main parameters:
weight: per-class weights for the loss
ignore_index: ignore a given class
reduction: reduction mode, one of none/sum/mean
none - compute the loss element-wise
sum - sum over all elements, returns a scalar
mean - weighted average, returns a scalar
If the last layer is nn.LogSoftmax (which takes the logarithm after converting the scores to probabilities with softmax), then appending nn.NLLLoss reproduces the effect of nn.CrossEntropyLoss. Of course, if you don't want a LogSoftmax layer at the end, you can use nn.CrossEntropyLoss directly.
Experiment:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

loss_f_none = nn.NLLLoss(reduction='none')
loss_f_sum = nn.NLLLoss(reduction='sum')
loss_f_mean = nn.NLLLoss(reduction='mean')

loss_none = loss_f_none(inputs, target)
loss_sum = loss_f_sum(inputs, target)
loss_mean = loss_f_mean(inputs, target)

print("NLL Loss", loss_none, loss_sum, loss_mean)
NLL Loss tensor([-1., -3., -3.]) tensor(-7.) tensor(-2.3333)
It simply takes the input at the target class and negates it.
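As a quick sanity check (a minimal sketch reusing the inputs from section 2.1), chaining nn.LogSoftmax and nn.NLLLoss should reproduce the nn.CrossEntropyLoss values:

import torch
import torch.nn as nn

inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

# LogSoftmax followed by NLLLoss ...
log_softmax = nn.LogSoftmax(dim=1)
loss_nll = nn.NLLLoss(reduction='none')(log_softmax(inputs), target)

# ... should equal CrossEntropyLoss applied to the raw scores
loss_ce = nn.CrossEntropyLoss(reduction='none')(inputs, target)

print(loss_nll)   # expected: tensor([1.3133, 0.1269, 0.1269])
print(loss_ce)    # expected: tensor([1.3133, 0.1269, 0.1269])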
2.3 nn.BCELoss
nn.BCELoss(weight=None,
           size_average=None,
           reduce=None,
           reduction='mean')
Purpose: binary cross-entropy (Binary Cross Entropy)
Note: the input values must lie in [0, 1]
Main parameters:
weight: per-class weights for the loss
reduction: reduction mode, one of none/sum/mean
none - compute the loss element-wise
sum - sum over all elements, returns a scalar
mean - weighted average, returns a scalar
The formula (with weight) is:
$l_n = -w_n\left[y_n \cdot \log x_n + (1 - y_n) \cdot \log(1 - x_n)\right]$
Also note that the forward function of BCELoss expects labels in a different format: here the labels are one-hot style and of dtype torch.float.
Experiment:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

inputs = torch.tensor([[0.1, 0.2], [0.2, 0.2], [0.3, 0.4], [0.4, 0.5]], dtype=torch.float)
target_bce = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

loss_f_none = nn.BCELoss(reduction='none')
loss_f_sum = nn.BCELoss(reduction='sum')
loss_f_mean = nn.BCELoss(reduction='mean')

loss_none = loss_f_none(inputs, target_bce)
loss_sum = loss_f_sum(inputs, target_bce)
loss_mean = loss_f_mean(inputs, target_bce)

print("BCE Loss", loss_none, loss_sum, loss_mean)
BCE Loss tensor([[2.3026, 0.2231],
        [1.6094, 0.2231],
        [0.3567, 0.9163],
        [0.5108, 0.6931]]) tensor(6.8352) tensor(0.8544)
Verification:
[0.1, 0.2]: $[-\log_e(0.1) = 2.30259,\ -\log_e(0.8) = 0.223144]$
[0.2, 0.2]: $[-\log_e(0.2) = 1.60944,\ -\log_e(0.8) = 0.223144]$
[0.3, 0.4]: $[-\log_e(0.7) = 0.356675,\ -\log_e(0.4) = 0.916291]$
[0.4, 0.5]: $[-\log_e(0.6) = 0.510826,\ -\log_e(0.5) = 0.693147]$
In binary classification only one class is correct, so the label is [0, 1] (or [1, 0]); the closer the input gets to [0, 1] (or [1, 0]), the closer loss=[loss1, loss2] gets to [0, 0]. We can therefore add the elements up to get loss_sum, or average them to get loss_mean.
nn.BCELoss tries to push the score of the correct class towards 1 and the score of the wrong class towards 0.
Compare with nn.CrossEntropyLoss: because of the softmax, it behaves the same way. Applying CrossEntropyLoss to [0.1, 0.2] with class 0 gives $-\log_e\left( \frac{e^{0.1}}{e^{0.1}+e^{0.2}} \right) = 0.744397$; to make this small, $\frac{e^{0.1}}{e^{0.1}+e^{0.2}}$ must approach 1, i.e. 0.1 must grow and 0.2 must shrink — again boosting the correct class and suppressing the wrong one.
2.4 nn.BCEWithLogitsLoss
nn.BCEWithLogitsLoss(weight=None,
                     size_average=None,
                     reduce=None,
                     reduction='mean',
                     pos_weight=None)
Purpose: combines a Sigmoid with the binary cross-entropy
Main parameters:
pos_weight: weight of the positive samples
weight: per-class weights for the loss
reduction: reduction mode, one of none/sum/mean
none - compute the loss element-wise
sum - sum over all elements, returns a scalar
mean - weighted average, returns a scalar
Formula:
$l_n = -w_n\left[y_n \cdot \log \sigma(x_n) + (1 - y_n) \cdot \log(1 - \sigma(x_n))\right]$
It is simply nn.BCELoss with an extra sigmoid activation; it also adds the pos_weight parameter, which can compensate for an imbalance between positive and negative samples (for example, with 100 positive and 300 negative samples you can set pos_weight=3).
Experiment:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

# 1. sigmoid followed by BCELoss
inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)
target_bce = target

inputs = torch.sigmoid(inputs)

weights = torch.tensor([1, 1], dtype=torch.float)

loss_f_none_w = nn.BCELoss(weight=weights, reduction='none')
loss_f_sum = nn.BCELoss(weight=weights, reduction='sum')
loss_f_mean = nn.BCELoss(weight=weights, reduction='mean')

loss_none_w = loss_f_none_w(inputs, target_bce)
loss_sum = loss_f_sum(inputs, target_bce)
loss_mean = loss_f_mean(inputs, target_bce)

print("\nweights: ", weights)
print("BCE Loss", loss_none_w, loss_sum, loss_mean)

# 2. BCEWithLogitsLoss applied directly to the raw scores (sigmoid is built in)
inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target_bce = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

weights = torch.tensor([1, 1], dtype=torch.float)

loss_f_none_w = nn.BCEWithLogitsLoss(weight=weights, reduction='none')
loss_f_sum = nn.BCEWithLogitsLoss(weight=weights, reduction='sum')
loss_f_mean = nn.BCEWithLogitsLoss(weight=weights, reduction='mean')

loss_none_w = loss_f_none_w(inputs, target_bce)
loss_sum = loss_f_sum(inputs, target_bce)
loss_mean = loss_f_mean(inputs, target_bce)

print("\nweights: ", weights)
print(loss_none_w, loss_sum, loss_mean)

# 3. effect of pos_weight
inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target_bce = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

weights = torch.tensor([1], dtype=torch.float)
pos_w = torch.tensor([3], dtype=torch.float)

loss_f_none_w = nn.BCEWithLogitsLoss(weight=weights, reduction='none', pos_weight=pos_w)
loss_f_sum = nn.BCEWithLogitsLoss(weight=weights, reduction='sum', pos_weight=pos_w)
loss_f_mean = nn.BCEWithLogitsLoss(weight=weights, reduction='mean', pos_weight=pos_w)

loss_none_w = loss_f_none_w(inputs, target_bce)
loss_sum = loss_f_sum(inputs, target_bce)
loss_mean = loss_f_mean(inputs, target_bce)

print("\npos_weights: ", pos_w)
print(loss_none_w, loss_sum, loss_mean)
weights:  tensor([1., 1.])
BCE Loss tensor([[0.3133, 2.1269],
        [0.1269, 2.1269],
        [3.0486, 0.0181],
        [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)

weights:  tensor([1., 1.])
tensor([[0.3133, 2.1269],
        [0.1269, 2.1269],
        [3.0486, 0.0181],
        [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)

pos_weights:  tensor([3.])
tensor([[0.9398, 2.1269],
        [0.3808, 2.1269],
        [3.0486, 0.0544],
        [4.0181, 0.0201]]) tensor(12.7158) tensor(1.5895)
2.5 nn.L1Loss
nn.L1Loss(size_average=None,
          reduce=None,
          reduction='mean')
Purpose: computes the absolute difference between inputs and target
Main parameters:
reduction:
none - compute the loss element-wise
sum - sum over all elements, returns a scalar
mean - weighted average, returns a scalar
Formula: $l_n = |x_n - y_n|$
Experiment:
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
from tools.common_tools import set_seed

set_seed(1)

inputs = torch.ones((2, 2))
target = torch.ones((2, 2)) * 3

loss_f = nn.L1Loss(reduction='none')
loss = loss_f(inputs, target)

print("input:{}\ntarget:{}\nL1 loss:{}".format(inputs, target, loss))
input:tensor([[1., 1.],
        [1., 1.]])
target:tensor([[3., 3.],
        [3., 3.]])
L1 loss:tensor([[2., 2.],
        [2., 2.]])
2.6 nn.MSELoss
nn.MSELoss(size_average=None,
           reduce=None,
           reduction='mean')
Purpose: computes the squared difference between inputs and target
Main parameters:
reduction:
none - compute the loss element-wise
sum - sum over all elements, returns a scalar
mean - weighted average, returns a scalar
Formula: $l_n = (x_n - y_n)^2$
Experiment:
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
from tools.common_tools import set_seed

set_seed(1)

inputs = torch.ones((2, 2))
target = torch.ones((2, 2)) * 3

loss_f_mse = nn.MSELoss(reduction='none')
loss_mse = loss_f_mse(inputs, target)

print("input:{}\ntarget:{}\nMSE loss:{}".format(inputs, target, loss_mse))
input:tensor([[1., 1.],
        [1., 1.]])
target:tensor([[3., 3.],
        [3., 3.]])
MSE loss:tensor([[4., 4.],
        [4., 4.]])
2.7 nn.SmoothL1Loss
nn.SmoothL1Loss(size_average=None,
                reduce=None,
                reduction='mean')
Purpose: smoothed L1 loss
$\operatorname{loss}(x, y) = \frac{1}{n} \sum_{i} z_{i}$
$z_{i} = \begin{cases} 0.5\left(x_{i}-y_{i}\right)^{2}, & \text{if } \left|x_{i}-y_{i}\right| < 1 \\ \left|x_{i}-y_{i}\right| - 0.5, & \text{otherwise} \end{cases}$
The default is reduction='mean'.
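A minimal sketch (inputs chosen arbitrarily here) comparing SmoothL1Loss with L1Loss; the element-wise values should follow the piecewise formula above:

import torch
import torch.nn as nn

inputs = torch.linspace(-3, 3, steps=7)   # [-3, -2, -1, 0, 1, 2, 3]
target = torch.zeros_like(inputs)

loss_smooth = nn.SmoothL1Loss(reduction='none')(inputs, target)
loss_l1 = nn.L1Loss(reduction='none')(inputs, target)

print(loss_smooth)   # 0.5*(x-y)^2 where |x-y| < 1, otherwise |x-y| - 0.5
print(loss_l1)       # always |x-y|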
2.8 nn.PoissonNLLLoss
nn.PoissonNLLLoss(log_input=True,
                  full=False,
                  size_average=None,
                  eps=1e-08,
                  reduce=None,
                  reduction='mean')
Purpose: negative log-likelihood loss for a Poisson distribution
Main parameters:
log_input: whether the input is already in log form; this determines which formula is used
when log_input = True:
loss(input, target) = exp(input) - target * input
when log_input = False:
loss(input, target) = input - target * log(input + eps)
full: whether to compute the full loss (including the Stirling approximation term), default False
eps: small correction term to avoid log(input) being nan
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
from tools.common_tools import set_seed

set_seed(1)

inputs = torch.randn((2, 2))
target = torch.randn((2, 2))

loss_f = nn.PoissonNLLLoss(log_input=True, full=False, reduction='none')
loss = loss_f(inputs, target)
print("input:{}\ntarget:{}\nPoisson NLL loss:{}".format(inputs, target, loss))

loss_1 = torch.exp(inputs) - target * inputs
print("Formula result:", loss_1)
input:tensor([[0.6614, 0.2669],
        [0.0617, 0.6213]])
target:tensor([[-0.4519, -0.1661],
        [-1.5228,  0.3817]])
Poisson NLL loss:tensor([[2.2363, 1.3503],
        [1.1575, 1.6242]])
Formula result: tensor([[2.2363, 1.3503],
        [1.1575, 1.6242]])
As we can see, the results match.
2.9 nn.KLDivLoss
nn.KLDivLoss(size_average=None,
             reduce=None,
             reduction='mean')
Purpose: computes the KLD (Kullback-Leibler divergence), i.e. the KL divergence or relative entropy
Note: the input must be given as log-probabilities beforehand, e.g. via nn.LogSoftmax()
Main parameters:
reduction: none/sum/mean/batchmean
batchmean - average over the batch dimension
none - compute the loss element-wise
sum - sum over all elements, returns a scalar
mean - weighted average, returns a scalar
The KL divergence formula:
$$\begin{aligned} D_{KL}(P \| Q) = E_{x \sim p}\left[\log \frac{P(x)}{Q(x)}\right] &= E_{x \sim p}[\log P(x) - \log Q(x)] \\ &= \sum_{i=1}^{N} P\left(x_{i}\right)\left(\log P\left(x_{i}\right) - \log Q\left(x_{i}\right)\right) \end{aligned}$$
where $P(x_i)$ is the label (the ground-truth distribution).
The formula the function actually implements is:
$l_{n} = y_{n} \cdot \left(\log y_{n} - x_{n}\right)$
When $y_n = 0$, $l_n = 0$; when $y_n = 1$, $l_n = -x_n$.
Note that the input $x_n$ is not passed through a log here: the function assumes the input is already log-probabilities (for example the output of nn.LogSoftmax), so the log is omitted from the formula.
With reduction='batchmean' the KL divergence formula is realised (additionally divided by the batch size N), i.e. with an averaging over samples built in.
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
from tools.common_tools import set_seed

set_seed(1)

inputs = torch.tensor([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
inputs_log = torch.log(inputs)
target = torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]], dtype=torch.float)

loss_f_none = nn.KLDivLoss(reduction='none')
loss_f_mean = nn.KLDivLoss(reduction='mean')
loss_f_bs_mean = nn.KLDivLoss(reduction='batchmean')

loss_none = loss_f_none(inputs, target)
loss_mean = loss_f_mean(inputs, target)
loss_bs_mean = loss_f_bs_mean(inputs, target)

print("loss_none:\n{}\nloss_mean:\n{}\nloss_bs_mean:\n{}".format(loss_none, loss_mean, loss_bs_mean))

loss_1 = target * (torch.log(target) - inputs)
print("Formula result:", loss_1)
loss_none:
tensor([[-0.5448, -0.1648, -0.1598],
        [-0.2503, -0.4597, -0.4219]])
loss_mean:
-0.3335360586643219
loss_bs_mean:
-1.000608205795288
Formula result: tensor([[-0.5448, -0.1648, -0.1598],
        [-0.2503, -0.4597, -0.4219]])
As we can see, reduction='mean' divides by 6 (the number of all elements), whereas reduction='batchmean' divides by 2 (the batch size).
2.10 nn.MarginRankingLoss
nn.MarginRankingLoss(margin=0.0,
                     size_average=None,
                     reduce=None,
                     reduction='mean')
Purpose: computes the similarity between two groups of inputs; used for ranking tasks
Note: this method compares two groups of data and returns an n*n loss matrix
Main parameters:
margin: boundary value, the required gap between x1 and x2
reduction: reduction mode, one of none/sum/mean
y = 1: we want x1 to be larger than x2; no loss is produced when x1 > x2
y = -1: we want x2 to be larger than x1; no loss is produced when x2 > x1
Formula:
$\operatorname{loss}(x, y) = \max(0, -y \cdot (x_1 - x_2) + \text{margin})$
Experiment:
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
from tools.common_tools import set_seed

set_seed(1)

x1 = torch.tensor([[1], [2], [3]], dtype=torch.float)
x2 = torch.tensor([[3], [2], [2]], dtype=torch.float)

target = torch.tensor([1], dtype=torch.float)

loss_f_none = nn.MarginRankingLoss(margin=0, reduction='none')
loss = loss_f_none(x1, x2, target)
print(loss)
tensor([[2.],
        [0.],
        [0.]])
With y=1, the loss is computed as $\max(0, x_2 - x_1 + margin)$.
x1 = torch.tensor([[1], [2], [3]], dtype=torch.float)
x2 = torch.tensor([[3], [2], [2]], dtype=torch.float)

target = torch.tensor([1, 1, -1], dtype=torch.float)

loss_f_none = nn.MarginRankingLoss(margin=0, reduction='none')
loss = loss_f_none(x1, x2, target)
print(loss)
tensor([[2., 2., 0.],
        [0., 0., 0.],
        [0., 0., 1.]])
The first and second columns are computed as $\max(0, x_2 - x_1 + margin)$.
The third column is computed as $\max(0, x_1 - x_2 + margin)$.
2.11 nn.MultiLabelMarginLoss
nn.MultiLabelMarginLoss(size_average=None,
                        reduce=None,
                        reduction='mean')
Purpose: multi-label margin loss (one input can belong to several classes)
Example: in a 4-class task, sample x belongs to class 0 and class 3,
label: [0, 3, -1, -1] (meaning the input belongs to classes 0 and 3; the -1 entries are padding
so that the label has the same size as the input)
Main parameters:
reduction: reduction mode, one of none/sum/mean
Formula: $\operatorname{loss}(x, y) = \sum_{ij} \frac{\max(0, 1 - (x[y[j]] - x[i]))}{x.\operatorname{size}(0)}$
$x[y[j]] - x[i]$ means: the input value of a class the sample belongs to minus the input value of a class it does not belong to
Experiment:
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
from tools.common_tools import set_seed

set_seed(1)

x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
y = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)

loss_f = nn.MultiLabelMarginLoss(reduction='none')
loss = loss_f(x, y)
print(loss)
In the example above, $x[y[j]] - x[i] = [0.1-0.2,\ 0.1-0.4,\ 0.8-0.2,\ 0.8-0.4] = [-0.1, -0.3, 0.6, 0.4]$
so $loss = \frac{1.1 + 1.3 + 0.4 + 0.6}{4} = 0.85$
2.12 nn.SoftMarginLoss
nn.SoftMarginLoss(size_average=None,
                  reduce=None,
                  reduction='mean')
Purpose: computes the logistic loss for binary classification
Main parameters:
reduction: reduction mode, one of none/sum/mean
Formula: $\operatorname{loss}(x, y) = \sum_{i} \frac{\log(1 + \exp(-y[i] \cdot x[i]))}{\text{x.nelement}()}$
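A minimal sketch (inputs chosen arbitrarily, labels must be -1 or 1); the manual element-wise computation should match the function output:

import torch
import torch.nn as nn

x = torch.tensor([[0.3, 0.7], [0.5, 0.5]])
y = torch.tensor([[-1, 1], [1, -1]], dtype=torch.float)   # labels are -1 or 1

loss_f = nn.SoftMarginLoss(reduction='none')
print(loss_f(x, y))

# manual element-wise check: log(1 + exp(-y*x))
print(torch.log(1 + torch.exp(-y * x)))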
2.13 nn.MultiLabelSoftMarginLoss
nn.MultiLabelSoftMarginLoss(weight=None,
                            size_average=None,
                            reduce=None,
                            reduction='mean')
Purpose: the multi-label version of SoftMarginLoss
Main parameters:
weight: per-class weights for the loss
reduction: reduction mode, one of none/sum/mean
Formula:
$\operatorname{loss}(x, y) = -\frac{1}{C} \sum_{i} y[i] \cdot \log\left((1 + \exp(-x[i]))^{-1}\right) + (1 - y[i]) \cdot \log\left(\frac{\exp(-x[i])}{1 + \exp(-x[i])}\right)$
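A minimal sketch (inputs chosen arbitrarily): for each sample the C per-class binary cross-entropies are averaged, so the manual computation should match the function output:

import torch
import torch.nn as nn

x = torch.tensor([[0.3, 0.7, 0.8]])
y = torch.tensor([[0, 1, 1]], dtype=torch.float)   # the sample belongs to classes 1 and 2

loss_f = nn.MultiLabelSoftMarginLoss(reduction='none')
print(loss_f(x, y))

# manual check: per-class binary cross-entropy on sigmoid(x), averaged over the C classes
sigmoid_x = torch.sigmoid(x)
print(-(y * torch.log(sigmoid_x) + (1 - y) * torch.log(1 - sigmoid_x)).mean(dim=1))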
2.14 nn.MultiMarginLoss
nn.MultiMarginLoss(p=1,
                   margin=1.0,
                   weight=None,
                   size_average=None,
                   reduce=None,
                   reduction='mean')
Purpose: computes the hinge loss for multi-class classification
Main parameters:
p: may be 1 or 2
weight: per-class weights for the loss
margin: boundary value
reduction: reduction mode, one of none/sum/mean
Formula:
$\operatorname{loss}(x, y) = \frac{\sum_{i} \max(0, \text{margin} - x[y] + x[i])^{p}}{x.\operatorname{size}(0)}$
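A minimal sketch (inputs chosen arbitrarily, p and margin left at their defaults of 1 and 1.0):

import torch
import torch.nn as nn

x = torch.tensor([[0.1, 0.2, 0.7], [0.2, 0.5, 0.3]])
y = torch.tensor([1, 2], dtype=torch.long)   # each sample belongs to exactly one class

loss_f = nn.MultiMarginLoss(reduction='none')
print(loss_f(x, y))
# manual check for the first sample: x[y] = 0.2,
# loss = [max(0, 1-0.2+0.1) + max(0, 1-0.2+0.7)] / 3 = (0.9 + 1.5) / 3 = 0.8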
2.15 nn.TripletMarginLoss
nn.TripletMarginLoss(margin=1.0,
                     p=2.0,
                     eps=1e-06,
                     swap=False,
                     size_average=None,
                     reduce=None,
                     reduction='mean')
Purpose: computes the triplet loss, commonly used in face verification
Main parameters:
p: order of the norm, default 2
margin: boundary value
reduction: reduction mode, one of none/sum/mean
Formula:
$L(a, p, n) = \max\left\{d(a_i, p_i) - d(a_i, n_i) + \text{margin},\ 0\right\}$
$d(x_i, y_i) = \left\| \mathbf{x}_i - \mathbf{y}_i \right\|_p$
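A minimal sketch (anchor/positive/negative chosen as simple 2-d points so the distances are easy to check by hand):

import torch
import torch.nn as nn

anchor = torch.tensor([[1., 0.]])
positive = torch.tensor([[1.2, 0.]])   # distance to anchor: 0.2
negative = torch.tensor([[2., 0.]])    # distance to anchor: 1.0

loss_f = nn.TripletMarginLoss(margin=1.0, p=2)
print(loss_f(anchor, positive, negative))   # max(0.2 - 1.0 + 1.0, 0) = 0.2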
2.16 nn.HingeEmbeddingLoss
nn.HingeEmbeddingLoss(margin=1.0,
                      size_average=None,
                      reduce=None,
                      reduction='mean')
Purpose: computes the similarity between two inputs, commonly used for non-linear embedding and semi-supervised learning
Important: the input x should be the absolute difference of the two inputs
Main parameters:
margin: boundary value
reduction: reduction mode, one of none/sum/mean
Formula:
$l_{n} = \begin{cases} x_{n}, & \text{if } y_{n} = 1 \\ \max\{0, \Delta - x_{n}\}, & \text{if } y_{n} = -1 \end{cases}$
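A minimal sketch (inputs chosen arbitrarily; as noted above, x is assumed to already be the absolute difference of the two inputs):

import torch
import torch.nn as nn

x = torch.tensor([[1., 0.8, 0.5]])
y = torch.tensor([[1, 1, -1]], dtype=torch.float)

loss_f = nn.HingeEmbeddingLoss(margin=1.0, reduction='none')
print(loss_f(x, y))
# for y=1 the loss is x itself, i.e. [1.0, 0.8];
# for y=-1 it is max(0, margin - x) = max(0, 1 - 0.5) = 0.5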
2.17 nn.CosineEmbeddingLoss
nn.CosineEmbeddingLoss(margin=0.0,
                       size_average=None,
                       reduce=None,
                       reduction='mean')
Purpose: measures the similarity of two inputs via cosine similarity
Main parameters:
margin: may take values in [-1, 1]; [0, 0.5] is recommended
reduction: reduction mode, one of none/sum/mean
Formula:
$\operatorname{loss}(x, y) = \begin{cases} 1 - \cos(x_1, x_2), & \text{if } y = 1 \\ \max(0, \cos(x_1, x_2) - \text{margin}), & \text{if } y = -1 \end{cases}$
$\cos(\theta) = \frac{A \cdot B}{\|A\|\|B\|} = \frac{\sum_{i=1}^{n} A_i \times B_i}{\sqrt{\sum_{i=1}^{n} (A_i)^2} \times \sqrt{\sum_{i=1}^{n} (B_i)^2}}$
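A minimal sketch (the same pair of vectors is used twice, once with y=1 and once with y=-1):

import torch
import torch.nn as nn

x1 = torch.tensor([[0.3, 0.5, 0.7], [0.3, 0.5, 0.7]])
x2 = torch.tensor([[0.1, 0.3, 0.5], [0.1, 0.3, 0.5]])
target = torch.tensor([1, -1], dtype=torch.float)

loss_f = nn.CosineEmbeddingLoss(margin=0., reduction='none')
print(loss_f(x1, x2, target))
# y=1:  loss = 1 - cos(x1, x2)
# y=-1: loss = max(0, cos(x1, x2) - margin)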
2.18 nn.CTCLoss
torch.nn.CTCLoss(blank=0,
                 reduction='mean',
                 zero_infinity=False)
Purpose: computes the CTC (Connectionist Temporal Classification) loss, used for classifying sequential (time-series) data
Main parameters:
blank: the blank label (its class index)
zero_infinity: whether to zero out infinite losses and the associated gradients
reduction: reduction mode, one of none/sum/mean
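A minimal usage sketch (shapes and sizes chosen arbitrarily: T time steps, C classes including blank, batch size N, maximum target length S; the input must be log-probabilities of shape (T, N, C)):

import torch
import torch.nn as nn

T, C, N, S = 50, 20, 4, 10

# input must be log-probabilities of shape (T, N, C)
log_probs = torch.randn(T, N, C).log_softmax(dim=2).detach().requires_grad_()
# target labels are drawn from 1..C-1 (index 0 is reserved for blank)
targets = torch.randint(1, C, (N, S), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(5, S + 1, (N,), dtype=torch.long)

ctc_loss = nn.CTCLoss(blank=0, reduction='mean')
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
print(loss)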