Cross-Entropy Loss Function

import torch
import numpy as np
import torch.nn as nn

# dim=0 computes along columns, dim=1 computes along rows (see the short illustration after this block)
# Pipeline: softmax --> log_softmax --> NLLLoss --> cross_entropy

x = torch.randn(3, 5, requires_grad=True)
"""
tensor([[ 1.3962, -0.2568, -0.7142, 1.1941, 0.5695],
[-0.7136, -1.0663, 1.7642, 0.5170, -0.1858],
[ 0.0424, -0.3354, -0.9049, 0.6952, 1.3032]], requires_grad=True)
"""
y = torch.empty(3, dtype=torch.long).random_(5)
"""
tensor([4, 3, 0])
"""

Softmax

Softmax is the first operation applied to the network's raw outputs; its formula is:

\[ \frac{e^{v_{y_n}}}{\sum_{m=1}^K e^{v_m}} \]

Since the raw network outputs can be positive or negative and vary widely in magnitude, Softmax rescales them into the range \([0,1]\), with each row summing to 1, so they can be read as class probabilities.

# Idea: 1) exponentiate every element  2) divide each element by the sum of its row,
#       so the entries of each row are non-negative and sum to 1, like probabilities
a = torch.exp(x)
a = a / torch.sum(a, dim=1, keepdim=True)  # out-of-place division, so autograd can still track torch.exp

b = torch.softmax(x, dim=1)  # softmax values lie between 0 and 1

print(f"Manual result: {a}, \nsoftmax result: {b}")  # verify how torch.softmax is computed

"""
Manual result: tensor([[0.3911, 0.3240, 0.0895, 0.1785, 0.0168],
        [0.3382, 0.1688, 0.3726, 0.0407, 0.0798],
        [0.2213, 0.1558, 0.2684, 0.3061, 0.0484]], grad_fn=<DivBackward0>),
softmax result: tensor([[0.3911, 0.3240, 0.0895, 0.1785, 0.0168],
        [0.3382, 0.1688, 0.3726, 0.0407, 0.0798],
        [0.2213, 0.1558, 0.2684, 0.3061, 0.0484]], grad_fn=<SoftmaxBackward0>)
"""

Log_Softmax

Log_Softmax takes the natural logarithm (base \(e\)) of the Softmax output; its formula is: \[ \log \left(\frac{e^{v_{y_n}}}{\sum_{m=1}^K e^{v_m}}\right) \]

# Take the log of every element of the softmax output
c = torch.log(a)  # element-wise natural log
d = torch.log_softmax(x, dim=1)  # always negative (log of values in (0, 1))

print(f"Manual result: {c}, \nlog_softmax result: {d}")  # verify how torch.log_softmax is computed
"""
手算结果:tensor([[0.3911, 0.3240, 0.0895, 0.1785, 0.0168],
[0.3382, 0.1688, 0.3726, 0.0407, 0.0798],
[0.2213, 0.1558, 0.2684, 0.3061, 0.0484]], grad_fn=<DivBackward0>),
softmax结果:tensor([[0.3911, 0.3240, 0.0895, 0.1785, 0.0168],
[0.3382, 0.1688, 0.3726, 0.0407, 0.0798],
[0.2213, 0.1558, 0.2684, 0.3061, 0.0484]], grad_fn=<SoftmaxBackward0>)
"""

NLLLoss

NLLLoss takes the Log_Softmax output, picks out the entry at each sample's target-class position, sums these entries, divides by the number of samples, and finally negates the result: the log values are negative, so the sign flip turns the loss into a positive value.

For example, given the two Log_Softmax rows \([-1.5425, -1.4425, -1.3425, -1.2425]\) and \([-1.3863, -1.3863, -1.3863, -1.3863]\) with target = [2, 3], sum the entries at the target positions, divide by the number of samples, and negate:

\[ -\frac{(-1.3425) + (-1.3863)}{2} = 1.3644 \]

\[ \operatorname{NLL}(\log(\operatorname{softmax}(\text{input})), \text{target}) = -\sum_{i=1}^{n} \operatorname{OneHot}(\text{target})_i \times \log\left(\operatorname{softmax}(\text{input})_i\right), \quad \text{input} \in \mathbf{R}^{m \times n} \]
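The worked example above can be checked directly with nn.NLLLoss (a minimal sketch; with the default reduction='mean' it performs exactly the sum, divide, negate steps described):

example_log_probs = torch.tensor([[-1.5425, -1.4425, -1.3425, -1.2425],
                                  [-1.3863, -1.3863, -1.3863, -1.3863]])
example_target = torch.tensor([2, 3])
print(nn.NLLLoss()(example_log_probs, example_target))  # tensor(1.3644)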

# Use y as an index into c, sum the selected entries, divide by the number of samples, negate
e = -torch.sum(c[np.arange(len(y)), y]) / len(y)

nll_loss = torch.nn.NLLLoss()
f = nll_loss(c, y)

print(f"Manual result: {e}, NLLLoss result: {f}")  # verify how torch.nn.NLLLoss is computed
# Manual result: 1.8381218910217285, NLLLoss result: 1.8381218910217285

CrossEntropyLoss

\(CrossEntropyLoss = Softmax + Log + NLLLoss = Log\_Softmax + NLLLoss\)

\[ -\frac{1}{N} \sum_{n=1}^N \log \left(\frac{e^{v_{y_n}}}{\sum_{m=1}^K e^{v_m}}\right) \]

cross_loss = torch.nn.CrossEntropyLoss()
g = cross_loss(x, y)

print(f"CrossEntropyLoss:{g}, NLLLoss结果{f}") # 验证 torch.CrossEntropyLoss 计算过程
# CrossEntropyLoss:1.8381218910217285, NLLLoss结果1.8381218910217285
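As a final, self-contained check of the identity above (a minimal sketch using freshly generated tensors rather than the ones from the walkthrough), CrossEntropyLoss gives the same value as log_softmax followed by NLLLoss:

logits = torch.randn(4, 6)
labels = torch.randint(0, 6, (4,))

ce = nn.CrossEntropyLoss()(logits, labels)
nll = nn.NLLLoss()(torch.log_softmax(logits, dim=1), labels)

print(torch.allclose(ce, nll))  # True: CrossEntropyLoss == Log_Softmax + NLLLoss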