Cross-Entropy Loss Function

import torch
import numpy as np
import torch.nn as nn

# dim=0 computes along columns, dim=1 computes along rows (see the short illustration after this block)
# Pipeline: softmax --> log_softmax --> NLLLoss --> cross_entropy

x = torch.randn(3, 5, requires_grad=True)
"""
tensor([[ 1.3962, -0.2568, -0.7142, 1.1941, 0.5695],
[-0.7136, -1.0663, 1.7642, 0.5170, -0.1858],
[ 0.0424, -0.3354, -0.9049, 0.6952, 1.3032]], requires_grad=True)
"""
y = torch.empty(3, dtype=torch.long).random_(5)
"""
tensor([4, 3, 0])
"""

Softmax

Softmax is the first operation applied to the network's raw outputs; its formula is:

\[ \frac{e^{v_{y_n}}}{\sum_{m=1}^K e^{v_m}} \]

Since the raw network outputs can be positive or negative and vary widely in magnitude, Softmax rescales them into the range \([0,1]\), with each row summing to 1, so they can be read as class probabilities.

# Idea: 1) exponentiate every element  2) divide each element by the sum of its row,
#       so the entries of each row are non-negative and sum to 1, like probabilities
a = torch.exp(x)
a = a / torch.sum(a, dim=1, keepdim=True)  # out-of-place division, so autograd can still track torch.exp

b = torch.softmax(x, dim=1)  # softmax values lie between 0 and 1

print(f"Manual result: {a}, \nsoftmax result: {b}")  # verify how torch.softmax is computed

"""
Manual result: tensor([[0.3911, 0.3240, 0.0895, 0.1785, 0.0168],
        [0.3382, 0.1688, 0.3726, 0.0407, 0.0798],
        [0.2213, 0.1558, 0.2684, 0.3061, 0.0484]], grad_fn=<DivBackward0>),
softmax result: tensor([[0.3911, 0.3240, 0.0895, 0.1785, 0.0168],
        [0.3382, 0.1688, 0.3726, 0.0407, 0.0798],
        [0.2213, 0.1558, 0.2684, 0.3061, 0.0484]], grad_fn=<SoftmaxBackward0>)
"""

Log_Softmax

Log_Softmax takes the natural logarithm (base \(e\)) of the Softmax output; its formula is: \[ \log \left(\frac{e^{v_{y_n}}}{\sum_{m=1}^K e^{v_m}}\right) \]

# Take the log of every element of the softmax output
c = torch.log(a)  # element-wise natural log
d = torch.log_softmax(x, dim=1)  # always negative (log of values in (0, 1))

print(f"Manual result: {c}, \nlog_softmax result: {d}")  # verify how torch.log_softmax is computed
"""
手算结果:tensor([[0.3911, 0.3240, 0.0895, 0.1785, 0.0168],
[0.3382, 0.1688, 0.3726, 0.0407, 0.0798],
[0.2213, 0.1558, 0.2684, 0.3061, 0.0484]], grad_fn=<DivBackward0>),
softmax结果:tensor([[0.3911, 0.3240, 0.0895, 0.1785, 0.0168],
[0.3382, 0.1688, 0.3726, 0.0407, 0.0798],
[0.2213, 0.1558, 0.2684, 0.3061, 0.0484]], grad_fn=<SoftmaxBackward0>)
"""

NLLLoss

NLLLoss takes the Log_Softmax output, picks out the entry at each sample's target-class position, sums these entries, divides by the number of samples, and finally negates the result: the log values are negative, so the sign flip turns the loss into a positive value.

For example, given the two Log_Softmax rows \([-1.5425, -1.4425, -1.3425, -1.2425]\) and \([-1.3863, -1.3863, -1.3863, -1.3863]\) with target = [2, 3], sum the entries at the target positions, divide by the number of samples, and negate:

\[ -\frac{(-1.3425) + (-1.3863)}{2} = 1.3644 \]

\[ \operatorname{NLL}(\log(\operatorname{softmax}(\text{input})), \text{target}) = -\sum_{i=1}^{n} \operatorname{OneHot}(\text{target})_i \times \log\left(\operatorname{softmax}(\text{input})_i\right), \quad \text{input} \in \mathbf{R}^{m \times n} \]
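The worked example above can be checked directly with nn.NLLLoss (a minimal sketch; with the default reduction='mean' it performs exactly the sum, divide, negate steps described):

example_log_probs = torch.tensor([[-1.5425, -1.4425, -1.3425, -1.2425],
                                  [-1.3863, -1.3863, -1.3863, -1.3863]])
example_target = torch.tensor([2, 3])
print(nn.NLLLoss()(example_log_probs, example_target))  # tensor(1.3644)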

# Use y as an index into c, sum the selected entries, divide by the number of samples, negate
e = -torch.sum(c[np.arange(len(y)), y]) / len(y)

nll_loss = torch.nn.NLLLoss()
f = nll_loss(c, y)

print(f"Manual result: {e}, NLLLoss result: {f}")  # verify how torch.nn.NLLLoss is computed
# Manual result: 1.8381218910217285, NLLLoss result: 1.8381218910217285

CrossEntropyLoss

\(CrossEntropyLoss = Softmax + Log + NLLLoss = Log\_Softmax + NLLLoss\)

\[ -\frac{1}{N} \sum_{n=1}^N \log \left(\frac{e^{v_{y_n}}}{\sum_{m=1}^K e^{v_m}}\right) \]

cross_loss = torch.nn.CrossEntropyLoss()
g = cross_loss(x, y)

print(f"CrossEntropyLoss:{g}, NLLLoss结果{f}") # 验证 torch.CrossEntropyLoss 计算过程
# CrossEntropyLoss:1.8381218910217285, NLLLoss结果1.8381218910217285
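As a final, self-contained check of the identity above (a minimal sketch using freshly generated tensors rather than the ones from the walkthrough), CrossEntropyLoss gives the same value as log_softmax followed by NLLLoss:

logits = torch.randn(4, 6)
labels = torch.randint(0, 6, (4,))

ce = nn.CrossEntropyLoss()(logits, labels)
nll = nn.NLLLoss()(torch.log_softmax(logits, dim=1), labels)

print(torch.allclose(ce, nll))  # True: CrossEntropyLoss == Log_Softmax + NLLLoss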