PyTorch torch.nn.BatchNorm2d

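torch.nn.BatchNorm2d is the PyTorch module for batch normalization over 4D inputs of shape (N, C, H, W), that is, batches of 2D feature maps such as the outputs of convolutional layers.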

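Batch normalization normalizes the inputs of a layer using batch statistics. This speeds up training and stabilizes convergence, and it is one of the most widely used techniques in modern deep neural networks.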

Definition

torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

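Parameters:

num_features (int): number of input channels C, for an input of shape (N, C, H, W).
eps (float): value added to the denominator for numerical stability. Default: 1e-5.
momentum (float or None): weight given to the current batch statistics when updating running_mean and running_var: running = (1 - momentum) * running + momentum * batch_statistic. If set to None, a cumulative moving average (simple average) is used instead. Default: 0.1.
affine (bool): whether to use learnable affine parameters (gamma and beta). Default: True.
track_running_stats (bool): whether to track the running mean and variance. If False, no running statistics are kept and batch statistics are used in both training and eval mode. Default: True.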
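To make the momentum update rule concrete, the sketch below runs one training-mode forward pass and recomputes the new running statistics by hand. It assumes the default momentum of 0.1 and relies on the documented behavior that the running variance is updated with the unbiased variance estimate; the tensor sizes are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(3, momentum=0.1)
x = torch.randn(8, 3, 4, 4)

# Running statistics before the first training-mode forward pass (mean 0, variance 1)
old_mean = bn.running_mean.clone()
old_var = bn.running_var.clone()

bn.train()
with torch.no_grad():
    bn(x)  # updates running_mean, running_var and num_batches_tracked

# Per-channel statistics of this batch over (N, H, W);
# the running variance uses the unbiased estimate (the default of Tensor.var)
batch_mean = x.mean(dim=(0, 2, 3))
batch_var = x.var(dim=(0, 2, 3))

m = bn.momentum
expected_mean = (1 - m) * old_mean + m * batch_mean
expected_var = (1 - m) * old_var + m * batch_var

print(torch.allclose(bn.running_mean, expected_mean, atol=1e-5))  # True
print(torch.allclose(bn.running_var, expected_var, atol=1e-5))    # True
```

Note that the update happens even under torch.no_grad(): it is controlled by the training flag, not by autograd.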

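Attributes:

weight (Tensor): learnable scale parameter gamma, shape (num_features,), initialized to 1. Only present when affine=True (otherwise None).
bias (Tensor): learnable shift parameter beta, shape (num_features,), initialized to 0. Only present when affine=True (otherwise None).
running_mean (Tensor): running mean buffer, shape (num_features,), initialized to 0. It is updated on every training-mode forward pass and used for normalization in eval mode; it is not trained by gradients.
running_var (Tensor): running variance buffer, shape (num_features,), initialized to 1. It is updated and used in the same way as running_mean.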
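A quick way to see which of these attributes are trainable parameters and which are buffers is to list them. This sketch uses only standard nn.Module introspection; the channel count of 8 is arbitrary.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(8)

# Learnable parameters (updated by the optimizer)
param_names = [name for name, _ in bn.named_parameters()]
# Buffers (saved in state_dict, but not trained by gradients)
buffer_names = [name for name, _ in bn.named_buffers()]

print(param_names)   # ['weight', 'bias']
print(buffer_names)  # ['running_mean', 'running_var', 'num_batches_tracked']

# Initial values: gamma = 1, beta = 0, running mean 0, running variance 1
print(bn.weight.data[:3], bn.bias.data[:3])
print(bn.running_mean[:3], bn.running_var[:3])

# With affine=False or track_running_stats=False the corresponding attributes are None
no_affine_weight = nn.BatchNorm2d(8, affine=False).weight
no_stats_mean = nn.BatchNorm2d(8, track_running_stats=False).running_mean
print(no_affine_weight, no_stats_mean)  # None None
```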

Mathematical Background

Batch normalization standardizes each channel independently:

y = (x - E[x]) / sqrt(Var[x] + eps) * gamma + beta

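Here E[x] and Var[x] are computed per channel over the N, H and W dimensions. In training mode they are the statistics of the current batch (with the biased variance estimator); in eval mode running_mean and running_var are used instead. gamma and beta are learnable per-channel parameters that let the network recover its representational capacity, for example by undoing the normalization when that is optimal.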
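The formula above can be checked against the module directly. The sketch below reimplements training-mode batch normalization by hand (a hypothetical helper named manual_bn, for illustration only) and compares it with nn.BatchNorm2d; gamma and beta are set to non-default values so that the affine step actually matters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 3, 5, 5)

bn = nn.BatchNorm2d(3)
with torch.no_grad():
    # Non-default gamma/beta so the affine step is visible
    bn.weight.copy_(torch.tensor([0.5, 1.0, 2.0]))
    bn.bias.copy_(torch.tensor([0.1, -0.2, 0.3]))
bn.train()

def manual_bn(x, gamma, beta, eps):
    # E[x] and Var[x] per channel over (N, H, W); biased variance, as in training mode
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    # Broadcast the (C,) parameters to (1, C, 1, 1)
    return x_hat * gamma.view(1, -1, 1, 1) + beta.view(1, -1, 1, 1)

y_module = bn(x)
y_manual = manual_bn(x, bn.weight, bn.bias, bn.eps)
print(torch.allclose(y_module, y_manual, atol=1e-5))  # True
```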

Usage Examples

Example 1: Basic Usage

Applying batch normalization to a feature map with the shape a convolutional layer would produce:


import torch
import torch.nn as nn

# Create a batch normalization layer for 32 channels
bn = nn.BatchNorm2d(num_features=32)

# Print the shapes of the parameters and buffers
print("gamma (weight):", bn.weight.shape)
print("beta (bias):", bn.bias.shape)
print("running_mean:", bn.running_mean.shape)
print("running_var:", bn.running_var.shape)

# Create an input: batch=4, channels=32, height=16, width=16
input_tensor = torch.randn(4, 32, 16, 16)

# Forward pass (a newly created module is in training mode)
output = bn(input_tensor)

print("\nInput mean (per channel):", input_tensor.mean(dim=(0, 2, 3))[:5].tolist())
print("Output mean (per channel):", output.mean(dim=(0, 2, 3))[:5].tolist())
print("\nInput shape:", input_tensor.shape)
print("Output shape:", output.shape)

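Output (the input is random, so the mean values change from run to run; only their typical size is described):

gamma (weight): torch.Size([32])
beta (bias): torch.Size([32])
running_mean: torch.Size([32])
running_var: torch.Size([32])

Input mean (per channel): [five values close to 0, such as 0.02 or -0.03]
Output mean (per channel): [five tiny values that are 0 up to floating-point rounding error]

Input shape: torch.Size([4, 32, 16, 16])
Output shape: torch.Size([4, 32, 16, 16])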

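In training mode, each channel is standardized to mean 0 and variance 1 using the batch statistics, then scaled by gamma and shifted by beta. Because gamma starts at 1 and beta at 0, a freshly created layer outputs per-channel mean 0 and variance 1 (up to the small effect of eps).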
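The statement above can be verified numerically. In this sketch the input is deliberately shifted and scaled (mean about 5, standard deviation about 3, values chosen only for illustration) so that the normalization is clearly visible.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(32)                  # gamma = 1, beta = 0 at initialization
x = torch.randn(4, 32, 16, 16) * 3 + 5   # shifted and scaled input

y = bn(x)                                # training mode by default

out_mean = y.mean(dim=(0, 2, 3))
out_var = y.var(dim=(0, 2, 3), unbiased=False)
print(out_mean.abs().max().item())              # tiny: float32 rounding error
print(out_var.min().item(), out_var.max().item())  # both very close to 1
```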

Example 2: Training vs. Evaluation Mode

Batch normalization behaves differently during training and evaluation:


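import torch
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=16)

# Training mode: normalize with batch statistics and update the running statistics
bn.train()
print("Training flag in train mode:", bn.training)
print("Initial running_mean first 5:", bn.running_mean[:5].tolist())

# Simulate training: each forward pass updates running_mean and running_var
for _ in range(10):
    x = torch.randn(8, 16, 8, 8)
    output = bn(x)

print("running_mean first 5 after training:", bn.running_mean[:5].tolist())

# Evaluation mode: normalize with running_mean / running_var, which are no longer updated.
# eval() does not change requires_grad; disable gradients separately with torch.no_grad().
bn.eval()
print("\nTraining flag in eval mode:", bn.training)
print("weight.requires_grad in eval mode:", bn.weight.requires_grad)

# At evaluation time the running statistics are used
x = torch.randn(4, 16, 8, 8)
with torch.no_grad():
    output = bn(x)
print("Output shape in eval mode:", output.shape)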
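To see exactly what "uses running stats" means, the sketch below recomputes an eval-mode output from running_mean and running_var by hand and confirms that the eval-mode forward pass leaves the buffers unchanged. The +2.0 shift in the training inputs is arbitrary and only there to move running_mean away from 0.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(16)

# A few training-mode passes (inputs shifted by +2) to populate the running statistics
bn.train()
with torch.no_grad():
    for _ in range(10):
        bn(torch.randn(8, 16, 8, 8) + 2.0)

bn.eval()
mean_before = bn.running_mean.clone()
x = torch.randn(4, 16, 8, 8)
with torch.no_grad():
    y_eval = bn(x)
    # Same formula, but with running_mean / running_var in place of batch statistics
    mean = bn.running_mean.view(1, -1, 1, 1)
    var = bn.running_var.view(1, -1, 1, 1)
    gamma = bn.weight.view(1, -1, 1, 1)
    beta = bn.bias.view(1, -1, 1, 1)
    y_manual = (x - mean) / torch.sqrt(var + bn.eps) * gamma + beta

print(torch.allclose(y_eval, y_manual, atol=1e-5))  # True
# The eval-mode forward pass left the running statistics unchanged
print(torch.equal(mean_before, bn.running_mean))    # True
```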

Example 3: Complete CNN Example

A typical CNN structure with batch normalization:


import torch
import torch.nn as nn

class BNConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Convolution + batch normalization + activation + pooling
        self.block1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)  # spatial size 32 -> 16
        )
        self.block2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)  # spatial size 16 -> 8
        )
        # Global average pooling: (N, 64, 8, 8) -> (N, 64, 1, 1)
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.gap(x)
        x = x.view(x.size(0), -1)  # flatten to (N, 64)
        x = self.classifier(x)
        return x

model = BNConvNet()
input_image = torch.randn(2, 3, 32, 32)
output = model(input_image)

print("Input shape:", input_image.shape)
print("Output shape:", output.shape)

# Print the parameters of the first BN layer (block1[1]); they start at gamma = 1, beta = 0
print("\nFirst BN layer gamma:", model.block1[1].weight[:5].tolist())
print("First BN layer beta:", model.block1[1].bias[:5].tolist())
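In a full network like the one above, there is no need to switch each BN layer individually: model.train() and model.eval() recursively set the training flag on every submodule. The small nn.Sequential model below is a hypothetical stand-in for BNConvNet, used only so the sketch is self-contained.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a CNN with BN layers
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)

# Collect every BatchNorm2d submodule
bn_layers = [m for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
print(len(bn_layers))  # 2

# eval()/train() on the model propagate to all submodules
model.eval()
eval_flags = [m.training for m in bn_layers]
print(eval_flags)   # [False, False]
model.train()
train_flags = [m.training for m in bn_layers]
print(train_flags)  # [True, True]

# Inference: switch back to eval mode; a batch of one sample is fine here
model.eval()
with torch.no_grad():
    out = model(torch.randn(1, 3, 32, 32))
print(out.shape)  # torch.Size([1, 16, 32, 32])
```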

Example 4: Without affine Parameters

Disabling the learnable parameters:


import torch
import torch.nn as nn

# Batch normalization without learnable parameters
bn_no_affine = nn.BatchNorm2d(16, affine=False)

print("Has weight:", bn_no_affine.weight is not None)  # False
print("Has bias:", bn_no_affine.bias is not None)      # False

# Standardization is still applied
x = torch.randn(4, 16, 8, 8)
output = bn_no_affine(x)
print("\nOutput shape:", output.shape)
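Since gamma is initialized to 1 and beta to 0, a freshly created layer with affine=True produces the same output as one with affine=False; the difference is only that the former has parameters the optimizer can change. This sketch checks both points.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 16, 8, 8)

bn_plain = nn.BatchNorm2d(16, affine=False)
bn_affine = nn.BatchNorm2d(16)  # gamma initialized to 1, beta to 0

# Same outputs before any training: only standardization is applied in both cases
same_output = torch.allclose(bn_plain(x), bn_affine(x), atol=1e-6)
print(same_output)  # True

# Only the affine version has trainable parameters (16 gammas + 16 betas)
n_plain = sum(p.numel() for p in bn_plain.parameters())
n_affine = sum(p.numel() for p in bn_affine.parameters())
print(n_plain, n_affine)  # 0 32
```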

FAQ

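Q1: Should batch normalization go before or after ReLU?

Both orderings work. The original paper places it after the convolution and before the activation; placing it after the activation is also common in practice.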
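The two orderings can be written as follows. This is a sketch with arbitrary channel counts. Setting bias=False on the convolution in the first block is a common convention rather than a requirement: BN subtracts the per-channel mean, which cancels a constant per-channel bias.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Ordering from the original paper: convolution -> BN -> activation
conv_bn_relu = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),  # bias is redundant before BN
    nn.BatchNorm2d(16),
    nn.ReLU(),
)

# Alternative ordering: convolution -> activation -> BN
conv_relu_bn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.BatchNorm2d(16),
)

x = torch.randn(2, 3, 32, 32)
y1 = conv_bn_relu(x)
y2 = conv_relu_bn(x)
print(y1.shape, y2.shape)
# BN -> ReLU outputs are non-negative; ReLU -> BN outputs are re-centered and can be negative
print((y1 >= 0).all().item(), (y2 < 0).any().item())
```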

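Q2: What can I do when results are poor with a small batch size?

Replace it with GroupNorm, which does not depend on the batch size
Use LayerNorm
Increase the batch size
Tune the momentum parameter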
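The extreme case of a small batch makes the problem concrete: with one sample and a 1x1 feature map there is only one value per channel, so training-mode BatchNorm2d cannot compute a variance and raises a ValueError. The sketch below shows this and swaps in nn.GroupNorm; the choice of 8 groups is an arbitrary assumption (num_channels must be divisible by num_groups).

```python
import torch
import torch.nn as nn

# One sample with a 1x1 spatial size: only one value per channel
x = torch.randn(1, 32, 1, 1)

bn = nn.BatchNorm2d(32)
bn.train()
bn_failed = False
try:
    bn(x)
except ValueError as err:
    # Training-mode BN cannot compute a batch variance from a single value per channel
    bn_failed = True
    print("BatchNorm2d failed:", err)

# GroupNorm normalizes over groups of channels within each sample,
# so its output does not depend on the batch size
gn = nn.GroupNorm(num_groups=8, num_channels=32)
y = gn(torch.randn(1, 32, 16, 16))
print(y.shape)  # torch.Size([1, 32, 16, 16])
```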

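Q3: Why do I need to switch to eval mode for evaluation?

In training mode, BN normalizes with the statistics of the current batch and updates the running statistics; in eval mode it uses running_mean and running_var. If you forget to call model.eval(), each sample's output depends on the other samples in the batch and the running statistics keep changing during evaluation, so results are inconsistent.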
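The batch dependence described above is easy to observe: the sketch below feeds the same sample together with two different sets of batch mates (the second set is deliberately scaled by 5) and compares its output in training and eval mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(4)
sample = torch.randn(1, 4, 8, 8)
others_a = torch.randn(3, 4, 8, 8)
others_b = torch.randn(3, 4, 8, 8) * 5  # a differently distributed set of batch mates

with torch.no_grad():
    bn.train()
    # Same sample, different batch mates -> different outputs in training mode
    out_a = bn(torch.cat([sample, others_a]))[:1]
    out_b = bn(torch.cat([sample, others_b]))[:1]
    train_same = torch.allclose(out_a, out_b)

    bn.eval()
    # Eval mode uses the running statistics, so the result no longer depends on the batch
    out_c = bn(torch.cat([sample, others_a]))[:1]
    out_d = bn(torch.cat([sample, others_b]))[:1]
    eval_same = torch.allclose(out_c, out_d)

print("train mode, same output:", train_same)  # False
print("eval mode, same output:", eval_same)    # True
```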

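Use Cases

The main use cases of nn.BatchNorm2d include:

Faster training: allows larger learning rates
Stable convergence: originally motivated as reducing internal covariate shift
Regularization: the noise in batch statistics has a mild regularizing effect
Image classification: used in most modern CNN architectures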

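Note: batch normalization needs a sufficiently large batch size during training. With small batches the batch statistics become noisy, which can destabilize training.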
