PyTorch torch.nn.Conv2d Module

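torch.nn.Conv2d is the PyTorch module for two-dimensional convolution and a core building block of convolutional neural networks (CNNs).

It extracts spatial features by sliding learnable convolution kernels over the input tensor, and is widely used in image processing and computer vision tasks.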

Constructor Signature

torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')

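Parameters:

in_channels (int): Number of channels in the input. For example, 3 for an RGB image.
out_channels (int): Number of channels in the output, i.e. the number of convolution kernels (filters).
kernel_size (int or tuple): Size of the convolution kernel. Either a single int (square kernel) or a tuple (height, width).
stride (int or tuple): Step size with which the kernel moves across the input. Default: 1.
padding (int, tuple, or str): Amount of padding added to each side of the input. The strings 'valid' (no padding) and 'same' (output size equals input size, stride 1 only) are also accepted. Default: 0.
dilation (int or tuple): Spacing between kernel elements. Default: 1 (standard convolution).
groups (int): Number of groups for grouped convolution. Both in_channels and out_channels must be divisible by groups. Default: 1 (standard convolution).
bias (bool): Whether to add a learnable bias to the output. Default: True.
padding_mode (str): Padding mode. One of 'zeros', 'reflect', 'replicate', 'circular'. Default: 'zeros'.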
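The tuple forms of these parameters are easiest to understand from a concrete case. The following is a minimal sketch with illustrative values (the channel counts and sizes are arbitrary choices, not taken from the text above): it uses a rectangular kernel, a different stride per dimension, and reflect padding.

```python
import torch
import torch.nn as nn

# Tuple arguments are given as (height, width).
conv = nn.Conv2d(
    in_channels=3,
    out_channels=8,
    kernel_size=(3, 5),      # 3 rows by 5 columns
    stride=(2, 1),           # step 2 vertically, 1 horizontally
    padding=(1, 2),          # pad 1 row top/bottom, 2 columns left/right
    padding_mode="reflect",  # mirror border pixels instead of padding with zeros
)

x = torch.randn(1, 3, 32, 32)
y = conv(x)

print(conv.weight.shape)  # torch.Size([8, 3, 3, 5])
print(y.shape)            # torch.Size([1, 8, 16, 32])
```

The height is halved by the vertical stride of 2, while the width is preserved because a padding of 2 compensates for the kernel width of 5.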

Attributes:

weight (Tensor): Learnable weights of shape (out_channels, in_channels / groups, kernel_size[0], kernel_size[1]).
bias (Tensor): Learnable bias of shape (out_channels,). This attribute is None when bias=False.
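As a quick check of these attributes, the sketch below inspects a layer with and without bias and counts its learnable parameters. The variable names are illustrative only.

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
conv_no_bias = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, bias=False)

# weight and bias are nn.Parameter objects, so optimizers will update them.
print(type(conv.weight).__name__)  # Parameter
print(conv.weight.requires_grad)   # True

# With bias=False the attribute still exists, but it is None.
print(conv_no_bias.bias)           # None

# Learnable parameters: 32 * 3 * 3 * 3 weights + 32 biases = 896.
n_params = sum(p.numel() for p in conv.parameters())
print(n_params)                    # 896
```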

Usage Examples

Example 1: Basic usage

Create a simple 2D convolution layer:

Code

import torch
import torch.nn as nn

# Create a convolution layer: 3 input channels, 32 output channels, 3x3 kernel
conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)

# Print the shapes of the weight and bias
print("Weight shape:", conv.weight.shape) # torch.Size([32, 3, 3, 3])
print("Bias shape:", conv.bias.shape) # torch.Size([32])

# Create an input tensor: batch=1, channels=3, height=32, width=32
input_tensor = torch.randn(1, 3, 32, 32)

# Forward pass
output = conv(input_tensor)

print("Input shape:", input_tensor.shape) # torch.Size([1, 3, 32, 32])
print("Output shape:", output.shape) # torch.Size([1, 32, 30, 30])

Output:

Weight shape: torch.Size([32, 3, 3, 3])
Bias shape: torch.Size([32])
Input shape: torch.Size([1, 3, 32, 32])
Output shape: torch.Size([1, 32, 30, 30])

With the default padding=0 and stride=1, each spatial dimension shrinks by kernel_size - 1 (here from 32 to 30). To preserve the size, add padding.

Example 2: Preserving size with padding

Adding padding keeps the output the same spatial size as the input:

Code

import torch
import torch.nn as nn

# Convolution layer with padding: padding=1 preserves the size for a 3x3 kernel
conv_pad = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1)

# Input
input_tensor = torch.randn(1, 3, 32, 32)

# Forward pass
output = conv_pad(input_tensor)

print("Input shape:", input_tensor.shape)
print("Output shape:", output.shape) # stays 32x32

Output:

Input shape: torch.Size([1, 3, 32, 32])
Output shape: torch.Size([1, 32, 32, 32])

Example 3: Different stride and dilation

Adjusting stride and dilation changes the output size and the receptive field:

Code

import torch
import torch.nn as nn

# Strided convolution: stride=2 halves the spatial size
conv_stride = nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1)
input_tensor = torch.randn(1, 3, 32, 32)
output_stride = conv_stride(input_tensor)
print("Stride=2 -> output shape:", output_stride.shape)

# Dilated convolution: dilation=2 enlarges the receptive field
# (a 3x3 kernel then spans a 5x5 region of the input)
conv_dilation = nn.Conv2d(3, 32, kernel_size=3, dilation=2)
output_dilation = conv_dilation(input_tensor)
print("Dilation=2 -> output shape:", output_dilation.shape)

Output:

Stride=2 -> output shape: torch.Size([1, 32, 16, 16])
Dilation=2 -> output shape: torch.Size([1, 32, 28, 28])

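Example 4: Grouped convolution

The groups parameter enables grouped convolution, which is common in lightweight networks. Each group of output channels only sees its own slice of the input channels:

Code

import torch
import torch.nn as nn

# Grouped convolution: groups=2 splits the 4 input channels into 2 groups of 2
conv_group = nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, groups=2)

# Input with 4 channels
input_tensor = torch.randn(1, 4, 16, 16)

# Forward pass
output = conv_group(input_tensor)

print("Input shape:", input_tensor.shape)
print("Output shape:", output.shape)
print("Weight shape:", conv_group.weight.shape) # second dimension is in_channels / groups

Output:

Input shape: torch.Size([1, 4, 16, 16])
Output shape: torch.Size([1, 8, 14, 14])
Weight shape: torch.Size([8, 2, 3, 3])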
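One common way lightweight networks use groups is the depthwise separable convolution: a per-channel 3x3 convolution (groups equal to the number of input channels) followed by a 1x1 convolution that mixes channels. The sketch below is an illustration under assumed channel counts (32 in, 64 out), not part of the original examples. It compares the parameter count with a standard 3x3 convolution.

```python
import torch
import torch.nn as nn

in_ch, out_ch = 32, 64  # illustrative channel counts

# Standard 3x3 convolution for comparison.
standard = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

# Depthwise: groups=in_ch gives every input channel its own 3x3 filter.
depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
# Pointwise: a 1x1 convolution mixes information across channels.
pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
separable = nn.Sequential(depthwise, pointwise)


def count_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


x = torch.randn(1, in_ch, 16, 16)
print(standard(x).shape)        # torch.Size([1, 64, 16, 16])
print(separable(x).shape)       # torch.Size([1, 64, 16, 16])
print(count_params(standard))   # 18496
print(count_params(separable))  # 2432
```

Both variants produce the same output shape, but the separable version needs roughly one eighth of the parameters in this configuration.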

Example 5: Using Conv2d in a neural network

Build a simple CNN for handwritten digit recognition that takes 28x28 grayscale images:

Code

import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # First convolution block
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.relu1 = nn.ReLU()

        # Second convolution block
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.relu2 = nn.ReLU()

        # Pooling layer (halves height and width)
        self.pool = nn.MaxPool2d(2, 2)

        # Fully connected layer: 28x28 input -> 14x14 -> 7x7 after two poolings
        self.fc = nn.Linear(64 * 7 * 7, num_classes)

    def forward(self, x):
        # First convolution block
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu1(x)
        x = self.pool(x)

        # Second convolution block
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu2(x)
        x = self.pool(x)

        # Flatten and classify
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x

# Create the model
model = SimpleCNN(num_classes=10)

# Test input: batch=4, grayscale 28x28 images
input_image = torch.randn(4, 1, 28, 28)
output = model(input_image)

print("Input shape:", input_image.shape)
print("Output shape:", output.shape) # torch.Size([4, 10])

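Computing the Output Size

The output size of a convolution layer is given by the formulas below. Here padding, dilation, kernel_size, and stride are (height, width) pairs; an int argument applies to both dimensions.

H_out = floor((H_in + 2 * padding[0] - dilation[0] * (kernel_size[0] - 1) - 1) / stride[0]) + 1
W_out = floor((W_in + 2 * padding[1] - dilation[1] * (kernel_size[1] - 1) - 1) / stride[1]) + 1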
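The formula can be checked with a small pure-Python helper. The function name conv2d_output_size is hypothetical and introduced only for this sketch; it handles one spatial dimension at a time and reproduces the shapes seen in Examples 1 to 4.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension, following the formula above."""
    # Integer floor division implements floor(...) for these integer inputs.
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


print(conv2d_output_size(32, 3))                       # 30 (Example 1)
print(conv2d_output_size(32, 3, padding=1))            # 32 (Example 2)
print(conv2d_output_size(32, 3, stride=2, padding=1))  # 16 (Example 3, stride)
print(conv2d_output_size(32, 3, dilation=2))           # 28 (Example 3, dilation)
print(conv2d_output_size(16, 3))                       # 14 (Example 4)
```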

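FAQ

Q1: How do I choose the kernel size?

Common kernel sizes:

1x1: Changes the number of channels and mixes information across channels; followed by an activation, it adds nonlinearity at low cost
3x3: The most common choice; balances parameter count and receptive field
5x5, 7x7: Larger receptive field, but many more parameters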
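To make the parameter trade-off concrete, the sketch below counts learnable parameters for each kernel size at an assumed width of 64 channels (an arbitrary choice for illustration). It also counts two stacked 3x3 layers, which together cover a 5x5 receptive field.

```python
import torch.nn as nn


def count_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


channels = 64  # illustrative channel count

# Parameter count per kernel size, with padding chosen to preserve the size.
params = {}
for k in (1, 3, 5, 7):
    conv = nn.Conv2d(channels, channels, kernel_size=k, padding=k // 2)
    params[k] = count_params(conv)
    print(f"{k}x{k}: {params[k]} parameters")
# 1x1: 4160 parameters
# 3x3: 36928 parameters
# 5x5: 102464 parameters
# 7x7: 200768 parameters

# Two stacked 3x3 layers see a 5x5 region of the input with fewer parameters.
stacked = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
)
stacked_params = count_params(stacked)
print(stacked_params)  # 73856
```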

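Q2: How do I choose padding and stride?

To preserve the feature map size (with stride=1, dilation=1, and an odd kernel size), use padding = (kernel_size - 1) / 2
To downsample, use stride > 1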
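Both rules can be verified on a small input. This sketch assumes a PyTorch release that accepts the string padding="same" (string padding is not available in older versions); the layer names are illustrative.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)

# Size-preserving: for a 5x5 kernel, padding = (5 - 1) / 2 = 2.
same_manual = nn.Conv2d(3, 16, kernel_size=5, padding=2)
# Equivalent shorthand; "same" is only supported with stride=1.
same_auto = nn.Conv2d(3, 16, kernel_size=5, padding="same")
# Downsampling: stride=2 halves the height and width.
down = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1)

print(same_manual(x).shape)  # torch.Size([1, 16, 32, 32])
print(same_auto(x).shape)    # torch.Size([1, 16, 32, 32])
print(down(x).shape)         # torch.Size([1, 16, 16, 16])
```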

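Use Cases

nn.Conv2d is one of the most important layers in computer vision. Typical applications include:

Image classification: feature extraction in networks such as VGG and ResNet
Object detection: YOLO, Faster R-CNN, and similar detectors
Semantic segmentation: U-Net, FCN, and similar architectures
Style transfer: generating artistic images

Tip: 3x3 convolutions are the most common choice in modern CNNs, because they capture enough spatial context while keeping the parameter count small.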
