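PyTorch torch.nn.Linear

torch.nn.Linear is the PyTorch module for creating a fully connected layer (also called a linear layer or affine transformation).

It is one of the most basic and most widely used layers in neural networks: it applies an affine transformation that maps the input features into the output feature space.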

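Definition

torch.nn.Linear(in_features, out_features, bias=True)

Parameters:

in_features (int): size of each input sample, i.e. the number of features produced by the previous layer.
out_features (int): size of each output sample, i.e. the number of features this layer produces.
bias (bool): whether to add a learnable bias. Defaults to True. If set to False, the layer does not learn a bias.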

Attributes:

weight (Tensor): learnable weight matrix of shape (out_features, in_features).
bias (Tensor): learnable bias vector of shape (out_features,). If bias=False, the attribute still exists but is None.

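Mathematical formulation

nn.Linear computes:

y = xA^T + b

where:

x is the input tensor, of shape (..., in_features)
A is the weight matrix, of shape (out_features, in_features)
b is the bias vector, of shape (out_features,)
y is the output tensor, of shape (..., out_features)

The ... stands for any number of leading dimensions (possibly none); the transformation is applied to the last dimension only.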
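To see that the layer really computes y = xA^T + b, the formula can be evaluated by hand with the layer's own weight and bias and compared with the layer's output. This is a small check, not part of the original tutorial; the layer sizes and the seed passed to torch.manual_seed are arbitrary choices made only for reproducibility.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for a reproducible run

linear = nn.Linear(in_features=4, out_features=3)
x = torch.randn(2, 4)  # 2 samples, 4 features each

# y = x A^T + b, written out with the layer's own parameters
manual = x @ linear.weight.T + linear.bias
builtin = linear(x)

# Compare with a small tolerance to absorb floating-point rounding
print(torch.allclose(manual, builtin, atol=1e-6))  # True
```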

Usage examples

Example 1: Basic usage

Create a simple fully connected layer that maps a 10-dimensional input to a 5-dimensional output:

Code:

import torch
import torch.nn as nn

# Create a linear layer: 10 input features, 5 output features
linear_layer = nn.Linear(in_features=10, out_features=5, bias=True)

# Print the shapes of the weight and bias
print("Weight shape:", linear_layer.weight.shape)  # torch.Size([5, 10])
print("Bias shape:", linear_layer.bias.shape)  # torch.Size([5])

# Create an input tensor: batch_size=3, 10 features
input_tensor = torch.randn(3, 10)

# Forward pass
output = linear_layer(input_tensor)

print("Input shape:", input_tensor.shape)  # torch.Size([3, 10])
print("Output shape:", output.shape)  # torch.Size([3, 5])
print("Output data:\n", output)

Output (the values are random and differ from run to run):

Weight shape: torch.Size([5, 10])
Bias shape: torch.Size([5])
Input shape: torch.Size([3, 10])
Output shape: torch.Size([3, 5])
Output data:
 tensor([[-0.1838,  0.0607, -0.4879,  0.8981, -0.2098],
        [ 0.1513, -0.1873,  0.1866, -0.2448, -0.6012],
        [ 0.2915,  0.3053,  0.2532, -0.3372, -0.3968]],
       grad_fn=<AddmmBackward0>)

In this example we create a fully connected layer with 10 input features and 5 output features. The input tensor has shape (3, 10), where 3 is the batch size and 10 is the number of features; the output tensor has shape (3, 5).
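The batch dimension is just one of the leading dimensions, so it is optional: a single 1-D sample of shape (in_features,) also works, and each row of a batched result depends only on its own input row. The sketch below checks both; the seed is an arbitrary assumption for reproducibility.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for a reproducible run
linear_layer = nn.Linear(in_features=10, out_features=5)
batch = torch.randn(3, 10)

batched_out = linear_layer(batch)     # shape (3, 5)
single_out = linear_layer(batch[0])   # unbatched input of shape (10,) -> shape (5,)

print(single_out.shape)  # torch.Size([5])
# Row 0 of the batched result matches the result for sample 0 on its own
print(torch.allclose(batched_out[0], single_out, atol=1e-6))  # True
```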

Example 2: Without a bias

Create a linear layer that has no bias term:

Code:

import torch
import torch.nn as nn

# Create a linear layer without a bias
linear_no_bias = nn.Linear(in_features=10, out_features=5, bias=False)

# The bias attribute is set to None
print("Bias is None:", linear_no_bias.bias is None)  # True

# Forward pass
input_tensor = torch.randn(3, 10)
output = linear_no_bias(input_tensor)

print("Output shape:", output.shape)  # torch.Size([3, 5])
print("Output:\n", output)

Output (the values are random):

Bias is None: True
Output shape: torch.Size([3, 5])
Output:
 tensor([[-0.3312, -0.4113,  0.0257, -0.4876,  0.0780],
        [ 0.1513,  0.2459, -0.2983,  0.2456, -0.0727],
        [-0.0143,  0.3053,  0.1866, -0.3372,  0.2532]],
       grad_fn=<MmBackward0>)
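Without a bias the layer computes y = xA^T only, so it is strictly linear: a zero input maps to a zero output, and f(x1 + x2) = f(x1) + f(x2). The sketch below illustrates this; the seed and the two random inputs are arbitrary choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for a reproducible run
linear_no_bias = nn.Linear(10, 5, bias=False)

zeros = torch.zeros(1, 10)
print(linear_no_bias(zeros))  # all zeros: there is no bias to shift the output

x1, x2 = torch.randn(1, 10), torch.randn(1, 10)
# Additivity holds up to floating-point rounding
print(torch.allclose(linear_no_bias(x1 + x2),
                     linear_no_bias(x1) + linear_no_bias(x2), atol=1e-6))  # True
```

With bias=True the zero input would instead return the bias vector, which is why a biased layer is affine rather than linear.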

Example 3: Multi-dimensional input

nn.Linear accepts input with any number of dimensions and transforms only the last one:

Code:

import torch
import torch.nn as nn

# Create a linear layer
linear = nn.Linear(in_features=10, out_features=5)

# 2-D input (batch, features)
input_2d = torch.randn(8, 10)
output_2d = linear(input_2d)
print("2-D input -> output:", input_2d.shape, "->", output_2d.shape)

# 3-D input (batch, seq, features)
input_3d = torch.randn(4, 6, 10)
output_3d = linear(input_3d)
print("3-D input -> output:", input_3d.shape, "->", output_3d.shape)

# 4-D input (batch, channels, height, features): the last dimension must equal in_features
input_4d = torch.randn(2, 3, 4, 10)
output_4d = linear(input_4d)
print("4-D input -> output:", input_4d.shape, "->", output_4d.shape)

Output:

2-D input -> output: torch.Size([8, 10]) -> torch.Size([8, 5])
3-D input -> output: torch.Size([4, 6, 10]) -> torch.Size([4, 6, 5])
4-D input -> output: torch.Size([2, 3, 4, 10]) -> torch.Size([2, 3, 4, 5])
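Applying the layer to a 3-D tensor gives the same result as merging all leading dimensions into one, applying the layer to the resulting 2-D tensor, and restoring the original leading shape. The sketch below checks this equivalence; the seed and shapes are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for a reproducible run
linear = nn.Linear(in_features=10, out_features=5)
x = torch.randn(4, 6, 10)  # (batch, seq, features)

direct = linear(x)  # (4, 6, 5)

# Merge the leading dims, apply the layer to a 2-D tensor, then restore them
flat = linear(x.reshape(-1, 10))   # (24, 5)
restored = flat.reshape(4, 6, 5)

print(torch.allclose(direct, restored, atol=1e-6))  # True
```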

Example 4: Using it inside a neural network

In a real network, nn.Linear is usually combined with other layers:

Code:

import torch
import torch.nn as nn

# Define a multilayer perceptron (MLP)
class MLP(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(MLP, self).__init__()
        # First layer: input -> hidden
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        # Activation function
        self.relu = nn.ReLU()
        # Second layer: hidden -> output
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

# Create the model
model = MLP(input_dim=784, hidden_dim=256, output_dim=10)

# Print the model structure
print("Model structure:")
print(model)

# Test the forward pass
input_tensor = torch.randn(32, 784)  # batch_size=32, 28x28=784 features
output = model(input_tensor)

print("\nInput shape:", input_tensor.shape)  # torch.Size([32, 784])
print("Output shape:", output.shape)  # torch.Size([32, 10])

Output:

Model structure:
MLP(
  (fc1): Linear(in_features=784, out_features=256, bias=True)
  (relu): ReLU()
  (fc2): Linear(in_features=256, out_features=10, bias=True)
)

Input shape: torch.Size([32, 784])
Output shape: torch.Size([32, 10])
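When forward is just a chain of layers, as in the MLP class, the same architecture can also be written with nn.Sequential. The sketch below is an alternative formulation, not the tutorial's own code; it also counts the parameters, using the fact that each Linear layer holds in_features × out_features weights plus out_features biases.

```python
import torch
import torch.nn as nn

# Same 784 -> 256 -> 10 architecture as the MLP class, written with nn.Sequential
mlp_seq = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

out = mlp_seq(torch.randn(32, 784))
print(out.shape)  # torch.Size([32, 10])

# (784*256 + 256) + (256*10 + 10) = 200960 + 2570
print(sum(p.numel() for p in mlp_seq.parameters()))  # 203530
```

A custom nn.Module subclass remains the more flexible choice once forward needs branching, skip connections, or multiple inputs.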

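Weight initialization

By default, nn.Linear initializes both weight and bias from the uniform distribution U(-√k, √k), where k = 1/in_features.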
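The default range can be checked directly: with k = 1/in_features, every freshly initialized weight and bias value should lie within ±1/√in_features. The small tolerance in the comparison below is an assumption added only to absorb float32 rounding at the boundary.

```python
import math
import torch.nn as nn

in_features = 10
linear = nn.Linear(in_features, 5)

bound = 1 / math.sqrt(in_features)  # sqrt(k) with k = 1/in_features

# Tiny tolerance for float32 rounding near the boundary
print(linear.weight.abs().max().item() <= bound + 1e-6)  # True
print(linear.bias.abs().max().item() <= bound + 1e-6)    # True
```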

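You can also initialize the weights manually:

Code:

import torch
import torch.nn as nn

# Create a linear layer
linear = nn.Linear(10, 5)

# Initialize the weight with Xavier (Glorot) uniform initialization
nn.init.xavier_uniform_(linear.weight)
# Initialize the bias to zero
nn.init.zeros_(linear.bias)

# Inspect the initialized parameters
print("Weight:\n", linear.weight.data)
print("Bias:", linear.bias.data)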
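Xavier uniform initialization samples from U(-a, a) with a = gain · √(6 / (fan_in + fan_out)); for an nn.Linear weight of shape (out_features, in_features), fan_in is in_features and fan_out is out_features. The sketch below checks that bound with the default gain of 1; the 1e-6 tolerance is an assumption for float32 rounding.

```python
import math
import torch.nn as nn

linear = nn.Linear(10, 5)
nn.init.xavier_uniform_(linear.weight)  # default gain = 1.0
nn.init.zeros_(linear.bias)

fan_in, fan_out = linear.in_features, linear.out_features  # 10, 5
xavier_bound = math.sqrt(6 / (fan_in + fan_out))           # about 0.632

print(linear.weight.abs().max().item() <= xavier_bound + 1e-6)  # True
print(linear.bias.abs().sum().item())  # 0.0
```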

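Difference from nn.functional.linear

PyTorch also provides the functional interface torch.nn.functional.linear:

Code:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Option 1: the nn.Module
linear_module = nn.Linear(10, 5)
output1 = linear_module(torch.randn(3, 10))

# Option 2: the functional interface (the caller supplies weight and bias)
weight = torch.randn(5, 10)  # shape (out_features, in_features)
bias = torch.randn(5)
output2 = F.linear(torch.randn(3, 10), weight, bias)

print("nn.Module output shape:", output1.shape)
print("nn.functional output shape:", output2.shape)

Differences:

nn.Linear is a module class: it creates and stores the weight and bias as registered parameters, which makes them easy to train and to save with the model.
F.linear is a plain function that holds no parameters: you pass the weight and bias in on every call. It is useful when the parameters are managed elsewhere, for example when reusing another layer's weight.

Note: both perform the same computation, but when building a network you normally use nn.Linear, because it registers its parameters automatically so the optimizer can update them.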
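Since both perform the same computation, calling F.linear with a module's own weight and bias reproduces that module's output exactly. The sketch below checks this; the seed is an arbitrary assumption for reproducibility.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only for a reproducible run
linear_module = nn.Linear(10, 5)
x = torch.randn(3, 10)

out_module = linear_module(x)
# Same input, same parameters, passed explicitly to the functional form
out_functional = F.linear(x, linear_module.weight, linear_module.bias)

print(torch.equal(out_module, out_functional))  # True
```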

FAQ

Q1: How do I find the number of parameters in a Linear layer?

For a linear layer with in_features inputs and out_features outputs:

Number of weight parameters: in_features * out_features
Number of bias parameters: out_features (if bias=True)

Code:

import torch
import torch.nn as nn

linear = nn.Linear(100, 50)
print("Total parameters:", sum(p.numel() for p in linear.parameters()))  # 5050
print("Weight parameters:", linear.weight.numel())  # 5000
print("Bias parameters:", linear.bias.numel())  # 50
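The formula can be wrapped in a small function and compared with what PyTorch reports, for both bias settings. linear_param_count is a hypothetical helper written for this illustration, not a PyTorch API; with bias=False, parameters() yields only the weight.

```python
import torch.nn as nn

def linear_param_count(in_features, out_features, bias=True):
    """Hypothetical helper: in*out weights plus out biases when bias=True."""
    return in_features * out_features + (out_features if bias else 0)

for bias in (True, False):
    layer = nn.Linear(100, 50, bias=bias)
    actual = sum(p.numel() for p in layer.parameters())
    print(bias, actual, linear_param_count(100, 50, bias))
# True 5050 5050
# False 5000 5000
```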

Q2: How do I freeze the parameters of a Linear layer?

To keep a layer fixed during training, set requires_grad=False on its parameters:

Code:

import torch
import torch.nn as nn

linear = nn.Linear(10, 5)

# Freeze the weight so no gradient is computed for it
linear.weight.requires_grad = False
# Freeze the bias
linear.bias.requires_grad = False

# Verify
print("weight requires_grad:", linear.weight.requires_grad)  # False
print("bias requires_grad:", linear.bias.requires_grad)  # False
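In a larger model the effect of freezing shows up after backpropagation: frozen parameters get no gradient, while the other layers still do. A common pattern is also to hand the optimizer only the parameters that still require gradients. The small nn.Sequential model, its sizes, and the SGD learning rate below are arbitrary assumptions for illustration.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))

# Freeze every parameter of the first Linear layer
for p in model[0].parameters():
    p.requires_grad_(False)

loss = model(torch.randn(4, 10)).sum()
loss.backward()

print(model[0].weight.grad)              # None: frozen, no gradient computed
print(model[2].weight.grad is not None)  # True: this layer is still trainable

# Pass only the trainable parameters to the optimizer
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.1)
optimizer.step()
```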

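Use cases

nn.Linear is one of the most common layers in neural networks. Typical uses include:

Multilayer perceptrons (MLP): as fully connected layers that map features into a new feature space.
Classifiers: as the final layer after a convolutional or recurrent network, producing the class scores.
Feature transformation: linearly projecting data to reduce or increase its dimensionality.
Attention: in a Transformer, projecting the input into the query (Q), key (K), and value (V) matrices.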
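To make the attention use case concrete, here is a minimal single-head self-attention sketch in which three nn.Linear layers produce Q, K, and V. SingleHeadSelfAttention is a hypothetical class written for illustration; it omits masking, dropout, multiple heads, and the output projection found in real Transformer blocks.

```python
import math
import torch
import torch.nn as nn

class SingleHeadSelfAttention(nn.Module):
    """Hypothetical minimal sketch, not a PyTorch API."""

    def __init__(self, d_model):
        super().__init__()
        # One nn.Linear per projection: input -> Q, K, V
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (batch, seq, d_model)
        # Each projection acts on the last dimension only
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Scaled dot-product scores: (batch, seq, seq)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        weights = scores.softmax(dim=-1)
        return weights @ v  # (batch, seq, d_model)

attn = SingleHeadSelfAttention(d_model=16)
out = attn(torch.randn(2, 7, 16))
print(out.shape)  # torch.Size([2, 7, 16])
```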
