Matplotlib hist() Function

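hist() draws a histogram: it divides the data into a number of intervals (bins) and counts how many values fall into each one.

Histograms are a basic tool for exploring the distribution of data, including its central tendency, spread, and skewness.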
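The binning step can be seen without drawing anything. The sketch below uses numpy.histogram, the function hist() relies on to bin its input; the nine sample values are made up for illustration.

```python
import numpy as np

# Nine sample values, chosen only for illustration
data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5])

# Split the range [1, 5] into 4 equal-width bins and count the values in each
counts, edges = np.histogram(data, bins=4)

print(edges)   # bin boundaries: 1, 2, 3, 4, 5
print(counts)  # values per bin: 1, 2, 3, 3
```

Every bin is half-open, [left, right), except the last one, which also includes its right edge. That is why the value 5 is counted in the last bin.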

Function signature

pyplot interface

matplotlib.pyplot.hist(x, bins=None, range=None, density=False,
    weights=None, cumulative=False, bottom=None, histtype='bar',
    align='mid', orientation='vertical', rwidth=None, log=False,
    color=None, label=None, stacked=False, **kwargs)

Axes interface

Axes.hist(x, bins=None, range=None, density=False, weights=None,
    cumulative=False, bottom=None, histtype='bar', align='mid',
    orientation='vertical', rwidth=None, log=False, color=None,
    label=None, stacked=False, **kwargs)
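The two interfaces take the same arguments and return the same values. A minimal sketch of the difference, assuming a non-interactive backend so it runs without a display (the random data is illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window instead
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=200)

# pyplot interface: draws on the current Axes, creating one if needed
fig1 = plt.figure()
n1, bins1, _ = plt.hist(data, bins=20)

# Axes interface: draws on the Axes you name explicitly
fig2, ax = plt.subplots()
n2, bins2, _ = ax.hist(data, bins=20)

# Both calls bin the data identically
print(np.array_equal(n1, n2), np.array_equal(bins1, bins2))
```

The Axes interface is usually the easier one to follow once a figure has more than one subplot, because the target of each call is explicit.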

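Parameters

Parameter	Type	Description
x	array or sequence of arrays	Input data; several datasets can be passed at once
bins	int, sequence, or str	Number of bins, a sequence of bin edges, or the name of a binning strategy such as 'auto'. Defaults to rcParams["hist.bins"], which is 10 unless changed
range	tuple	Data range (lower, upper); values outside the range are ignored. Has no effect if bins is a sequence
density	bool	If True, plot a probability density histogram (total area equals 1) instead of counts
weights	array-like	Weight of each data point, with the same shape as x
cumulative	bool or -1	If True, plot a cumulative histogram; -1 accumulates in the reverse direction, from the largest values down
histtype	str	Histogram type: 'bar' (default), 'barstacked' (stacked bars), 'step' (unfilled outline), 'stepfilled' (filled outline)
align	str	Bar alignment: 'mid' (centered between bin edges, default), 'left' (centered on the left bin edges), 'right' (centered on the right bin edges)
orientation	str	'vertical' (default) or 'horizontal'
rwidth	float	Bar width relative to the bin width; 1.0 means no gap between bars
log	bool	If True, the count axis uses a logarithmic scale (the y-axis for a vertical histogram)
color	color or list of colors	Bar color, one per dataset
label	str or list of str	Legend label(s)
stacked	bool	Whether multiple datasets are stacked on top of each other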
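As an illustration of how range and weights interact, here is a small sketch with made-up values. The Agg backend is selected only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window instead
import matplotlib.pyplot as plt
import numpy as np

# Illustrative values; 9.0 lies outside the requested range
data = np.array([0.5, 1.5, 1.5, 2.5, 9.0])
weights = np.array([1.0, 2.0, 2.0, 1.0, 5.0])

fig, ax = plt.subplots()

# Three bins over [0, 3]: each bar's height is the sum of the weights in that bin
n, edges, _ = ax.hist(data, bins=3, range=(0, 3), weights=weights)

print(edges)  # 0, 1, 2, 3
print(n)      # 1, 4, 1  (the point at 9.0 and its weight are dropped)
```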

hist() returns a 3-tuple (n, bins, patches): n holds the count (or density) of each bin, bins holds the bin edges, and patches holds the drawn bar objects.
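A quick way to get a feel for the return value is to check the sizes of its three parts. This is a sketch on illustrative random data, using the Agg backend so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window instead
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(size=1000)

fig, ax = plt.subplots()
n, bins, patches = ax.hist(data, bins=30)

# 30 bins need 31 edges, and one bar is drawn per bin
print(len(n), len(bins), len(patches))  # 30 31 30

# With the default range every value lands in some bin
print(int(n.sum()))  # 1000
```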

Usage examples

Example 1: Basic histogram

import matplotlib.pyplot as plt
import numpy as np

# Generate normally distributed random data
np.random.seed(42)
data = np.random.randn(1000)

fig, ax = plt.subplots(layout='constrained')

# Draw the histogram
n, bins, patches = ax.hist(data, bins=30,
                           color='steelblue', edgecolor='white')

# Highlight the bin with the highest count
max_idx = np.argmax(n)
patches[max_idx].set_facecolor('#e74c3c')

ax.set_title('Histogram of Normal Distribution')
ax.set_xlabel('Value')
ax.set_ylabel('Frequency')
ax.axvline(x=0, color='red', linestyle='--', alpha=0.7)
ax.grid(axis='y', alpha=0.3)
plt.show()

Example 2: Comparing multiple datasets

import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)

# Three datasets drawn from different normal distributions
data1 = np.random.normal(0, 1, 1000)    # standard normal
data2 = np.random.normal(2, 1.5, 800)   # mean=2, std=1.5
data3 = np.random.normal(-1, 0.5, 600)  # mean=-1, std=0.5

fig, ax = plt.subplots(figsize=(8, 5), layout='constrained')

# Overlay the datasets, using transparency and distinct colors
ax.hist(data1, bins=30, alpha=0.6, color='steelblue',
        label='N(0, 1)', edgecolor='white')
ax.hist(data2, bins=30, alpha=0.6, color='coral',
        label='N(2, 1.5)', edgecolor='white')
ax.hist(data3, bins=30, alpha=0.6, color='mediumseagreen',
        label='N(-1, 0.5)', edgecolor='white')

ax.set_title('Comparing Multiple Distributions')
ax.set_xlabel('Value')
ax.set_ylabel('Frequency')
ax.legend()
ax.grid(axis='y', alpha=0.3)
plt.show()
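In the example above each call computes its own 30 bins over that dataset's own range, so the three histograms end up with different bin widths. One possible variation, sketched here as an assumption rather than a required practice, is to compute a single set of edges from the combined data and pass it to every call:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window instead
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
datasets = {
    'N(0, 1)': rng.normal(0, 1, 1000),
    'N(2, 1.5)': rng.normal(2, 1.5, 800),
    'N(-1, 0.5)': rng.normal(-1, 0.5, 600),
}

# One set of 30 bins spanning the combined range of all three datasets
edges = np.histogram_bin_edges(np.concatenate(list(datasets.values())), bins=30)

fig, ax = plt.subplots(figsize=(8, 5))
results = [ax.hist(d, bins=edges, alpha=0.6, label=name)
           for name, d in datasets.items()]
ax.legend()
```

With shared edges, bars at the same x position cover the same interval, so their heights can be compared directly.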

Example 3: Density histogram with an overlaid PDF curve

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

np.random.seed(42)
data = np.random.randn(1000)

fig, ax = plt.subplots(figsize=(8, 5), layout='constrained')

# density=True plots probability density instead of counts
ax.hist(data, bins=30, density=True, alpha=0.7,
        color='steelblue', edgecolor='white', label='Data Histogram')

# Overlay the theoretical standard normal PDF
x = np.linspace(-4, 4, 200)
pdf = stats.norm.pdf(x, loc=0, scale=1)
ax.plot(x, pdf, 'r-', linewidth=2, label='Theoretical N(0,1) PDF')

ax.set_title('Density Histogram with PDF Curve')
ax.set_xlabel('Value')
ax.set_ylabel('Probability Density')
ax.legend()
ax.grid(axis='y', alpha=0.3)
plt.show()

Example 4: Cumulative histogram

import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
data = np.random.randn(500)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4),
                               layout='constrained')

# Left: standard histogram
ax1.hist(data, bins=30, color='steelblue', edgecolor='white')
ax1.set_title('Standard Histogram')
ax1.set_xlabel('Value')
ax1.set_ylabel('Frequency')

# Right: cumulative histogram
ax2.hist(data, bins=30, cumulative=True, color='coral',
         edgecolor='white')
ax2.set_title('Cumulative Histogram')
ax2.set_xlabel('Value')
ax2.set_ylabel('Cumulative Frequency')

plt.show()
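Combining cumulative with density=True normalizes the running total so that it ends at 1, which gives a stepwise approximation of the cumulative distribution function. The sketch below also shows cumulative=-1, which accumulates from the right; the data is illustrative and the Agg backend is used so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window instead
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(size=500)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Forward accumulation: the last bin reaches 1
n_fwd, _, _ = ax1.hist(data, bins=30, density=True, cumulative=True)
ax1.set_title('cumulative=True')

# Reverse accumulation: the first bin starts at 1 and the values fall from there
n_rev, _, _ = ax2.hist(data, bins=30, density=True, cumulative=-1)
ax2.set_title('cumulative=-1')

print(n_fwd[-1], n_rev[0])
```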

Example 5: Step histogram

import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
data = np.random.exponential(scale=2, size=500)

fig, ax = plt.subplots(layout='constrained')

# histtype='stepfilled' draws a filled step histogram
ax.hist(data, bins=30, histtype='stepfilled',
        color='#3498db', alpha=0.6, edgecolor='black',
        linewidth=1, label='Stepfilled')

# Overlay the step outline
ax.hist(data, bins=30, histtype='step',
        color='black', linewidth=2, label='Step outline')

ax.set_title('Step Histogram (histtype)')
ax.set_xlabel('Value')
ax.set_ylabel('Frequency')
ax.legend()
ax.grid(axis='y', alpha=0.3)
plt.show()

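FAQ

How do I choose the bins parameter?

Too few bins hide the features of the data; too many introduce noise.

Rules of thumb, where n is the number of data points: sqrt(n) (the square-root rule) or log2(n) + 1 (Sturges' rule). You can also pass bins='auto' to have the bin count chosen automatically; note that this is not the default, which is 10 bins.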
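The rules of thumb can be computed by hand or delegated to NumPy, whose named strategies ('sqrt', 'sturges', 'auto') are also accepted as the bins argument of hist(). A minimal sketch on illustrative data; rounding up with ceil is an assumption of this sketch, since the rules themselves do not say how to round.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(size=1000)
n = data.size

# Rules of thumb, rounded up to a whole number of bins (rounding is an assumption)
k_sqrt = int(np.ceil(np.sqrt(n)))         # square-root rule
k_sturges = int(np.ceil(np.log2(n) + 1))  # Sturges' rule
print(k_sqrt, k_sturges)  # 32 11

# NumPy's named strategies return bin edges; the bin count is len(edges) - 1
for name in ("sqrt", "sturges", "auto"):
    edges = np.histogram_bin_edges(data, bins=name)
    print(name, len(edges) - 1)
```

The same names work directly in a plotting call, for example ax.hist(data, bins='auto').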

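What does density=True mean?

Instead of counts, hist() returns densities: each count is divided by the total number of values and by the bin width, so the total area of the histogram equals 1 and the bars form an estimate of the probability density. This is useful when comparing the data against a probability density function (PDF).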
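The area property is easy to verify: multiply each bar's height by its bin width and add the results. The sketch below uses illustrative random data and the Agg backend so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window instead
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(size=1000)

fig, ax = plt.subplots()
n, bins, _ = ax.hist(data, bins=30, density=True)

# Area of each bar = height * width; the areas sum to 1
area = np.sum(n * np.diff(bins))
print(area)

# The same densities computed by hand from raw counts
counts, _ = np.histogram(data, bins=bins)
manual = counts / (counts.sum() * np.diff(bins))
print(np.allclose(n, manual))
```

Note that it is the area, not the sum of the bar heights, that equals 1; individual bars can exceed 1 when the bins are narrow.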
