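Pandas pd.read_json() function

read_json() is the pandas function for reading JSON (JavaScript Object Notation) data. It can import several different JSON layouts.

JSON is a lightweight data-interchange format that is easy for people to read and write and easy for machines to parse and generate. It is widely used in Web APIs and configuration files. read_json() converts JSON data into a pandas DataFrame, which makes it convenient to analyze.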

Basic syntax and parameters

Syntax

pandas.read_json(path_or_buf, orient=None, typ='frame', dtype=None,
                convert_axes=None, convert_dates=True, keep_default_dates=True,
                precise_float=False, date_unit=None, lines=False,
                chunksize=None, ...)

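Parameters

Parameter	Type	Description	Default
path_or_buf	str, path object, or file-like object	File path, URL, or file-like object containing JSON. Passing a literal JSON string is deprecated since pandas 2.1; wrap it in StringIO instead	required
orient	str	Layout of the JSON data: 'split', 'records', 'index', 'columns', 'values', 'table'	None ('columns' for a DataFrame, 'index' for a Series)
typ	str	Return type: 'frame' returns a DataFrame, 'series' returns a Series	'frame'
dtype	bool, dict	True infers dtypes, False disables inference, a dict maps column names to dtypes	None (treated as True, except for orient='table')
convert_axes	bool	Whether to try converting the axes (index and columns) to proper dtypes	None (treated as True, except for orient='table')
convert_dates	bool, list of str	Whether to convert date-like columns, or a list of columns to parse as dates	True
lines	bool	Read the input as JSON Lines, one JSON object per line	False
numpy	bool	Decode directly to numpy arrays. Deprecated since pandas 1.0 and removed in pandas 2.0	False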

Return value

Return type: pd.DataFrame or pd.Series
By default the function returns a DataFrame, a two-dimensional labeled table.
With typ='series' it returns a Series.
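
To make the difference between the two return types concrete, here is a minimal sketch of typ='series'. The score data is made up for illustration, and the input is wrapped in StringIO so that it is treated as a file-like object.

```python
from io import StringIO

import pandas as pd

# Illustrative data: a JSON object mapping labels to scalar values.
json_scores = '{"Tom": 85, "Jerry": 92, "Mike": 78}'

# typ='series' asks read_json() for a Series instead of a DataFrame.
# For a Series the default orient is 'index', so the keys become the index.
scores = pd.read_json(StringIO(json_scores), typ='series')

print(type(scores).__name__)
print(scores)
```

The same JSON object read with the default typ='frame' would be rejected, because a DataFrame in the 'columns' layout expects each key to map to a collection of values rather than a scalar.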

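Examples

The examples below cover the main ways to use read_json().

Example 1: Reading JSON data in different layouts

JSON data comes in several layouts, and read_json() supports the most common ones.

Example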

import pandas as pd

# Sample data: a JSON array of objects (records layout)
json_records = '''
[
{"name": "Tom", "age": 28, "city": "Beijing", "salary": 8000},
{"name": "Jerry", "age": 35, "city": "Shanghai", "salary": 12000},
{"name": "Mike", "age": 42, "city": "Guangzhou", "salary": 15000}
]
'''

# Write the JSON string to a file
with open('data_records.json', 'w', encoding='utf-8') as f:
    f.write(json_records)

# Read the JSON file (records layout, the most common one)
# orient='records' means each array element is one JSON object, i.e. one row
df_records = pd.read_json('data_records.json', orient='records')
print("Records layout:")
print(df_records)
print()

# Sample data: a JSON object of objects (index layout)
json_index = '''
{
"Tom": {"age": 28, "city": "Beijing", "salary": 8000},
"Jerry": {"age": 35, "city": "Shanghai", "salary": 12000},
"Mike": {"age": 42, "city": "Guangzhou", "salary": 15000}
}
'''

with open('data_index.json', 'w', encoding='utf-8') as f:
    f.write(json_index)

# Read the JSON file (index layout: the top-level keys become the row index)
df_index = pd.read_json('data_index.json', orient='index')
print("Index layout:")
print(df_index)
print()

# Sample data: a JSON object of arrays (columns layout)
json_columns = '''
{
"name": ["Tom", "Jerry", "Mike"],
"age": [28, 35, 42],
"city": ["Beijing", "Shanghai", "Guangzhou"],
"salary": [8000, 12000, 15000]
}
'''

with open('data_columns.json', 'w', encoding='utf-8') as f:
    f.write(json_columns)

# Read the JSON file (columns layout)
df_columns = pd.read_json('data_columns.json', orient='columns')
print("Columns layout:")
print(df_columns)

Expected output:

Records layout:
    name  age       city  salary
0    Tom   28    Beijing    8000
1  Jerry   35   Shanghai   12000
2   Mike   42  Guangzhou   15000

Index layout:
       age       city  salary
Tom     28    Beijing    8000
Jerry   35   Shanghai   12000
Mike    42  Guangzhou   15000

Columns layout:
    name  age       city  salary
0    Tom   28    Beijing    8000
1  Jerry   35   Shanghai   12000
2   Mike   42  Guangzhou   15000

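Code walkthrough:

orient='records': a JSON array in which each element is one JSON object (one row). This is the most common layout.
orient='index': a JSON object whose keys become the row index.
orient='columns': a JSON object whose keys are column names and whose values hold the column data.
Choosing the orient value that matches the actual structure of the data is essential for parsing it correctly.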
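
The parameter table also lists 'split' and 'values', which the example above does not exercise. The following sketch pairs DataFrame.to_json() with read_json() using the same orient on both sides. The small DataFrame is made-up illustration data.

```python
from io import StringIO

import pandas as pd

# Illustrative frame with a labeled index.
df = pd.DataFrame(
    {"age": [28, 35], "city": ["Beijing", "Shanghai"]},
    index=["Tom", "Jerry"],
)

# 'split' stores column labels, index labels, and data under separate keys.
split_json = df.to_json(orient="split")
print(split_json)

# Reading back with the same orient restores both labels and values.
restored = pd.read_json(StringIO(split_json), orient="split")
print(restored)

# 'values' keeps only the rows of data, so the labels are lost
# and the result falls back to default integer labels.
values_json = df.to_json(orient="values")
values_df = pd.read_json(StringIO(values_json), orient="values")
print(values_df)
```

Writing and reading with the same orient is the simplest way to keep a round trip lossless. 'split' is a reasonable choice when the index labels matter.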

Example 2: Reading JSON from a string and from a URL

read_json() reads not only files but also in-memory strings and URLs.

Example

import pandas as pd
from io import StringIO

# Example 2a: read from a JSON string
json_string = '''
[
{"product": "A", "sales": 100, "region": "North"},
{"product": "B", "sales": 200, "region": "South"},
{"product": "C", "sales": 150, "region": "East"}
]
'''

# Use StringIO to turn the string into a file-like object
df_from_string = pd.read_json(StringIO(json_string))
print("From a string:")
print(df_from_string)
print()

# Passing a literal JSON string directly is deprecated since pandas 2.1
# and emits a FutureWarning, so a single-line string is wrapped the same way
json_str_direct = '[{"product": "A", "sales": 100}, {"product": "B", "sales": 200}]'
df_direct = pd.read_json(StringIO(json_str_direct))
print("From a single-line string:")
print(df_direct)
print()

# Example 2b: read JSON Lines (one JSON object per line)
# JSON Lines is a common format for logs
json_lines = '''{"name": "Tom", "score": 85}
{"name": "Jerry", "score": 92}
{"name": "Mike", "score": 78}
{"name": "Lucy", "score": 95}'''

with open('data_lines.json', 'w', encoding='utf-8') as f:
    f.write(json_lines)

# lines=True tells read_json() to parse each line as a separate JSON object,
# so no manual line-by-line loop is needed
df_lines = pd.read_json('data_lines.json', lines=True)
print("JSON Lines read:")
print(df_lines)
print()

# Example 2c: read from an API URL (requires network access)
# This is a placeholder URL; replace it with a real endpoint
# df_api = pd.read_json('https://api.example.com/data')
# print(df_api)
print("Note: reading from a URL requires a real network request")

Expected output:

From a string:
  product  sales region
0       A    100  North
1       B    200  South
2       C    150   East

From a single-line string:
  product  sales
0       A    100
1       B    200

JSON Lines read:
    name  score
0    Tom     85
1  Jerry     92
2   Mike     78
3   Lucy     95

Note: reading from a URL requires a real network request

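Code walkthrough:

read_json() accepts JSON held in a string once StringIO has turned it into a file-like object. Passing the raw string itself is deprecated since pandas 2.1.
JSON Lines stores one independent JSON object per line and is common in log processing. read_json() parses it directly when lines=True is passed, so no manual line-by-line loop is needed.
To read from a URL, pass the URL string as the first argument. This requires network access.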
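
As a complement to reading JSON Lines, the sketch below writes a DataFrame out in that format with DataFrame.to_json() and reads it back. The data is illustrative, and StringIO stands in for a file on disk.

```python
from io import StringIO

import pandas as pd

# Illustrative data.
df = pd.DataFrame({"name": ["Tom", "Jerry"], "score": [85, 92]})

# When writing, lines=True is only allowed together with orient='records'.
jsonl_text = df.to_json(orient="records", lines=True)
print(jsonl_text)

# Each line is parsed as one record when reading back.
restored = pd.read_json(StringIO(jsonl_text), lines=True)
print(restored)
```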

Example 3: Handling dates and type conversion

Dates and numeric types in JSON data need special attention.

Example

import pandas as pd

# Example 3a: date fields
json_with_date = '''
[
{"name": "Tom", "birthday": "1995-03-15", "join_date": "2020-01-10"},
{"name": "Jerry", "birthday": "1988-07-22", "join_date": "2019-03-05"},
{"name": "Mike", "birthday": "1981-11-30", "join_date": "2018-06-20"}
]
'''

with open('data_with_date.json', 'w', encoding='utf-8') as f:
    f.write(json_with_date)

# By default these date strings stay as object dtype, because neither
# column name matches the date-like names that read_json() converts automatically
df_date = pd.read_json('data_with_date.json')
print("Default read (dates as strings):")
print(df_date)
print("birthday dtype:", df_date['birthday'].dtype)
print()

# Use convert_dates to name the columns that should be parsed as dates
df_date_converted = pd.read_json('data_with_date.json', convert_dates=['birthday', 'join_date'])
print("After date conversion:")
print(df_date_converted)
print("birthday dtype:", df_date_converted['birthday'].dtype)
print()

# Example 3b: specifying data types
json_mixed = '''
[
{"id": "1", "name": "Tom", "score": 85.5},
{"id": "2", "name": "Jerry", "score": 92.0},
{"id": "3", "name": "Mike", "score": 78.5}
]
'''

with open('data_mixed.json', 'w', encoding='utf-8') as f:
    f.write(json_mixed)

# By default id is read as an integer (even though it is quoted in the JSON),
# name as a string, and score as a float
df_mixed = pd.read_json('data_mixed.json')
print("Default type inference:")
print(df_mixed)
print("id dtype:", df_mixed['id'].dtype)
print()

# Use dtype to specify the types explicitly
df_typed = pd.read_json('data_mixed.json', dtype={'id': str, 'score': float})
print("With explicit types:")
print(df_typed)
print("id dtype:", df_typed['id'].dtype)

Expected output:

Default read (dates as strings):
    name    birthday   join_date
0    Tom  1995-03-15  2020-01-10
1  Jerry  1988-07-22  2019-03-05
2   Mike  1981-11-30  2018-06-20
birthday dtype: object

After date conversion:
...
birthday dtype: datetime64[ns]
...

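Code walkthrough:

convert_dates accepts a list of the columns that should be converted to a datetime type.
With the default convert_dates=True, only columns whose names look date-like are converted automatically (for example names ending in _at or _time, starting with timestamp, or equal to date or modified). Other columns, such as birthday here, stay as strings unless they are listed explicitly.
dtype sets the data type of each column explicitly, which avoids surprises from type inference such as quoted IDs being turned into integers.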
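
The sketch below contrasts the default conversions with switching them off entirely. The column names and values are hypothetical and chosen so that one name (created_at) matches the date-like naming rule while another (birthday) does not.

```python
from io import StringIO

import pandas as pd

# Hypothetical record: a quoted ID with leading zeros and two date strings.
json_text = '[{"id": "007", "created_at": "2020-01-10", "birthday": "1995-03-15"}]'

# Default settings: dtype inference and name-based date detection are both on.
default_df = pd.read_json(StringIO(json_text))
print(default_df.dtypes)

# dtype=False and convert_dates=False keep every value as it appears in the JSON.
raw_df = pd.read_json(StringIO(json_text), dtype=False, convert_dates=False)
print(raw_df.dtypes)
print(raw_df.loc[0, "id"])
```

Turning the conversions off is a reasonable starting point when identifiers such as postal codes or order numbers must keep their leading zeros. Individual columns can then be converted afterwards with pd.to_datetime() or astype().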

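Notes

Choosing the correct orient value is the key to reading JSON data. Different JSON structures need different orient values.
Date strings are converted automatically only in columns with date-like names. For any other column, list it in convert_dates to get a datetime type.
For large files in JSON Lines format, consider reading in chunks with the chunksize parameter. It only works together with lines=True.
read_json() reads from file paths, URLs, and file-like objects. Wrap an in-memory JSON string in StringIO, since passing it directly is deprecated since pandas 2.1.
JSON Lines data is read with lines=True, which parses each line as one record.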
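
A minimal sketch of chunked reading follows. The five records are generated in memory purely for illustration, and StringIO stands in for a large JSON Lines file.

```python
from io import StringIO

import pandas as pd

# Build five illustrative JSON Lines records in memory.
jsonl_text = "\n".join(f'{{"id": {i}, "value": {i * 10}}}' for i in range(5))

chunk_sizes = []
total = 0

# With lines=True and chunksize set, read_json() returns an iterator
# of DataFrames instead of a single DataFrame.
with pd.read_json(StringIO(jsonl_text), lines=True, chunksize=2) as reader:
    for chunk in reader:
        chunk_sizes.append(len(chunk))
        total += chunk["value"].sum()

print(chunk_sizes)
print(total)
```

Each chunk is an ordinary DataFrame, so per-chunk aggregation like the running sum above keeps memory use bounded by the chunk size rather than the file size.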

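Summary

read_json() is the core pandas function for reading JSON data and supports several JSON layouts. Because JSON is the usual format for Web APIs and data exchange, it comes up constantly in practical data analysis.

The key to using read_json() well is understanding the different orient layouts and how dates and types are handled. Practicing on JSON data in a variety of layouts is the best way to get comfortable with it.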
