
o4-mini
Multimodal
Zero-eval
#2 AIME 2024
#2 MathVista
#2 BrowseComp
+2 more
by OpenAI
About
o4-mini is a multimodal language model developed by OpenAI. It achieves strong performance, with an average score of 66.5% across 14 benchmarks, and performs particularly well on AIME 2024 (93.4%), AIME 2025 (92.7%), and MathVista (84.3%). As a multimodal model, it can process and understand text, images, and other input formats. Released in April 2025, it represents one of OpenAI's most recent advancements in AI technology.
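For readers who want to see how a headline number like the 66.5% average is typically formed, the sketch below computes an unweighted mean over per-benchmark scores. It is only illustrative: it assumes a simple unweighted mean and uses just the ten benchmarks listed further down this page, so the result will not reproduce 66.5% exactly (the other four benchmarks are not shown here).

```python
# Minimal sketch: unweighted mean of normalized benchmark scores (0-1 scale).
# Only the ten benchmarks listed on this page are included; the remaining four
# of the 14 are omitted, so the printed value will not match 66.5% exactly.
scores = {
    "AIME 2024": 0.934,
    "AIME 2025": 0.927,
    "MathVista": 0.843,
    "MMMU": 0.816,
    "GPQA": 0.814,
    "CharXiv-R": 0.720,
    "TAU-bench Retail": 0.718,
    "Aider-Polyglot": 0.689,
    "SWE-Bench Verified": 0.681,
    "Aider-Polyglot Edit": 0.582,
}

average = sum(scores.values()) / len(scores)
print(f"Average over {len(scores)} benchmarks: {average:.1%}")
```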
Timeline
Announced: Apr 16, 2025
Released: Apr 16, 2025
Knowledge Cutoff: May 31, 2024
Specifications
Capabilities
Multimodal
License & Family
License
Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
14 benchmarks
Average Score: 66.5%
Best Score: 93.4%
High Performers (80%+): 5
Top Categories
math
84.3%
vision
81.6%
general
64.4%
agents
60.5%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
AIME 2024
Rank #2 of 41
#1 Grok-3 Mini
95.8%
#2 o4-mini
93.4%
#3 Grok-3
93.3%
#4 Gemini 2.5 Pro
92.0%
#5 o3
91.6%
AIME 2025
Rank #4 of 36
#1 Grok-4 Heavy
100.0%
#2 GPT-5
94.6%
#3 Grok-3
93.3%
#4 o4-mini
92.7%
#5 Grok-4
91.7%
#6 GPT-5 mini
91.1%
#7 Grok-3 Mini
90.8%
MathVista
Rank #2 of 35
#1 o3
86.8%
#2 o4-mini
84.3%
#3 Kimi-k1.5
74.9%
#4 Llama 4 Maverick
73.7%
#5 GPT-4.1 mini
73.1%
MMMU
Rank #4 of 52
#1 GPT-5
84.2%
#2 o3
82.9%
#3 Gemini 2.5 Pro Preview 06-05
82.0%
#4 o4-mini
81.6%
#5 Gemini 2.5 Flash
79.7%
#6 Gemini 2.5 Pro
79.6%
#7 Grok-3
78.0%
GPQA
Rank #12 of 115
#9 Gemini 2.5 Pro
83.0%
#10 Gemini 2.5 Flash
82.8%
#11 GPT-5 mini
82.3%
#12 o4-mini
81.4%
#13 DeepSeek-R1-0528
81.0%
#14 Claude Opus 4
79.6%
#15 o1-pro
79.0%
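The rank lists above order models by score, so a position such as "Rank #4 of 36" follows directly from a descending sort. The sketch below assumes that simple sort-by-score convention and uses only the AIME 2025 entries shown on this page; the other 29 of the 36 ranked models are omitted.

```python
# Minimal sketch (assumption: rank = position after sorting scores descending).
# Only the AIME 2025 entries listed above are included.
aime_2025 = [
    ("Grok-4 Heavy", 100.0),
    ("GPT-5", 94.6),
    ("Grok-3", 93.3),
    ("o4-mini", 92.7),
    ("Grok-4", 91.7),
    ("GPT-5 mini", 91.1),
    ("Grok-3 Mini", 90.8),
]

ranked = sorted(aime_2025, key=lambda entry: entry[1], reverse=True)
for rank, (model, score) in enumerate(ranked, start=1):
    marker = " <- this page's model" if model == "o4-mini" else ""
    print(f"#{rank} {model}: {score:.1f}%{marker}")
```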
All Benchmark Results for o4-mini
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
AIME 2024 | general | text | 0.93 | 93.4% | Self-reported
AIME 2025 | general | text | 0.93 | 92.7% | Self-reported
MathVista | math | text | 0.84 | 84.3% | Self-reported
MMMU | vision | multimodal | 0.82 | 81.6% | Self-reported
GPQA | general | text | 0.81 | 81.4% | Self-reported
CharXiv-R | general | text | 0.72 | 72.0% | Self-reported
TAU-bench Retail | agents | text | 0.72 | 71.8% | Self-reported
Aider-Polyglot | general | text | 0.69 | 68.9% | Self-reported
SWE-Bench Verified | general | text | 0.68 | 68.1% | Self-reported
Aider-Polyglot Edit | general | text | 0.58 | 58.2% | Self-reported
Showing 1 to 10 of 14 benchmarks
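Because the rows above use a simple pipe-delimited layout, per-category averages like the "Top Categories" figures can in principle be recomputed from them. The sketch below is only illustrative: it parses the ten rows shown here, and since the remaining four benchmarks are not listed, the resulting category means will not match the published numbers exactly.

```python
from collections import defaultdict

# Illustrative parser for the pipe-delimited rows above (ten of the 14 benchmarks).
rows = """\
AIME 2024 | general | text | 0.93 | 93.4% | Self-reported
AIME 2025 | general | text | 0.93 | 92.7% | Self-reported
MathVista | math | text | 0.84 | 84.3% | Self-reported
MMMU | vision | multimodal | 0.82 | 81.6% | Self-reported
GPQA | general | text | 0.81 | 81.4% | Self-reported
CharXiv-R | general | text | 0.72 | 72.0% | Self-reported
TAU-bench Retail | agents | text | 0.72 | 71.8% | Self-reported
Aider-Polyglot | general | text | 0.69 | 68.9% | Self-reported
SWE-Bench Verified | general | text | 0.68 | 68.1% | Self-reported
Aider-Polyglot Edit | general | text | 0.58 | 58.2% | Self-reported"""

by_category = defaultdict(list)
for line in rows.splitlines():
    name, category, modality, raw, pct, source = [f.strip() for f in line.split("|")]
    by_category[category].append(float(pct.rstrip("%")))

for category, values in by_category.items():
    print(f"{category}: {sum(values) / len(values):.1f}% over {len(values)} benchmarks")
```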