
o3
Multimodal
Zero-eval
#1 ARC-AGI
#1 MathVista
#1 Tau-bench
+11 more
by OpenAI
About
o3 is a multimodal language model developed by OpenAI. It averages 65.3% across the 19 benchmarks tracked here, with its strongest scores on AIME 2024 (91.6%), ARC-AGI (88.0%), and MathVista (86.8%). It is particularly strong on vision tasks, averaging 76.7% in that category. It supports a 300K-token context window for handling large documents and is currently available through one API provider. As a multimodal model, it can process and understand text, images, and other input formats. Released in April 2025, it was OpenAI's latest model at the time of this listing.
Pricing Range
Input (per 1M): $2.00
Output (per 1M): $8.00
Providers: 1
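As a rough guide to what these rates mean per request, cost can be estimated directly from token counts. The sketch below assumes flat per-token billing with no caching or batch discounts; the function and constant names are illustrative, not part of any provider's API.

```python
# Estimated o3 request cost from the listed rates:
# $2.00 per 1M input tokens, $8.00 per 1M output tokens
# (assumption: flat per-token billing, no caching/batch discounts).
INPUT_USD_PER_M = 2.00
OUTPUT_USD_PER_M = 8.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD."""
    return (input_tokens / 1_000_000 * INPUT_USD_PER_M
            + output_tokens / 1_000_000 * OUTPUT_USD_PER_M)

# A 10K-token prompt with a 2K-token reply:
print(f"${estimate_cost(10_000, 2_000):.4f}")  # $0.0360
```

Note that output tokens cost 4x input tokens at these rates, so long completions dominate the bill.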
Timeline
Announced: Apr 16, 2025
Released: Apr 16, 2025
Knowledge Cutoff: May 31, 2024
Specifications
Capabilities: Multimodal
License & Family
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance: 19 benchmarks
Average Score: 65.3%
Best Score: 91.6%
High Performers (80%+): 8
Performance Metrics
Max Context Window: 300.0K
Avg Throughput: 50.0 tok/s
Avg Latency: 20ms
Top Categories
vision: 76.7%
general: 67.5%
agents: 63.0%
math: 51.3%
reasoning: 47.3%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
AIME 2024
Rank #5 of 41
#2 Gemini 2.5 Pro: 92.0%
#3 Grok-3: 93.3%
#4 o4-mini: 93.4%
#5 o3: 91.6%
#6 DeepSeek-R1-0528: 91.4%
#7 Gemini 2.5 Flash: 88.0%
#8 o3-mini: 87.3%
ARC-AGI
Rank #1 of 2
#1 o3: 88.0%
#2 Qwen3-235B-A22B-Instruct-2507: 41.8%
MathVista
Rank #1 of 35
#1 o3: 86.8%
#2 o4-mini: 84.3%
#3 Kimi-k1.5: 74.9%
#4 Llama 4 Maverick: 73.7%
AIME 2025
Rank #10 of 36
#7 DeepSeek-R1-0528: 87.5%
#8 Gemini 2.5 Pro Preview 06-05: 88.0%
#9 Grok-3 Mini: 90.8%
#10 o3: 86.4%
#11 GPT-5 nano: 85.2%
#12 Gemini 2.5 Pro: 83.0%
#13 Qwen3 235B A22B: 81.5%
VideoMMMU
Rank #3 of 3
#1 Gemini 2.5 Pro Preview 06-05: 83.6%
#2 GPT-5: 84.6%
#3 o3: 83.3%
All Benchmark Results for o3
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
AIME 2024 | general | text | 0.92 | 91.6% | Self-reported
ARC-AGI | reasoning | text | 0.88 | 88.0% | Self-reported
MathVista | math | text | 0.87 | 86.8% | Self-reported
AIME 2025 | general | text | 0.86 | 86.4% | Self-reported
VideoMMMU | vision | multimodal | 0.83 | 83.3% | Self-reported
GPQA | general | text | 0.83 | 83.3% | Self-reported
MMMU | vision | multimodal | 0.83 | 82.9% | Self-reported
Aider-Polyglot | general | text | 0.81 | 81.3% | Self-reported
CharXiv-R | general | text | 0.79 | 78.6% | Self-reported
MMMU-Pro | vision | multimodal | 0.76 | 76.4% | Self-reported
Showing 1 to 10 of 19 benchmarks
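The headline 65.3% figure appears to be an unweighted mean over the 19 normalized benchmark scores; the snippet below illustrates that computation on the 10 rows shown above. The averaging method is an assumption, not confirmed by the listing.

```python
# Unweighted mean of normalized benchmark scores (the 10 rows shown
# above; the listing's 65.3% headline averages all 19 benchmarks).
scores = {
    "AIME 2024": 91.6, "ARC-AGI": 88.0, "MathVista": 86.8,
    "AIME 2025": 86.4, "VideoMMMU": 83.3, "GPQA": 83.3,
    "MMMU": 82.9, "Aider-Polyglot": 81.3, "CharXiv-R": 78.6,
    "MMMU-Pro": 76.4,
}
avg = sum(scores.values()) / len(scores)
print(f"{avg:.1f}%")  # 83.9%
```

The subset mean (83.9%) is much higher than the 65.3% overall average because this page shows only the top 10 of 19 scores.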
Resources