
o3-mini
#1 on MATH, IFEval, LiveBench, and 9 more benchmarks
by OpenAI
About
o3-mini is a language model developed by OpenAI. It posts competitive results across 26 benchmarks, with particularly strong scores on COLLIE (98.7%), MATH (97.9%), and IFEval (93.9%). It supports a 300K-token context window for handling large documents and is available through 2 API providers. It was announced and released on January 30, 2025.
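Since the model is served through a standard chat-completions API, a minimal request looks like the sketch below. It assumes the official `openai` Python SDK (v1+) and an `OPENAI_API_KEY` in the environment; the second provider listed on this page is not shown.

```python
# Minimal sketch: one chat-completions request to o3-mini.
# Assumes the official `openai` Python SDK and an OPENAI_API_KEY
# environment variable; model availability may vary by provider.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3-mini",
    messages=[
        {"role": "user", "content": "Summarize the MATH benchmark in one sentence."}
    ],
)
print(response.choices[0].message.content)
```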
Pricing Range
Input (per 1M tokens): $1.10
Output (per 1M tokens): $4.40
Providers: 2
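With identical rates across both providers, per-request cost is simple arithmetic. The sketch below is an illustrative helper (not provider code) applying the listed rates.

```python
# Back-of-the-envelope cost estimate from the listed per-1M-token rates.
# Rates are the page's listed prices; actual provider pricing may differ.
INPUT_PER_M = 1.10   # USD per 1M input tokens
OUTPUT_PER_M = 4.40  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    return (input_tokens / 1_000_000) * INPUT_PER_M \
        + (output_tokens / 1_000_000) * OUTPUT_PER_M

# Example: a 50K-token prompt with a 2K-token answer.
print(f"${estimate_cost(50_000, 2_000):.4f}")  # $0.0638
```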
Timeline
Announced: Jan 30, 2025
Released: Jan 30, 2025
Knowledge Cutoff: Sep 30, 2023
Specifications
License & Family
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
26 benchmarks
Average Score: 56.9%
Best Score: 98.7%
High Performers (80%+): 8
Performance Metrics
Max Context Window: 300.0K tokens
Avg Throughput: 115.0 tok/s
Avg Latency: 5 ms
Top Categories
code: 93.9%
roleplay: 84.6%
math: 66.4%
general: 55.2%
agents: 45.0%
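The throughput and latency figures above give a rough way to estimate end-to-end generation time, as sketched below. This is a simple latency-plus-streaming model under the page's averages; real serving times vary by provider and load.

```python
# Rough generation-time estimate from the listed averages
# (115.0 tok/s throughput, 5 ms latency). Illustrative only.
AVG_THROUGHPUT_TPS = 115.0  # tokens per second
AVG_LATENCY_S = 0.005       # 5 ms, as listed

def estimated_seconds(output_tokens: int) -> float:
    """Latency plus streaming time for a given output length."""
    return AVG_LATENCY_S + output_tokens / AVG_THROUGHPUT_TPS

print(f"{estimated_seconds(1_000):.2f} s")  # ~8.70 s for 1,000 tokens
```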
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
COLLIE
Rank #2 of 7
#1 GPT-5: 99.0%
#2 o3-mini: 98.7%
#3 GPT-4.5: 72.3%
#4 GPT-4.1: 65.8%
#5 GPT-4o: 61.0%
MATH
Rank #1 of 63
#1 o3-mini: 97.9%
#2 o1: 96.4%
#3 Gemini 2.0 Flash: 89.7%
#4 Gemma 3 27B: 89.0%
IFEval
Rank #1 of 37
#1 o3-mini: 93.9%
#2 Claude 3.7 Sonnet: 93.2%
#3 Nova Pro: 92.1%
#4 Llama 3.3 70B Instruct: 92.1%
MGSM
Rank #2 of 31
#1 Llama 4 Maverick: 92.3%
#2 o3-mini: 92.0%
#3 Claude 3.5 Sonnet: 91.6%
#4 Claude 3.5 Sonnet: 91.6%
#5 Llama 3.3 70B Instruct: 91.1%
AIME 2024
Rank #8 of 41
#5 o3: 91.6%
#6 DeepSeek-R1-0528: 91.4%
#7 Gemini 2.5 Flash: 88.0%
#8 o3-mini: 87.3%
#9 DeepSeek R1 Distill Llama 70B: 86.7%
#10 DeepSeek R1 Zero: 86.7%
#11 o1-pro: 86.0%
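The per-benchmark positions in these lists follow directly from sorting models by score in descending order. The sketch below reproduces the AIME 2024 window using only the entries shown on this page (an illustrative subset, not the full 41-model list); tied scores, like the two 86.7% entries, simply take successive positions, matching the display.

```python
# Derive ranking positions by sorting scores descending.
# Scores are the AIME 2024 values shown above (subset only).
aime_2024 = {
    "o3": 91.6,
    "DeepSeek-R1-0528": 91.4,
    "Gemini 2.5 Flash": 88.0,
    "o3-mini": 87.3,
    "DeepSeek R1 Distill Llama 70B": 86.7,
    "DeepSeek R1 Zero": 86.7,
    "o1-pro": 86.0,
}

ranked = sorted(aime_2024.items(), key=lambda kv: kv[1], reverse=True)
for position, (model, score) in enumerate(ranked, start=5):  # window starts at #5
    print(f"#{position} {model}: {score}%")
```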
All Benchmark Results for o3-mini
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
COLLIE | general | text | 0.99 | 98.7% | Self-reported
MATH | math | text | 0.98 | 97.9% | Self-reported
IFEval | code | text | 0.94 | 93.9% | Self-reported
MGSM | math | text | 0.92 | 92.0% | Self-reported
AIME 2024 | general | text | 0.87 | 87.3% | Self-reported
MMLU | general | text | 0.87 | 86.9% | Self-reported
LiveBench | roleplay | text | 0.85 | 84.6% | Self-reported
Multilingual MMLU | general | text | 0.81 | 80.7% | Self-reported
Multi-IF | general | text | 0.80 | 79.5% | Self-reported
GPQA | general | text | 0.77 | 77.2% | Self-reported
Showing 10 of 26 benchmarks.
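Because the rows are pipe-delimited, the summary figures are easy to recompute. The sketch below parses only the ten rows shown, so its average differs from the 26-benchmark overview figure of 56.9%; the 80%+ count matches the overview's 8 because the list is sorted, so every high performer falls in this top-ten slice.

```python
# Sketch: parse the pipe-delimited rows above and recompute summary stats.
# Only the 10 displayed rows are included, not all 26 benchmarks.
rows = """
COLLIE | general | text | 0.99 | 98.7% | Self-reported
MATH | math | text | 0.98 | 97.9% | Self-reported
IFEval | code | text | 0.94 | 93.9% | Self-reported
MGSM | math | text | 0.92 | 92.0% | Self-reported
AIME 2024 | general | text | 0.87 | 87.3% | Self-reported
MMLU | general | text | 0.87 | 86.9% | Self-reported
LiveBench | roleplay | text | 0.85 | 84.6% | Self-reported
Multilingual MMLU | general | text | 0.81 | 80.7% | Self-reported
Multi-IF | general | text | 0.80 | 79.5% | Self-reported
GPQA | general | text | 0.77 | 77.2% | Self-reported
""".strip().splitlines()

# Column 4 holds the normalized percentage, e.g. "98.7%".
scores = [float(line.split("|")[4].strip().rstrip("%")) for line in rows]
print(f"avg of shown rows: {sum(scores) / len(scores):.1f}%")  # 87.9%
print(f"rows at 80%+: {sum(s >= 80 for s in scores)}")         # 8
```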