
o1-mini

#1 SuperGLUE
#1 Cybersecurity CTFs

by OpenAI

About

o1-mini is a language model developed by OpenAI. It achieves strong performance, averaging 71.9% across the 6 benchmarks tracked here, with its best results on HumanEval (92.4%), MATH-500 (90.0%), and MMLU (85.2%). It supports a roughly 194K-token (193.5K) context window for handling large documents and is available through 2 API providers. It was announced and released on September 12, 2024.
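
As a sketch of how the model might be queried, assuming access through OpenAI's official Python SDK (the model identifier `o1-mini` is taken from this listing; parameter support may vary by provider):

```python
# Minimal sketch: querying o1-mini through OpenAI's Python SDK.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-mini",
    messages=[
        {"role": "user", "content": "Summarize the key idea of dynamic programming."}
    ],
    # o1-series models take max_completion_tokens rather than max_tokens,
    # since part of the budget is spent on internal reasoning tokens.
    max_completion_tokens=1024,
)
print(response.choices[0].message.content)
```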

Pricing Range
Input (per 1M tokens): $3.00 - $3.30
Output (per 1M tokens): $12.00 - $13.20
Providers: 2
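
Per-request cost follows directly from these per-1M-token rates; a quick sketch (the token counts below are hypothetical, not from this page):

```python
# Estimate the cost of a single o1-mini request from the listed price range.
# Prices are USD per 1M tokens, taken from the pricing table above.
INPUT_PRICE = (3.00, 3.30)    # (cheapest provider, most expensive provider)
OUTPUT_PRICE = (12.00, 13.20)

def request_cost(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (low, high) USD cost across the listed provider range."""
    low = input_tokens / 1e6 * INPUT_PRICE[0] + output_tokens / 1e6 * OUTPUT_PRICE[0]
    high = input_tokens / 1e6 * INPUT_PRICE[1] + output_tokens / 1e6 * OUTPUT_PRICE[1]
    return low, high

# Example: 10K prompt tokens, 2K completion tokens (hypothetical workload)
low, high = request_cost(10_000, 2_000)
print(f"${low:.4f} - ${high:.4f}")  # $0.0540 - $0.0594
```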
Timeline
Announced: Sep 12, 2024
Released: Sep 12, 2024
Specifications
License & Family
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

Benchmarks: 6
Average Score: 71.9%
Best Score: 92.4%
High Performers (80%+): 3
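
These overview figures are derivable from the six scores in the results table at the bottom of the page, for example:

```python
# Recompute the overview stats from the six benchmark scores listed below.
scores = {
    "HumanEval": 92.4, "MATH-500": 90.0, "MMLU": 85.2,
    "SuperGLUE": 75.0, "GPQA": 60.0, "Cybersecurity CTFs": 28.7,
}
average = sum(scores.values()) / len(scores)
best = max(scores.values())
high_performers = sum(1 for s in scores.values() if s >= 80.0)
print(f"{average:.1f}%  {best:.1f}%  {high_performers}")  # 71.9%  92.4%  3
```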

Performance Metrics

Max Context Window: 193.5K tokens
Avg Throughput: 107.5 tok/s
Avg Latency: 3 ms
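
Taken at face value, the listed throughput and latency give a back-of-the-envelope estimate for end-to-end generation time (actual numbers vary by provider and load):

```python
# Rough generation-time estimate from the listed throughput and latency.
THROUGHPUT_TOK_S = 107.5  # average tokens generated per second
LATENCY_S = 0.003         # average latency as listed (3 ms)

def generation_time_s(output_tokens: int) -> float:
    """Approximate seconds to produce `output_tokens` of output."""
    return LATENCY_S + output_tokens / THROUGHPUT_TOK_S

print(f"{generation_time_s(500):.1f} s")  # ~4.7 s for a 500-token completion
```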

Top Categories

code: 92.4%
math: 90.0%
language: 75.0%
general: 58.0%
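
Each category figure is the mean of that category's benchmark scores from the results table below; a short sketch of the grouping:

```python
# Recompute the category breakdown from per-benchmark scores and categories.
benchmarks = [
    ("HumanEval", "code", 92.4),
    ("MATH-500", "math", 90.0),
    ("MMLU", "general", 85.2),
    ("SuperGLUE", "language", 75.0),
    ("GPQA", "general", 60.0),
    ("Cybersecurity CTFs", "general", 28.7),
]
categories: dict[str, list[float]] = {}
for _, cat, score in benchmarks:
    categories.setdefault(cat, []).append(score)
for cat, vals in categories.items():
    print(f"{cat}: {sum(vals) / len(vals):.1f}%")
# code: 92.4%   math: 90.0%   general: 58.0%   language: 75.0%
```
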
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

HumanEval

Rank #5 of 62
#2 GPT-5: 93.4%
#3 Kimi K2 Instruct: 93.3%
#4 Qwen2.5-Coder 32B Instruct: 92.7%
#5 o1-mini: 92.4%
#6 Claude 3.5 Sonnet: 92.0%
#7 Mistral Large 2: 92.0%
#8 Qwen2.5 VL 32B Instruct: 91.5%

MATH-500

Rank #18 of 22
#15 QwQ-32B-Preview: 90.6%
#16 QwQ-32B: 90.6%
#17 DeepSeek-V3: 90.2%
#18 o1-mini: 90.0%
#19 DeepSeek R1 Distill Llama 8B: 89.1%
#20 DeepSeek R1 Distill Qwen 1.5B: 83.9%
#21 Granite 3.3 8B Base: 69.0%

MMLU

Rank #29 of 78
#26 Nova Pro: 85.9%
#27 GPT-4o: 85.7%
#28 Llama 4 Maverick: 85.5%
#29 o1-mini: 85.2%
#30 Phi 4: 84.8%
#31 Mistral Large 2: 84.0%
#32 Llama 3.1 70B Instruct: 83.6%

SuperGLUE

Rank #1 of 1
#1 o1-mini: 75.0%

GPQA

Rank #47 of 115
#44 Gemini 2.5 Flash-Lite: 64.6%
#45 Gemini 2.0 Flash: 62.1%
#46 DeepSeek R1 Distill Qwen 32B: 62.1%
#47 o1-mini: 60.0%
#48 Claude 3.5 Sonnet: 59.4%
#49 DeepSeek-V3: 59.1%
#50 DeepSeek R1 Distill Qwen 14B: 59.1%
All Benchmark Results for o1-mini
Complete list of benchmark scores with detailed information
Benchmark            Category   Modality   Score    Source
HumanEval            code       text       92.4%    Self-reported
MATH-500             math       text       90.0%    Self-reported
MMLU                 general    text       85.2%    Self-reported
SuperGLUE            language   text       75.0%    Self-reported
GPQA                 general    text       60.0%    Self-reported
Cybersecurity CTFs   general    text       28.7%    Self-reported