
o1-mini
Zero-eval
#1 SuperGLUE
#1 Cybersecurity CTFs
by OpenAI
About
o1-mini is a language model developed by OpenAI. It achieves strong performance, with an average score of 71.9% across 6 benchmarks, and does particularly well on HumanEval (92.4%), MATH-500 (90.0%), and MMLU (85.2%). It supports a 194K-token context window for handling large documents and is available through 2 API providers. Released in 2024, it represents OpenAI's latest advancement in AI technology.
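Assuming OpenAI's own API is one of the two listed providers, a minimal call might look like the sketch below. It relies on the official `openai` Python SDK with an `OPENAI_API_KEY` in the environment; the prompt is only a placeholder, not taken from this page.

```python
# Minimal sketch of querying o1-mini through the OpenAI Python SDK.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set;
# the prompt is a placeholder, not taken from this page.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-mini",
    messages=[
        # o1-series models expect user messages only; some standard
        # parameters (e.g. temperature) may not be accepted.
        {"role": "user", "content": "Outline a test plan for a small parser."}
    ],
)

print(response.choices[0].message.content)
```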
Pricing Range
Input (per 1M tokens): $3.00 - $3.30
Output (per 1M tokens): $12.00 - $13.20
Providers: 2
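As a rough illustration of these rates, the sketch below prices a single hypothetical request (8K input tokens and 1K output tokens, both invented for the example) at the low and high ends of the listed range.

```python
# Back-of-the-envelope cost of one request at the listed per-1M-token rates.
# The 8,000 input / 1,000 output token counts are hypothetical.
input_tokens = 8_000
output_tokens = 1_000

for label, input_rate, output_rate in [("low end", 3.00, 12.00), ("high end", 3.30, 13.20)]:
    cost = (input_tokens / 1_000_000) * input_rate + (output_tokens / 1_000_000) * output_rate
    print(f"{label}: ${cost:.4f}")
# low end:  $0.0360
# high end: $0.0396
```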
Timeline
Announced: Sep 12, 2024
Released: Sep 12, 2024
Specifications
License & Family
License
Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance (6 benchmarks)
Average Score: 71.9%
Best Score: 92.4%
High Performers (80%+): 3
Performance Metrics
Max Context Window: 193.5K
Avg Throughput: 107.5 tok/s
Avg Latency: 3 ms
Top Categories
code: 92.4%
math: 90.0%
language: 75.0%
general: 58.0%
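For a rough sense of response time, the sketch below combines the average throughput and latency figures listed above; the 500-token response length is an arbitrary example, and real provider performance will vary.

```python
# Rough wall-clock estimate from the averaged provider metrics above.
# The 500-token response length is an arbitrary example.
throughput_tok_per_s = 107.5  # avg throughput listed above
latency_s = 0.003             # 3 ms avg latency listed above
output_tokens = 500

estimated_seconds = latency_s + output_tokens / throughput_tok_per_s
print(f"~{estimated_seconds:.1f} s")  # roughly 4.7 s for 500 tokens
```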
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
HumanEval
Rank #5 of 62
#2 Qwen2.5-Coder 32B Instruct: 92.7%
#3 Kimi K2 Instruct: 93.3%
#4 GPT-5: 93.4%
#5 o1-mini: 92.4%
#6 Claude 3.5 Sonnet: 92.0%
#7 Mistral Large 2: 92.0%
#8 Qwen2.5 VL 32B Instruct: 91.5%
MATH-500
Rank #18 of 22
#15 DeepSeek-V3: 90.2%
#16 QwQ-32B-Preview: 90.6%
#17 QwQ-32B: 90.6%
#18 o1-mini: 90.0%
#19 DeepSeek R1 Distill Llama 8B: 89.1%
#20 DeepSeek R1 Distill Qwen 1.5B: 83.9%
#21 Granite 3.3 8B Base: 69.0%
MMLU
Rank #29 of 78
#26 Llama 4 Maverick: 85.5%
#27 GPT-4o: 85.7%
#28 Nova Pro: 85.9%
#29 o1-mini: 85.2%
#30 Phi 4: 84.8%
#31 Mistral Large 2: 84.0%
#32 Llama 3.1 70B Instruct: 83.6%
SuperGLUE
Rank #1 of 1
#1 o1-mini: 75.0%
GPQA
Rank #47 of 115
#44 Gemini 2.0 Flash: 62.1%
#45 DeepSeek R1 Distill Qwen 32B: 62.1%
#46 Gemini 2.5 Flash-Lite: 64.6%
#47 o1-mini: 60.0%
#48 Claude 3.5 Sonnet: 59.4%
#49 DeepSeek-V3: 59.1%
#50 DeepSeek R1 Distill Qwen 14B: 59.1%
All Benchmark Results for o1-mini
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
HumanEval | code | text | 0.92 | 92.4% | Self-reported
MATH-500 | math | text | 0.90 | 90.0% | Self-reported
MMLU | general | text | 0.85 | 85.2% | Self-reported
SuperGLUE | language | text | 0.75 | 75.0% | Self-reported
GPQA | general | text | 0.60 | 60.0% | Self-reported
Cybersecurity CTFs | general | text | 0.29 | 28.7% | Self-reported
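The 71.9% overall average quoted above appears to be the simple, unweighted mean of these six normalized scores; the short check below reproduces it.

```python
# Reproduce the 71.9% overall average as the unweighted mean of the six
# normalized benchmark scores listed above.
scores = {
    "HumanEval": 92.4,
    "MATH-500": 90.0,
    "MMLU": 85.2,
    "SuperGLUE": 75.0,
    "GPQA": 60.0,
    "Cybersecurity CTFs": 28.7,
}
average = sum(scores.values()) / len(scores)
print(f"{average:.1f}%")  # -> 71.9%
```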