
o1-preview
Zero-eval
by OpenAI
About
o1-preview is a language model developed by OpenAI, announced and released on September 12, 2024. It averages 64.8% across 8 benchmarks, with its best results on MGSM (90.8%), MMLU (90.8%), and MATH (85.5%). Math is its strongest category, averaging 88.1% across MGSM and MATH. It supports a context window of roughly 161K tokens (listed as 160.8K) and is available through 2 API providers.
Pricing Range
Input (per 1M tokens): $15.00 - $16.50
Output (per 1M tokens): $60.00 - $66.00
Providers: 2
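To make the per-million-token prices concrete, the following is a minimal sketch that estimates the cost range of a single request. The helper name `estimate_cost` and the example token counts are hypothetical; it assumes cost scales linearly with token count and uses the low and high ends of the price range listed above.

```python
# Prices in USD per 1M tokens, taken from the pricing range listed above
# (low and high ends across the 2 listed providers).
INPUT_PRICE_RANGE = (15.00, 16.50)
OUTPUT_PRICE_RANGE = (60.00, 66.00)


def estimate_cost(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return the (low, high) estimated cost in USD for one request.

    Hypothetical helper: assumes cost is linear in token count and that
    the listed prices apply per 1M tokens.
    """
    low = (
        input_tokens * INPUT_PRICE_RANGE[0]
        + output_tokens * OUTPUT_PRICE_RANGE[0]
    ) / 1_000_000
    high = (
        input_tokens * INPUT_PRICE_RANGE[1]
        + output_tokens * OUTPUT_PRICE_RANGE[1]
    ) / 1_000_000
    return low, high


if __name__ == "__main__":
    # Illustrative token counts, not measurements.
    low, high = estimate_cost(input_tokens=10_000, output_tokens=2_000)
    print(f"Estimated cost: ${low:.3f} to ${high:.3f}")
```

Because output tokens cost four times as much as input tokens at both ends of the range, the output length tends to dominate the bill for generation-heavy requests.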
Timeline
Announced: Sep 12, 2024
Released: Sep 12, 2024
Specifications
License & Family
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
Benchmarks: 8
Average Score: 64.8%
Best Score: 90.8%
High Performers (80%+): 3
Performance Metrics
Max Context Window: 160.8K tokens
Avg Throughput: 41.0 tok/s
Avg Latency: 8ms
Top Categories
math: 88.1%
general: 58.0%
roleplay: 52.3%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
MGSM
Rank #6 of 31
#3 Llama 3.3 70B Instruct | 91.1%
#4 Claude 3.5 Sonnet | 91.6%
#5 Claude 3.5 Sonnet | 91.6%
#6 o1-preview | 90.8%
#7 Claude 3 Opus | 90.7%
#8 Llama 4 Scout | 90.6%
#9 GPT-4o | 90.5%
MMLU
Rank #4 of 78
#1 GPT-4.5 | 90.8%
#2 o1 | 91.8%
#3 GPT-5 | 92.5%
#4 o1-preview | 90.8%
#5 DeepSeek-R1 | 90.8%
#6 Claude 3.5 Sonnet | 90.4%
#7 Claude 3.5 Sonnet | 90.4%
MATH
Rank #7 of 63
#4 Gemini 1.5 Pro | 86.5%
#5 Gemini 2.0 Flash-Lite | 86.8%
#6 Gemma 3 27B | 89.0%
#7 o1-preview | 85.5%
#8 GPT-5 | 84.7%
#9 Gemma 3 12B | 83.8%
#10 Qwen2.5 72B Instruct | 83.1%
GPQA
Rank #24 of 115
#21 DeepSeek R1 Zero | 73.3%
#22 Gemini 2.0 Flash Thinking | 74.2%
#23 Kimi K2 Instruct | 75.1%
#24 o1-preview | 73.3%
#25 GPT OSS 120B | 71.5%
#26 DeepSeek-R1 | 71.5%
#27 GPT-5 nano | 71.2%
LiveBench
Rank #8 of 12
#5 o1 | 67.0%
#6 QwQ-32B | 73.1%
#7 Qwen3 30B A3B | 74.3%
#8 o1-preview | 52.3%
#9 Qwen2.5 72B Instruct | 52.3%
#10 Phi 4 | 47.6%
#11 Qwen2.5 7B Instruct | 35.9%
All Benchmark Results for o1-preview
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Score | Normalized | Source
MGSM | math | text | 0.91 | 90.8% | Self-reported
MMLU | general | text | 0.91 | 90.8% | Self-reported
MATH | math | text | 0.85 | 85.5% | Self-reported
GPQA | general | text | 0.73 | 73.3% | Self-reported
LiveBench | roleplay | text | 0.52 | 52.3% | Self-reported
SimpleQA | general | text | 0.42 | 42.4% | Self-reported
AIME 2024 | general | text | 0.42 | 42.0% | Self-reported
SWE-Bench Verified | general | text | 0.41 | 41.3% | Self-reported
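The summary figures reported earlier on the page (average score, best score, count of scores at 80% or above, and per-category averages) can be recomputed from the rows above. The following is a minimal sketch, assuming each figure is a plain unweighted mean of the normalized scores; the variable names are illustrative, and the category labels are copied as listed.

```python
from collections import defaultdict

# (benchmark, category, normalized score in %) copied from the results table.
RESULTS = [
    ("MGSM", "math", 90.8),
    ("MMLU", "general", 90.8),
    ("MATH", "math", 85.5),
    ("GPQA", "general", 73.3),
    ("LiveBench", "roleplay", 52.3),
    ("SimpleQA", "general", 42.4),
    ("AIME 2024", "general", 42.0),
    ("SWE-Bench Verified", "general", 41.3),
]


def mean(values):
    """Unweighted arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


scores = [score for _, _, score in RESULTS]
overall_average = mean(scores)
best_score = max(scores)
high_performers = sum(1 for score in scores if score >= 80.0)

# Group scores by category, then average each group.
scores_by_category = defaultdict(list)
for _, category, score in RESULTS:
    scores_by_category[category].append(score)
category_averages = {
    category: mean(values) for category, values in scores_by_category.items()
}

if __name__ == "__main__":
    print(f"Benchmarks: {len(scores)}")
    print(f"Average score: {overall_average:.2f}%")
    print(f"Best score: {best_score}%")
    print(f"High performers (80%+): {high_performers}")
    for category, average in category_averages.items():
        print(f"{category}: {average:.2f}%")
```

Under that assumption the overall mean is 64.8% and the category means are 88.15% (math), 57.96% (general), and 52.3% (roleplay), which agree with the reported 88.1%, 58.0%, and 52.3% up to rounding to one decimal place.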