o1-preview

by OpenAI

About

o1-preview is a proprietary language model developed by OpenAI, announced and released on September 12, 2024. It averages 64.8% across the 8 benchmarks listed on this page, all of which are self-reported. Its strongest results are MGSM (90.8%), MMLU (90.8%), and MATH (85.5%); the two math-category benchmarks (MGSM and MATH) average 88.1%. The model supports a context window of about 161K tokens (160.8K) and is available through 2 API providers.

Pricing Range
Input (per 1M tokens): $15.00 - $16.50
Output (per 1M tokens): $60.00 - $66.00
Providers: 2
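
As a rough illustration of what the pricing range means per request, the sketch below converts token counts into a lowest and highest estimated cost. The function name `estimate_cost_range` and the example token counts are hypothetical, and the sketch assumes the cheapest input price pairs with the cheapest output price (and likewise for the most expensive), which the page does not confirm.

```python
# Price bounds in USD per 1M tokens, copied from the pricing range above.
INPUT_PRICE_RANGE = (15.00, 16.50)
OUTPUT_PRICE_RANGE = (60.00, 66.00)


def estimate_cost_range(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (lowest, highest) estimated cost in USD for one request.

    Assumption: the low input price pairs with the low output price,
    and the high input price pairs with the high output price.
    """
    def cost(input_price: float, output_price: float) -> float:
        # Prices are quoted per 1M tokens, so scale the total down accordingly.
        return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

    low = cost(INPUT_PRICE_RANGE[0], OUTPUT_PRICE_RANGE[0])
    high = cost(INPUT_PRICE_RANGE[1], OUTPUT_PRICE_RANGE[1])
    return low, high


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens and 2,000 output tokens.
    low, high = estimate_cost_range(10_000, 2_000)
    print(f"${low:.3f} to ${high:.3f}")
```

Because output tokens cost four times as much as input tokens at either end of the range, long completions dominate the bill well before long prompts do.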
Timeline
Announced: Sep 12, 2024
Released: Sep 12, 2024
Specifications
License & Family
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

Benchmarks: 8
Average Score: 64.8%
Best Score: 90.8%
High Performers (80%+): 3

Performance Metrics

Max Context Window: 160.8K tokens
Avg Throughput: 41.0 tok/s
Avg Latency: 8 ms

Top Categories

math: 88.1%
general: 58.0%
roleplay: 52.3%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

MGSM

Rank #6 of 31
#3 Llama 3.3 70B Instruct: 91.1%
#4 Claude 3.5 Sonnet: 91.6%
#5 Claude 3.5 Sonnet: 91.6%
#6 o1-preview: 90.8%
#7 Claude 3 Opus: 90.7%
#8 Llama 4 Scout: 90.6%
#9 GPT-4o: 90.5%

MMLU

Rank #4 of 78
#1 GPT-4.5: 90.8%
#2 o1: 91.8%
#3 GPT-5: 92.5%
#4 o1-preview: 90.8%
#5 DeepSeek-R1: 90.8%
#6 Claude 3.5 Sonnet: 90.4%
#7 Claude 3.5 Sonnet: 90.4%

MATH

Rank #7 of 63
#4 Gemini 1.5 Pro: 86.5%
#5 Gemini 2.0 Flash-Lite: 86.8%
#6 Gemma 3 27B: 89.0%
#7 o1-preview: 85.5%
#8 GPT-5: 84.7%
#9 Gemma 3 12B: 83.8%
#10 Qwen2.5 72B Instruct: 83.1%

GPQA

Rank #24 of 115
#21 DeepSeek R1 Zero: 73.3%
#22 Gemini 2.0 Flash Thinking: 74.2%
#23 Kimi K2 Instruct: 75.1%
#24 o1-preview: 73.3%
#25 GPT OSS 120B: 71.5%
#26 DeepSeek-R1: 71.5%
#27 GPT-5 nano: 71.2%

LiveBench

Rank #8 of 12
#5 o1: 67.0%
#6 QwQ-32B: 73.1%
#7 Qwen3 30B A3B: 74.3%
#8 o1-preview: 52.3%
#9 Qwen2.5 72B Instruct: 52.3%
#10 Phi 4: 47.6%
#11 Qwen2.5 7B Instruct: 35.9%
All Benchmark Results for o1-preview
Complete list of benchmark scores with detailed information
Benchmark            Category   Modality   Score   Normalized   Source
MGSM                 math       text       0.91    90.8%        Self-reported
MMLU                 general    text       0.91    90.8%        Self-reported
MATH                 math       text       0.85    85.5%        Self-reported
GPQA                 general    text       0.73    73.3%        Self-reported
LiveBench            roleplay   text       0.52    52.3%        Self-reported
SimpleQA             general    text       0.42    42.4%        Self-reported
AIME 2024            general    text       0.42    42.0%        Self-reported
SWE-Bench Verified   general    text       0.41    41.3%        Self-reported
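
The summary figures on this page (overall average, category averages, and the count of scores at 80% or higher) can be recomputed from the normalized scores listed above. The following is a minimal sketch of that arithmetic, assuming each summary is a plain unweighted mean; the names `RESULTS` and `category_averages` are illustrative and not part of any published tooling. Under that assumption the math mean comes out to 88.15, which the page displays as 88.1%.

```python
from collections import defaultdict

# (benchmark, category, normalized score in %) as listed above.
RESULTS = [
    ("MGSM", "math", 90.8),
    ("MMLU", "general", 90.8),
    ("MATH", "math", 85.5),
    ("GPQA", "general", 73.3),
    ("LiveBench", "roleplay", 52.3),
    ("SimpleQA", "general", 42.4),
    ("AIME 2024", "general", 42.0),
    ("SWE-Bench Verified", "general", 41.3),
]


def category_averages(results):
    """Group scores by category and return the unweighted mean of each group."""
    scores_by_category = defaultdict(list)
    for _name, category, score in results:
        scores_by_category[category].append(score)
    return {cat: sum(vals) / len(vals) for cat, vals in scores_by_category.items()}


# Unweighted mean over all benchmarks.
overall_average = sum(score for _, _, score in RESULTS) / len(RESULTS)

# Number of benchmarks with a normalized score of 80% or higher.
high_performers = sum(1 for _, _, score in RESULTS if score >= 80.0)

if __name__ == "__main__":
    print(f"overall: {overall_average:.2f}%")
    for category, mean in category_averages(RESULTS).items():
        print(f"{category}: {mean:.2f}%")
    print(f"high performers (80%+): {high_performers}")
```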