o1-preview

Name: o1-preview
Price: up to 16.50 USD per 1M input tokens
Average score: 64.8% (8 benchmarks)
Author: OpenAI

About

o1-preview is a language model developed by OpenAI, announced and released on Sep 12, 2024. It averages 64.8% across 8 benchmarks, with its highest scores on MGSM (90.8%), MMLU (90.8%), and MATH (85.5%). Math is its strongest category, at an average of 88.1% across MGSM and MATH. It supports a context window of about 161K tokens (160.8K) and is available through 2 API providers.

Pricing Range

Input (per 1M tokens): $15.00 - $16.50

Output (per 1M tokens): $60.00 - $66.00

Providers: 2
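As a rough illustration of how the per-1M-token prices above translate into the cost of a single request, here is a minimal Python sketch. The function names and the example token counts are hypothetical. Pairing the low input price with the low output price (and high with high) assumes both come from the same provider, which the listing does not state.

```python
# Price bounds in USD per 1M tokens, taken from the Pricing Range listing.
# Assumption: the low input and low output prices belong to the same provider.
PRICES_PER_MILLION = {
    "low": {"input": 15.00, "output": 60.00},
    "high": {"input": 16.50, "output": 66.00},
}


def request_cost_usd(input_tokens: int, output_tokens: int, tier: str) -> float:
    """Return the estimated cost in USD of one request at the given price tier."""
    prices = PRICES_PER_MILLION[tier]
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


def request_cost_range_usd(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (lowest, highest) estimated cost in USD across the listed range."""
    return (
        request_cost_usd(input_tokens, output_tokens, "low"),
        request_cost_usd(input_tokens, output_tokens, "high"),
    )


if __name__ == "__main__":
    # Hypothetical request: 10,000 prompt tokens and 2,000 completion tokens.
    low, high = request_cost_range_usd(10_000, 2_000)
    print(f"Estimated cost: ${low:.3f} to ${high:.3f}")
```

For the hypothetical request above, the estimate falls between roughly $0.27 and $0.30. Actual billing depends on the provider.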

Timeline

Announced: Sep 12, 2024

Released: Sep 12, 2024

Specifications

License & Family

License: Proprietary

Benchmark Performance Overview

Performance metrics and category breakdown

Overall Performance

Benchmarks: 8

Average Score: 64.8%

Best Score: 90.8%

High Performers (80%+): 3

Performance Metrics

Max Context Window: 160.8K tokens

Avg Throughput: 41.0 tok/s

Avg Latency: 8 ms
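To give a feel for what these averages mean in practice, the following Python sketch estimates how long a response of a given length might take. It assumes the latency figure is the delay before the first token and that throughput applies only to generated tokens. Neither definition is given in the listing, and the function name is hypothetical.

```python
# Averages copied from the Performance Metrics listing.
AVG_THROUGHPUT_TOK_PER_S = 41.0
AVG_LATENCY_S = 0.008  # 8 ms, assumed here to be time to first token


def estimated_response_seconds(output_tokens: int) -> float:
    """Estimate wall-clock time: initial latency plus generation time."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return AVG_LATENCY_S + output_tokens / AVG_THROUGHPUT_TOK_PER_S


if __name__ == "__main__":
    # Hypothetical response lengths, in tokens.
    for tokens in (100, 1_000, 4_000):
        seconds = estimated_response_seconds(tokens)
        print(f"{tokens} tokens: about {seconds:.1f} s")
```

At 41 tok/s the generation time dominates: a 1,000-token response takes roughly 24 seconds under these assumptions. Real timings will vary by provider and load.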

Top Categories

math: 88.1%

general: 58.0%

roleplay: 52.3%

Benchmark Performance

Top benchmark scores with normalized values (0-100%)

Ranking Across Benchmarks

Position relative to other models on each benchmark

MGSM

Rank #6 of 31

#3 Llama 3.3 70B Instruct: 91.1%
#4 Claude 3.5 Sonnet: 91.6%
#5 Claude 3.5 Sonnet: 91.6%
#6 o1-preview: 90.8%
#7 Claude 3 Opus: 90.7%
#8 Llama 4 Scout: 90.6%
#9 GPT-4o: 90.5%

MMLU

Rank #4 of 78

#1 GPT-4.5: 90.8%
#2 o1: 91.8%
#3 GPT-5: 92.5%
#4 o1-preview: 90.8%
#5 DeepSeek-R1: 90.8%
#6 Claude 3.5 Sonnet: 90.4%
#7 Claude 3.5 Sonnet: 90.4%

MATH

Rank #7 of 63

#4 Gemini 1.5 Pro: 86.5%
#5 Gemini 2.0 Flash-Lite: 86.8%
#6 Gemma 3 27B: 89.0%
#7 o1-preview: 85.5%
#8 GPT-5: 84.7%
#9 Gemma 3 12B: 83.8%
#10 Qwen2.5 72B Instruct: 83.1%

GPQA

Rank #24 of 115

#21 DeepSeek R1 Zero: 73.3%
#22 Gemini 2.0 Flash Thinking: 74.2%
#23 Kimi K2 Instruct: 75.1%
#24 o1-preview: 73.3%
#25 GPT OSS 120B: 71.5%
#26 DeepSeek-R1: 71.5%
#27 GPT-5 nano: 71.2%

LiveBench

Rank #8 of 12

#5 o1: 67.0%
#6 QwQ-32B: 73.1%
#7 Qwen3 30B A3B: 74.3%
#8 o1-preview: 52.3%
#9 Qwen2.5 72B Instruct: 52.3%
#10 Phi 4: 47.6%
#11 Qwen2.5 7B Instruct: 35.9%

All Benchmark Results for o1-preview

Complete list of benchmark scores with detailed information


Benchmark	Category	Modality	Score	Normalized	Source
MGSM	math	text	0.91	90.8%	Self-reported
MMLU	general	text	0.91	90.8%	Self-reported
MATH	math	text	0.85	85.5%	Self-reported
GPQA	general	text	0.73	73.3%	Self-reported
LiveBench	roleplay	text	0.52	52.3%	Self-reported
SimpleQA	general	text	0.42	42.4%	Self-reported
AIME 2024	general	text	0.42	42.0%	Self-reported
SWE-Bench Verified	general	text	0.41	41.3%	Self-reported
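
The summary figures on this page (average score, best score, category averages) can be recomputed from the table above. The following is a minimal Python sketch, not the site's actual aggregation code. It assumes each figure is an unweighted mean of the normalized scores, and the function names are hypothetical.

```python
from collections import defaultdict

# (benchmark, category, normalized score in %) copied from the results table.
SCORES = [
    ("MGSM", "math", 90.8),
    ("MMLU", "general", 90.8),
    ("MATH", "math", 85.5),
    ("GPQA", "general", 73.3),
    ("LiveBench", "roleplay", 52.3),
    ("SimpleQA", "general", 42.4),
    ("AIME 2024", "general", 42.0),
    ("SWE-Bench Verified", "general", 41.3),
]


def overall_average(scores):
    """Unweighted mean of all benchmark scores."""
    return sum(score for _, _, score in scores) / len(scores)


def category_averages(scores):
    """Unweighted mean score per category."""
    grouped = defaultdict(list)
    for _, category, score in scores:
        grouped[category].append(score)
    return {cat: sum(vals) / len(vals) for cat, vals in grouped.items()}


def high_performers(scores, threshold=80.0):
    """Names of benchmarks scoring at or above the threshold."""
    return [name for name, _, score in scores if score >= threshold]


if __name__ == "__main__":
    print(f"Average: {overall_average(SCORES):.1f}%")
    print(f"Best: {max(score for _, _, score in SCORES):.1f}%")
    print("High performers:", ", ".join(high_performers(SCORES)))
    for category, mean in category_averages(SCORES).items():
        print(f"{category}: {mean:.2f}%")
```

Under these assumptions the overall mean is 64.8% and three benchmarks score at or above 80%. The math mean is 88.15%, which the page displays as 88.1%, and the general mean is 57.96%, displayed as 58.0%.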
