
o1
#1 GPQA Physics
#1 GPQA Biology
#1 GPQA Chemistry
+4 more
by OpenAI
About
o1 is a large language model developed by OpenAI. It achieves strong performance, averaging 71.6% across 19 benchmarks, and excels particularly in GSM8k (97.1%), MATH (96.4%), and GPQA Physics (92.8%). It supports a 300K-token context window for handling large documents and is available through 2 API providers. Released in December 2024, it is one of OpenAI's reasoning-focused models.
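Since the model is exposed through standard chat-completion endpoints, a minimal request sketch against OpenAI's own API might look like the following (assuming the official openai Python SDK and an OPENAI_API_KEY in the environment):

    # Minimal sketch: query o1 via the OpenAI Python SDK.
    # Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="o1",
        messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
        max_completion_tokens=2048,  # o-series models take max_completion_tokens, not max_tokens
    )
    print(response.choices[0].message.content)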
Pricing Range
Input (per 1M tokens): $15.00 - $15.00
Output (per 1M tokens): $60.00 - $60.00
Providers: 2
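Per-request cost at these rates is straightforward arithmetic: tokens divided by one million, times the per-1M price. A small sketch (the token counts are hypothetical illustration values):

    # Estimate request cost at o1's listed rates ($15/1M input, $60/1M output).
    INPUT_PER_M = 15.00
    OUTPUT_PER_M = 60.00

    def request_cost(input_tokens: int, output_tokens: int) -> float:
        """Return USD cost for one request at the listed per-1M-token rates."""
        return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

    # Example: 3,000 prompt tokens and 1,500 completion tokens (made-up values).
    print(f"${request_cost(3_000, 1_500):.4f}")  # -> $0.1350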
Timeline
Announced: Dec 17, 2024
Released: Dec 17, 2024
Specifications
License & Family
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
19 benchmarks
Average Score: 71.6%
Best Score: 97.1%
High Performers (80%+): 7
Performance Metrics
Max Context Window: 300.0K tokens
Avg Throughput: 41.0 tok/s
Avg Latency: 8 ms
Top Categories
code: 88.1%
vision: 77.6%
math: 72.0%
general: 71.8%
roleplay: 67.0%
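The 71.6% overall average and the per-category figures above are plain means over normalized benchmark scores. A sketch of that aggregation, using a placeholder subset of the 19 benchmarks rather than the full list:

    # Aggregate benchmark scores into an overall average and per-category averages.
    from collections import defaultdict
    from statistics import mean

    # (benchmark, category, normalized score in %) -- a few rows from the table
    # further below; the real computation runs over all 19 benchmarks.
    scores = [
        ("GSM8k", "math", 97.1),
        ("MATH", "math", 96.4),
        ("HumanEval", "code", 88.1),
        ("MMMU", "vision", 77.6),
        ("MMLU", "general", 91.8),
    ]

    overall = mean(s for _, _, s in scores)
    by_category = defaultdict(list)
    for _, cat, s in scores:
        by_category[cat].append(s)

    print(f"overall: {overall:.1f}%")
    for cat, vals in by_category.items():
        print(f"{cat}: {mean(vals):.1f}%")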
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
GSM8k
Rank #2 of 46
#1 Kimi K2 Instruct: 97.3%
#2 o1: 97.1%
#3 GPT-4.5: 97.0%
#4 Llama 3.1 405B Instruct: 96.8%
#5 Claude 3.5 Sonnet: 96.4%
MATH
Rank #2 of 63
#1 o3-mini: 97.9%
#2 o1: 96.4%
#3 Gemini 2.0 Flash: 89.7%
#4 Gemma 3 27B: 89.0%
#5 Gemini 2.0 Flash-Lite: 86.8%
GPQA Physics
Rank #1 of 1
#1 o1: 92.8%
MMLU
Rank #2 of 78
#1 GPT-5: 92.5%
#2 o1: 91.8%
#3 GPT-4.5: 90.8%
#4 o1-preview: 90.8%
#5 DeepSeek-R1: 90.8%
MGSM
Rank #10 of 31
#7 Claude 3 Opus: 90.7%
#8 Llama 4 Scout: 90.6%
#9 GPT-4o: 90.5%
#10 o1: 89.3%
#11 GPT-4 Turbo: 88.5%
#12 Gemini 1.5 Pro: 87.5%
#13 GPT-4o mini: 87.0%
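Rank positions like these follow from sorting each benchmark's scores in descending order. A sketch using only the seven MGSM entries listed above (the real table covers all 31 models):

    # Derive leaderboard ranks by sorting scores in descending order.
    mgsm = [
        ("Claude 3 Opus", 90.7), ("Llama 4 Scout", 90.6), ("GPT-4o", 90.5),
        ("o1", 89.3), ("GPT-4 Turbo", 88.5),
        ("Gemini 1.5 Pro", 87.5), ("GPT-4o mini", 87.0),
    ]
    ranked = sorted(mgsm, key=lambda entry: entry[1], reverse=True)
    for rank, (model, score) in enumerate(ranked, start=7):  # this slice starts at rank #7
        print(f"#{rank} {model}: {score}%")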
All Benchmark Results for o1
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Score | Source
GSM8k | math | text | 97.1% | Self-reported
MATH | math | text | 96.4% | Self-reported
GPQA Physics | general | text | 92.8% | Self-reported
MMLU | general | text | 91.8% | Self-reported
MGSM | math | text | 89.3% | Self-reported
HumanEval | code | text | 88.1% | Self-reported
MMMLU | general | text | 87.7% | Self-reported
GPQA | general | text | 78.0% | Self-reported
MMMU | vision | multimodal | 77.6% | Self-reported
AIME 2024 | general | text | 74.3% | Self-reported
Showing the top 10 of 19 benchmarks.
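For programmatic use, each row of the table maps onto a small record; the dataclass and parser below are hypothetical illustrations, not a published schema:

    # Hypothetical record type for one row of the benchmark table above.
    from dataclasses import dataclass

    @dataclass
    class BenchmarkResult:
        benchmark: str
        category: str   # e.g. "math", "code", "vision", "general"
        modality: str   # "text" or "multimodal"
        score_pct: float
        source: str     # e.g. "Self-reported"

    def parse_row(line: str) -> BenchmarkResult:
        """Parse one pipe-delimited row, e.g. 'GSM8k | math | text | 97.1% | Self-reported'."""
        name, category, modality, pct, source = [f.strip() for f in line.split("|")]
        return BenchmarkResult(name, category, modality, float(pct.rstrip("%")), source)

    row = parse_row("GSM8k | math | text | 97.1% | Self-reported")
    print(row.benchmark, row.score_pct)  # GSM8k 97.1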