
o1-mini
Zero-eval
#1 SuperGLUE
#1 Cybersecurity CTFs
by OpenAI
About
o1-mini is a language model developed by OpenAI. It achieves strong performance, with an average score of 71.9% across 6 benchmarks, and does particularly well on HumanEval (92.4%), MATH-500 (90.0%), and MMLU (85.2%). It supports a 194K-token context window for handling large documents and is available through 2 API providers. Released in 2024, it represents OpenAI's latest advancement in AI technology.
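Assuming OpenAI's own API is one of the two listed providers, a minimal call might look like the sketch below. It relies on the official `openai` Python SDK with an `OPENAI_API_KEY` in the environment; the prompt is only a placeholder, not taken from this page.

```python
# Minimal sketch of querying o1-mini through the OpenAI Python SDK.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set;
# the prompt is a placeholder, not taken from this page.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-mini",
    messages=[
        # o1-series models expect user messages only; some standard
        # parameters (e.g. temperature) may not be accepted.
        {"role": "user", "content": "Outline a test plan for a small parser."}
    ],
)

print(response.choices[0].message.content)
```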
Pricing Range
Input (per 1M tokens): $3.00 - $3.30
Output (per 1M tokens): $12.00 - $13.20
Providers: 2
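As a rough illustration of these rates, the sketch below prices a single hypothetical request (8K input tokens and 1K output tokens, both invented for the example) at the low and high ends of the listed range.

```python
# Back-of-the-envelope cost of one request at the listed per-1M-token rates.
# The 8,000 input / 1,000 output token counts are hypothetical.
input_tokens = 8_000
output_tokens = 1_000

for label, input_rate, output_rate in [("low end", 3.00, 12.00), ("high end", 3.30, 13.20)]:
    cost = (input_tokens / 1_000_000) * input_rate + (output_tokens / 1_000_000) * output_rate
    print(f"{label}: ${cost:.4f}")
# low end:  $0.0360
# high end: $0.0396
```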
Timeline
Announced: Sep 12, 2024
Released: Sep 12, 2024
Specifications
License & Family
License
Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance (6 benchmarks)
Average Score: 71.9%
Best Score: 92.4%
High Performers (80%+): 3
Performance Metrics
Max Context Window: 193.5K
Avg Throughput: 107.5 tok/s
Avg Latency: 3 ms
Top Categories
code: 92.4%
math: 90.0%
language: 75.0%
general: 58.0%
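For a rough sense of response time, the sketch below combines the average throughput and latency figures listed above; the 500-token response length is an arbitrary example, and real provider performance will vary.

```python
# Rough wall-clock estimate from the averaged provider metrics above.
# The 500-token response length is an arbitrary example.
throughput_tok_per_s = 107.5  # avg throughput listed above
latency_s = 0.003             # 3 ms avg latency listed above
output_tokens = 500

estimated_seconds = latency_s + output_tokens / throughput_tok_per_s
print(f"~{estimated_seconds:.1f} s")  # roughly 4.7 s for 500 tokens
```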
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
HumanEval
Rank #5 of 62
#2 Qwen2.5-Coder 32B Instruct: 92.7%
#3 Kimi K2 Instruct: 93.3%
#4 GPT-5: 93.4%
#5 o1-mini: 92.4%
#6 Claude 3.5 Sonnet: 92.0%
#7 Mistral Large 2: 92.0%
#8 Qwen2.5 VL 32B Instruct: 91.5%
MATH-500
Rank #18 of 22
#15 DeepSeek-V3: 90.2%
#16 QwQ-32B-Preview: 90.6%
#17 QwQ-32B: 90.6%
#18 o1-mini: 90.0%
#19 DeepSeek R1 Distill Llama 8B: 89.1%
#20 DeepSeek R1 Distill Qwen 1.5B: 83.9%
#21 Granite 3.3 8B Base: 69.0%
MMLU
Rank #29 of 78
#26 Llama 4 Maverick: 85.5%
#27 GPT-4o: 85.7%
#28 Nova Pro: 85.9%
#29 o1-mini: 85.2%
#30 Phi 4: 84.8%
#31 Mistral Large 2: 84.0%
#32 Llama 3.1 70B Instruct: 83.6%
SuperGLUE
Rank #1 of 1
#1 o1-mini: 75.0%
GPQA
Rank #47 of 115
#44 Gemini 2.0 Flash: 62.1%
#45 DeepSeek R1 Distill Qwen 32B: 62.1%
#46 Gemini 2.5 Flash-Lite: 64.6%
#47 o1-mini: 60.0%
#48 Claude 3.5 Sonnet: 59.4%
#49 DeepSeek-V3: 59.1%
#50 DeepSeek R1 Distill Qwen 14B: 59.1%
All Benchmark Results for o1-mini
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
HumanEval | code | text | 0.92 | 92.4% | Self-reported
MATH-500 | math | text | 0.90 | 90.0% | Self-reported
MMLU | general | text | 0.85 | 85.2% | Self-reported
SuperGLUE | language | text | 0.75 | 75.0% | Self-reported
GPQA | general | text | 0.60 | 60.0% | Self-reported
Cybersecurity CTFs | general | text | 0.29 | 28.7% | Self-reported
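The 71.9% overall average quoted above appears to be the simple, unweighted mean of these six normalized scores; the short check below reproduces it.

```python
# Reproduce the 71.9% overall average as the unweighted mean of the six
# normalized benchmark scores listed above.
scores = {
    "HumanEval": 92.4,
    "MATH-500": 90.0,
    "MMLU": 85.2,
    "SuperGLUE": 75.0,
    "GPQA": 60.0,
    "Cybersecurity CTFs": 28.7,
}
average = sum(scores.values()) / len(scores)
print(f"{average:.1f}%")  # -> 71.9%
```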