DeepSeek-R1

by DeepSeek

Ranked #1 on CLUEWSC, DROP, AlpacaEval 2.0, and 7 more benchmarks.

About

DeepSeek-R1 is a language model developed by DeepSeek. It achieves strong overall performance, with an average score of 74.1% across 20 benchmarks, and does particularly well on MATH-500 (97.3%), MMLU-Redux (92.9%), and CLUEWSC (92.8%). It supports a 262K-token context window for handling large documents and is available through 4 API providers. Announced and released on January 20, 2025, and built on the DeepSeek-V3 base model, it represents DeepSeek's latest advancement in reasoning models.
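
Availability through multiple API providers generally means an OpenAI-compatible endpoint. Below is a minimal sketch of a query assuming DeepSeek's own API; the base_url and the deepseek-reasoner model name are assumptions drawn from DeepSeek's public documentation, not from this page, and the other providers each use their own values.

# Minimal sketch: querying DeepSeek-R1 via an OpenAI-compatible API.
# base_url and model name are assumptions (DeepSeek's public API);
# the 4 providers listed on this page each use their own values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "How many primes lie below 100?"}],
)
print(response.choices[0].message.content)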

Pricing Range
Input (per 1M tokens): $0.55 - $8.00
Output (per 1M tokens): $2.19 - $8.00
Providers: 4

Timeline
Announced: Jan 20, 2025
Released: Jan 20, 2025

Specifications
Training Tokens: 14.8T

License & Family
License: MIT License
Base Model: DeepSeek-V3
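
With input prices spanning $0.55 to $8.00 per million tokens across the 4 providers, the same request can cost roughly ten times more on one provider than on another. A quick sketch of the arithmetic, using the price range above (the request size is hypothetical):

# Estimate per-request cost across the provider price range listed above.
# Prices are USD per 1M tokens; the token counts are hypothetical.
def request_cost(input_tokens, output_tokens, in_price, out_price):
    return (input_tokens / 1e6) * in_price + (output_tokens / 1e6) * out_price

in_tok, out_tok = 50_000, 8_000
cheapest = request_cost(in_tok, out_tok, 0.55, 2.19)
priciest = request_cost(in_tok, out_tok, 8.00, 8.00)
print(f"${cheapest:.4f} - ${priciest:.4f} per request")  # $0.0450 - $0.4640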
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (20 benchmarks)

Average Score: 74.1%
Best Score: 97.3%
High Performers (80%+): 11
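
These overview figures are plain aggregates of the per-benchmark scores tabulated at the bottom of this page. A sketch of the computation, using only the 10 rows visible here (the other 10 are paginated away, so the printed average will not match the reported 74.1%):

# Recompute the overview stats from (benchmark, score%) pairs.
# Only the 10 visible rows are included; the full 20 would reproduce
# the reported 74.1% average and the count of 11 scores at 80%+.
scores = {
    "MATH-500": 97.3, "MMLU-Redux": 92.9, "CLUEWSC": 92.8,
    "Arena Hard": 92.3, "DROP": 92.2, "C-Eval": 91.8,
    "MMLU": 90.8, "AlpacaEval 2.0": 87.6, "MMLU-Pro": 84.0,
    "IFEval": 83.3,
}
average = sum(scores.values()) / len(scores)
best = max(scores.values())
high = sum(1 for s in scores.values() if s >= 80.0)
print(f"Average: {average:.1f}%  Best: {best:.1f}%  80%+: {high}")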

Performance Metrics

Max Context Window: 262.1K tokens
Avg Throughput: 4.0 tok/s
Avg Latency: 0ms
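
Throughput and latency figures like these are usually measured client-side by streaming a response and timing it. A rough sketch of one way to do that, reusing the assumed endpoint and model name from the earlier example (whitespace splitting is only a crude stand-in for real tokenization):

# Rough client-side throughput measurement: stream a response, time it.
import time
from openai import OpenAI

# Assumed endpoint and model name, as in the earlier sketch.
client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_API_KEY")

start = time.monotonic()
first_token_at = None
pieces = []
stream = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "List the planets in order."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    if delta and first_token_at is None:
        first_token_at = time.monotonic()  # time to first token ~ latency
    pieces.append(delta)

elapsed = time.monotonic() - start
n_tokens = len("".join(pieces).split())  # crude proxy for token count
print(f"~{n_tokens / elapsed:.1f} tok/s; "
      f"first token after {first_token_at - start:.2f}s")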

Top Categories

math: 97.3%
code: 82.1%
general: 75.3%
reasoning: 1.3%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

MATH-500

Rank #2 of 22
#1 Kimi K2 Instruct: 97.4%
#2 DeepSeek-R1: 97.3%
#3 Llama 3.1 Nemotron Ultra 253B v1: 97.0%
#4 Llama-3.3 Nemotron Super 49B v1: 96.6%
#5 Claude 3.7 Sonnet: 96.2%

MMLU-Redux

Rank #3 of 13
#1 DeepSeek-R1-0528: 93.4%
#2 Qwen3-235B-A22B-Instruct-2507: 93.1%
#3 DeepSeek-R1: 92.9%
#4 Kimi K2 Instruct: 92.7%
#5 DeepSeek-V3: 89.1%
#6 Qwen3 235B A22B: 87.4%

CLUEWSC

Rank #1 of 3
#1 DeepSeek-R1: 92.8%
#2 Kimi-k1.5: 91.4%
#3 DeepSeek-V3: 90.9%

Arena Hard

Rank #3 of 22
#1 Qwen3 235B A22B: 95.6%
#2 Qwen3 32B: 93.8%
#3 DeepSeek-R1: 92.3%
#4 Qwen3 30B A3B: 91.0%
#5 Llama-3.3 Nemotron Super 49B v1: 88.3%
#6 Mistral Small 3 24B Instruct: 87.6%

DROP

Rank #1 of 28
#1 DeepSeek-R1: 92.2%
#2 DeepSeek-V3: 91.6%
#3 Claude 3.5 Sonnet: 87.1%
All Benchmark Results for DeepSeek-R1
Complete list of benchmark scores with detailed information

Benchmark        Category   Modality   Normalized   Score    Source
MATH-500         math       text       0.97         97.3%    Self-reported
MMLU-Redux       general    text       0.93         92.9%    Self-reported
CLUEWSC          general    text       0.93         92.8%    Self-reported
Arena Hard       general    text       0.92         92.3%    Self-reported
DROP             general    text       0.92         92.2%    Self-reported
C-Eval           general    text       0.92         91.8%    Self-reported
MMLU             general    text       0.91         90.8%    Self-reported
AlpacaEval 2.0   general    text       0.88         87.6%    Self-reported
MMLU-Pro         general    text       0.84         84.0%    Self-reported
IFEval           general    text       0.83         83.3%    Self-reported

Showing 1-10 of 20 benchmarks