
DeepSeek-R1
Zero-eval
#1 CLUEWSC · #1 DROP · #1 AlpacaEval 2.0 (+7 more)
by DeepSeek
About
DeepSeek-R1 is a language model developed by DeepSeek. It achieves an average score of 74.1% across 20 benchmarks, with particularly strong results on MATH-500 (97.3%), MMLU-Redux (92.9%), and CLUEWSC (92.8%). It supports a 262K-token context window for handling large documents and is available through 4 API providers. Released in January 2025, it is DeepSeek's latest flagship model, built on DeepSeek-V3.
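The model is served through several API providers, which typically expose an OpenAI-compatible endpoint. Below is a minimal sketch of a call under that assumption; the base URL and the `deepseek-reasoner` model ID are assumptions to verify against your provider's documentation.

```python
# Minimal sketch: querying DeepSeek-R1 via an OpenAI-compatible endpoint.
# Assumptions: the base URL and the "deepseek-reasoner" model ID below;
# substitute whatever your chosen provider actually documents.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",  # assumed endpoint, provider-specific
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # assumed model ID for DeepSeek-R1
    messages=[{"role": "user", "content": "Explain why sqrt(2) is irrational."}],
)
print(response.choices[0].message.content)
```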
Pricing Range
Input (per 1M tokens): $0.55 - $8.00
Output (per 1M tokens): $2.19 - $8.00
Providers: 4
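Since input and output tokens are priced separately and rates differ several-fold between providers, a per-request estimate is easy to compute. A back-of-envelope sketch; the token counts below are illustrative, not measurements:

```python
# Back-of-envelope request cost from per-1M-token prices (USD).
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request given separate input/output prices per 1M tokens."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Cheapest listed rates ($0.55 in / $2.19 out) vs. the most expensive ($8.00 / $8.00),
# for a hypothetical 2,000-token prompt with an 8,000-token response:
print(request_cost(2_000, 8_000, 0.55, 2.19))  # ~$0.019
print(request_cost(2_000, 8_000, 8.00, 8.00))  # ~$0.080
```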
Timeline
Announced: Jan 20, 2025
Released: Jan 20, 2025
Specifications
Training Tokens: 14.8T
License & Family
License: MIT License
Base Model: DeepSeek-V3
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
20 benchmarks
Average Score: 74.1%
Best Score: 97.3%
High Performers (80%+): 11
Performance Metrics
Max Context Window: 262.1K
Avg Throughput: 4.0 tok/s
Avg Latency: 0 ms
Top Categories
math: 97.3%
code: 82.1%
general: 75.3%
reasoning: 1.3%
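The averages above look like unweighted means of the normalized benchmark scores; the sketch below reproduces that aggregation on a few scores from the table at the end of the page (assumption: simple averaging, not a documented formula).

```python
# Illustrative aggregation, assuming each figure is a plain unweighted mean
# of normalized scores in [0, 1]. Values are from the benchmark table below;
# the full 74.1% average also includes lower-scoring benchmarks not shown here.
scores = {
    "MATH-500": 0.973,
    "MMLU-Redux": 0.929,
    "CLUEWSC": 0.928,
    "Arena Hard": 0.923,
    "DROP": 0.922,
}
average = sum(scores.values()) / len(scores)
print(f"average over {len(scores)} benchmarks: {average:.1%}")  # 93.5% for this subset
```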
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
MATH-500
Rank #2 of 22
#1 Kimi K2 Instruct: 97.4%
#2 DeepSeek-R1: 97.3%
#3 Llama 3.1 Nemotron Ultra 253B v1: 97.0%
#4 Llama-3.3 Nemotron Super 49B v1: 96.6%
#5 Claude 3.7 Sonnet: 96.2%
MMLU-Redux
Rank #3 of 13
#1 DeepSeek-R1-0528: 93.4%
#2 Qwen3-235B-A22B-Instruct-2507: 93.1%
#3 DeepSeek-R1: 92.9%
#4 Kimi K2 Instruct: 92.7%
#5 DeepSeek-V3: 89.1%
#6 Qwen3 235B A22B: 87.4%
CLUEWSC
Rank #1 of 3
#1 DeepSeek-R1: 92.8%
#2 Kimi-k1.5: 91.4%
#3 DeepSeek-V3: 90.9%
Arena Hard
Rank #3 of 22
#1 Qwen3 235B A22B: 95.6%
#2 Qwen3 32B: 93.8%
#3 DeepSeek-R1: 92.3%
#4 Qwen3 30B A3B: 91.0%
#5 Llama-3.3 Nemotron Super 49B v1: 88.3%
#6 Mistral Small 3 24B Instruct: 87.6%
DROP
Rank #1 of 28
#1 DeepSeek-R1: 92.2%
#2 DeepSeek-V3: 91.6%
#3 Claude 3.5 Sonnet: 87.1%
All Benchmark Results for DeepSeek-R1
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Score | Normalized | Source
MATH-500 | math | text | 0.97 | 97.3% | Self-reported
MMLU-Redux | general | text | 0.93 | 92.9% | Self-reported
CLUEWSC | general | text | 0.93 | 92.8% | Self-reported
Arena Hard | general | text | 0.92 | 92.3% | Self-reported
DROP | general | text | 0.92 | 92.2% | Self-reported
C-Eval | code | text | 0.92 | 91.8% | Self-reported
MMLU | general | text | 0.91 | 90.8% | Self-reported
AlpacaEval 2.0 | code | text | 0.88 | 87.6% | Self-reported
MMLU-Pro | general | text | 0.84 | 84.0% | Self-reported
IFEval | code | text | 0.83 | 83.3% | Self-reported
Showing the top 10 of 20 benchmarks