
DeepSeek R1 Zero
by DeepSeek
About
DeepSeek R1 Zero is a language model developed by DeepSeek, built on the DeepSeek-V3 base model. It averages 76.5% across 4 self-reported benchmarks, with its strongest results on MATH-500 (95.9%), AIME 2024 (86.7%), and GPQA (73.3%); its LiveCodeBench score is lower, at 50.0%. It is released under the MIT license, which permits commercial use. It was announced and released on Jan 20, 2025.
Timeline
Announced: Jan 20, 2025
Released: Jan 20, 2025
Specifications
Training Tokens: 14.8T
License & Family
License: MIT
Base Model: DeepSeek-V3
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance (4 benchmarks)
Average Score: 76.5%
Best Score: 95.9%
High Performers (80%+): 2
Top Categories
math: 95.9%
general: 80.0%
code: 50.0%
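The page does not say how these aggregates are computed. As a minimal sketch, assuming they are unweighted means of the four self-reported percentages grouped by the category each benchmark is filed under, the figures above can be reproduced as follows. The `SCORES` table and the `summarize` helper are illustrative names, not part of any published tooling.

```python
from collections import defaultdict

# Benchmark -> (category, score in percent), copied from this page.
SCORES = {
    "MATH-500": ("math", 95.9),
    "AIME 2024": ("general", 86.7),
    "GPQA": ("general", 73.3),
    "LiveCodeBench": ("code", 50.0),
}


def summarize(scores, high_threshold=80.0):
    """Compute summary statistics, assuming unweighted means (an assumption)."""
    values = [pct for _, pct in scores.values()]

    # Group scores by category so each category can be averaged separately.
    by_category = defaultdict(list)
    for category, pct in scores.values():
        by_category[category].append(pct)

    return {
        "average": round(sum(values) / len(values), 1),
        "best": max(values),
        "high_performers": sum(1 for v in values if v >= high_threshold),
        "categories": {
            name: round(sum(vals) / len(vals), 1)
            for name, vals in by_category.items()
        },
    }


if __name__ == "__main__":
    summary = summarize(SCORES)
    for key, value in summary.items():
        print(f"{key}: {value}")
```

Under that assumption the "general" figure of 80.0% is the mean of AIME 2024 and GPQA, and the two high performers are MATH-500 and AIME 2024.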
Ranking Across Benchmarks
Position relative to other models on each benchmark
MATH-500 (Rank #7 of 22)
#4 Kimi-k1.5: 96.2%
#5 Claude 3.7 Sonnet: 96.2%
#6 Llama-3.3 Nemotron Super 49B v1: 96.6%
#7 DeepSeek R1 Zero: 95.9%
#8 Llama 3.1 Nemotron Nano 8B V1: 95.4%
#9 Phi 4 Mini Reasoning: 94.6%
#10 DeepSeek R1 Distill Llama 70B: 94.5%
AIME 2024 (Rank #10 of 41)
#7 DeepSeek R1 Distill Llama 70B: 86.7%
#8 o3-mini: 87.3%
#9 Gemini 2.5 Flash: 88.0%
#10 DeepSeek R1 Zero: 86.7%
#11 o1-pro: 86.0%
#12 Qwen3 235B A22B: 85.7%
#13 DeepSeek R1 Distill Qwen 7B: 83.3%
GPQA (Rank #23 of 115)
#20 Gemini 2.0 Flash Thinking: 74.2%
#21 Kimi K2 Instruct: 75.1%
#22 Claude Sonnet 4: 75.4%
#23 DeepSeek R1 Zero: 73.3%
#24 o1-preview: 73.3%
#25 GPT OSS 120B: 71.5%
#26 DeepSeek-R1: 71.5%
LiveCodeBench (Rank #21 of 44)
#18 Magistral Medium: 50.3%
#19 Magistral Small 2506: 51.3%
#20 DeepSeek R1 Distill Qwen 14B: 53.1%
#21 DeepSeek R1 Zero: 50.0%
#22 QwQ-32B-Preview: 50.0%
#23 DeepSeek-V3 0324: 49.2%
#24 Llama 4 Maverick: 43.4%
All Benchmark Results for DeepSeek R1 Zero
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Score | Normalized | Source
MATH-500 | math | text | 0.96 | 95.9% | Self-reported
AIME 2024 | general | text | 0.87 | 86.7% | Self-reported
GPQA | general | text | 0.73 | 73.3% | Self-reported
LiveCodeBench | code | text | 0.50 | 50.0% | Self-reported