
DeepSeek R1 Distill Llama 8B
Zero-eval
by DeepSeek
About
DeepSeek R1 Distill Llama 8B is a language model developed by DeepSeek and released on Jan 20, 2025. Across the 4 benchmarks listed here it averages 64.4%, with its strongest results on MATH-500 (89.1%) and AIME 2024 (80.0%). It scores lower on GPQA (49.0%) and LiveCodeBench (39.6%). All scores are self-reported. The model is released under the MIT license, which permits commercial use.
Timeline
Announced: Jan 20, 2025
Released: Jan 20, 2025
Specifications
Training Tokens: 14.8T
License & Family
License
MIT
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
4 benchmarks
Average Score
64.4%
Best Score
89.1%
High Performers (80%+)
2
Top Categories
math
89.1%
general
64.5%
code
39.6%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
MATH-500
Rank #19 of 22
#16 o1-mini
90.0%
#17 DeepSeek-V3
90.2%
#18 QwQ-32B-Preview
90.6%
#19 DeepSeek R1 Distill Llama 8B
89.1%
#20 DeepSeek R1 Distill Qwen 1.5B
83.9%
#21 Granite 3.3 8B Base
69.0%
#22 Granite 3.3 8B Instruct
69.0%
AIME 2024
Rank #21 of 41
#18 DeepSeek R1 Distill Qwen 14B
80.0%
#19 Qwen3 30B A3B
80.4%
#20 Granite 3.3 8B Base
81.2%
#21 DeepSeek R1 Distill Llama 8B
80.0%
#22 Claude 3.7 Sonnet
80.0%
#23 DeepSeek-R1
79.8%
#24 QwQ-32B
79.5%
GPQA
Rank #68 of 115
#65 Qwen2.5 72B Instruct
49.0%
#66 DeepSeek R1 Distill Qwen 7B
49.1%
#67 Qwen2.5 32B Instruct
49.5%
#68 DeepSeek R1 Distill Llama 8B
49.0%
#69 Kimi K2 Base
48.1%
#70 GPT-4 Turbo
48.0%
#71 Qwen3 235B A22B
47.5%
LiveCodeBench
Rank #25 of 44
#22 Llama 4 Maverick
43.4%
#23 DeepSeek-V3 0324
49.2%
#24 QwQ-32B-Preview
50.0%
#25 DeepSeek R1 Distill Llama 8B
39.6%
#26 DeepSeek-V3
37.6%
#27 DeepSeek R1 Distill Qwen 7B
37.6%
#28 Gemini 2.0 Flash
35.1%
All Benchmark Results for DeepSeek R1 Distill Llama 8B
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Score (0-1) | Normalized (0-100%) | Source
MATH-500 | math | text | 0.89 | 89.1% | Self-reported
AIME 2024 | general | text | 0.80 | 80.0% | Self-reported
GPQA | general | text | 0.49 | 49.0% | Self-reported
LiveCodeBench | code | text | 0.40 | 39.6% | Self-reported
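The overview figures (average score, best score, number of high performers, and per-category averages) can be re-derived from the four scores listed above. The sketch below is illustrative only: it assumes the page uses a plain unweighted arithmetic mean and counts a score of exactly 80.0% as a high performer. The names `RESULTS` and `summarize` are hypothetical and not part of any published tooling.

```python
# Scores copied from the results listed above: benchmark -> (category, score in %).
RESULTS = {
    "MATH-500": ("math", 89.1),
    "AIME 2024": ("general", 80.0),
    "GPQA": ("general", 49.0),
    "LiveCodeBench": ("code", 39.6),
}


def summarize(results, high_threshold=80.0):
    """Recompute the overview statistics.

    Assumption: averages are unweighted arithmetic means, and the
    high-performer cutoff is inclusive (score >= high_threshold).
    """
    scores = [score for _, score in results.values()]

    # Group scores by category so each category can be averaged separately.
    by_category = {}
    for category, score in results.values():
        by_category.setdefault(category, []).append(score)

    return {
        "average": round(sum(scores) / len(scores), 1),
        "best": max(scores),
        "high_performers": sum(1 for s in scores if s >= high_threshold),
        "category_average": {
            category: round(sum(values) / len(values), 1)
            for category, values in by_category.items()
        },
    }


summary = summarize(RESULTS)
print(summary)
```

Under these assumptions the result agrees with the overview: an average of 64.4, a best score of 89.1, two benchmarks at 80% or above, and category averages of 89.1 (math), 64.5 (general), and 39.6 (code).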