
DeepSeek-V2.5
#1 DS-FIM-Eval · #1 Aider · #1 DS-Arena-Code · +4 more
by DeepSeek
About
DeepSeek-V2.5 is a language model developed by DeepSeek. It achieves strong performance with an average score of 71.1% across 15 benchmarks, excelling particularly on GSM8k (95.1%), MT-Bench (90.2%), and HumanEval (89.0%). The model is available through 3 API providers. Released in May 2024, it represents DeepSeek's latest advancement in AI technology.
Pricing Range
Input (per 1M): $0.14 - $2.00
Output (per 1M): $0.28 - $2.00
Providers: 3
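Per-1M-token price ranges like the ones above are easiest to read as a concrete bill. A minimal sketch of the arithmetic, assuming the cheapest listed rates and illustrative token counts (both are assumptions, not figures from this page):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate API cost in dollars from per-1M-token prices."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)

# Hypothetical request: 50k input tokens, 10k output tokens,
# at the cheapest listed rates ($0.14 in / $0.28 out per 1M).
cost = estimate_cost(50_000, 10_000, 0.14, 0.28)
print(round(cost, 4))  # about one cent
```

At the most expensive listed rates ($2.00 both ways), the same request would cost roughly $0.12, so the provider choice changes the bill by an order of magnitude.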
Timeline
Announced: May 8, 2024
Released: May 8, 2024
Specifications
License & Family
License: deepseek
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
15 benchmarks
Average Score: 71.1%
Best Score: 95.1%
High Performers (80%+): 6
Performance Metrics
Max Context Window: 16.4K
Avg Throughput: 87.7 tok/s
Avg Latency: 1ms
Top Categories
roleplay: 90.2%
math: 84.9%
general: 68.4%
code: 66.1%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
GSM8k
Rank #10 of 46
#7 Qwen2.5 72B Instruct: 95.8%
#8 Qwen2.5 32B Instruct: 95.9%
#9 Gemma 3 27B: 95.9%
#10 DeepSeek-V2.5: 95.1%
#11 Claude 3 Opus: 95.0%
#12 Nova Pro: 94.8%
#13 Qwen2.5 14B Instruct: 94.8%
MT-Bench
Rank #3 of 11
#1 Llama-3.3 Nemotron Super 49B v1: 91.7%
#2 Qwen2.5 72B Instruct: 93.5%
#3 DeepSeek-V2.5: 90.2%
#4 Qwen2.5 7B Instruct: 87.5%
#5 Mistral Large 2: 86.3%
#6 Qwen2 7B Instruct: 84.1%
HumanEval
Rank #15 of 62
#12 Nova Pro: 89.0%
#13 Llama 3.1 405B Instruct: 89.0%
#14 Gemini Diffusion: 89.6%
#15 DeepSeek-V2.5: 89.0%
#16 Mistral Small 3.1 24B Instruct: 88.4%
#17 Llama 3.3 70B Instruct: 88.4%
#18 Grok-2: 88.4%
BBH
Rank #4 of 8
#1 Qwen2.5 32B Instruct: 84.5%
#2 Nova Pro: 86.9%
#3 Qwen3 235B A22B: 88.9%
#4 DeepSeek-V2.5: 84.3%
#5 Nova Lite: 82.4%
#6 Qwen2 72B Instruct: 82.4%
#7 Nova Micro: 79.5%
MMLU
Rank #43 of 78
#40 Nova Lite: 80.5%
#41 Mistral Small 3.2 24B Instruct: 80.5%
#42 Mistral Small 3.1 24B Instruct: 80.6%
#43 DeepSeek-V2.5: 80.4%
#44 Llama 3.1 Nemotron 70B Instruct: 80.2%
#45 GPT-4.1 nano: 80.1%
#46 Qwen2.5 14B Instruct: 79.7%
All Benchmark Results for DeepSeek-V2.5
Complete list of benchmark scores with detailed information
| Benchmark | Category | Modality | Raw Score | Normalized | Source |
| GSM8k | math | text | 0.95 | 95.1% | Self-reported |
| MT-Bench | roleplay | text | 90.20 | 90.2% | Self-reported |
| HumanEval | code | text | 0.89 | 89.0% | Self-reported |
| BBH | general | text | 0.84 | 84.3% | Self-reported |
| MMLU | general | text | 0.80 | 80.4% | Self-reported |
| AlignBench | general | text | 0.80 | 80.4% | Self-reported |
| DS-FIM-Eval | code | text | 0.78 | 78.3% | Self-reported |
| Arena Hard | general | text | 0.76 | 76.2% | Self-reported |
| MATH | math | text | 0.75 | 74.7% | Self-reported |
| HumanEval-Mul | code | text | 0.74 | 73.8% | Self-reported |
Showing 10 of 15 benchmarks.
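The raw scores above mix scales: most are 0-1 fractions (0.95), while MT-Bench is reported on a 0-100 scale (90.20). A sketch of how such mixed values might be mapped to the common 0-100% "Normalized" column; the scale-detection rule is an assumption inferred from the magnitudes, not documentation of how this site computes it:

```python
def normalize(raw: float) -> float:
    """Map a raw benchmark score to a 0-100 percentage.

    Assumption: values <= 1.0 are fractions (0.89 -> 89.0);
    larger values are already on a 0-100 scale (90.20 -> 90.2).
    """
    return round(raw * 100, 1) if raw <= 1.0 else round(raw, 1)

print(normalize(0.89))   # 89.0
print(normalize(90.20))  # 90.2
```

Note that the table's raw values are themselves rounded (0.95 next to 95.1%), so this mapping reproduces the normalized column only approximately.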