DeepSeek-V2.5

Name: DeepSeek-V2.5
Price: up to $2.00 per 1M tokens
Average score: 71.1% (15 benchmarks)
Author: DeepSeek

Zero-eval

#1 DS-FIM-Eval

#1 Aider

#1 DS-Arena-Code


About

DeepSeek-V2.5 is a language model developed by DeepSeek. It averages 71.1% across 15 benchmarks, with its strongest results on GSM8k (95.1%), MT-Bench (90.2%), and HumanEval (89.0%). The model is available through 3 API providers. It was released in 2024.

Pricing Range

Input (per 1M tokens): $0.14 - $2.00

Output (per 1M tokens): $0.28 - $2.00

Providers: 3
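
To make the price range concrete, here is a minimal Python sketch that turns the per-1M-token bounds listed above into a low/high cost estimate for a single request. The function name `estimate_cost_usd` is hypothetical, and the sketch assumes billing is linear in tokens. It also assumes the cheapest input and output prices can be combined, which may not hold if they come from different providers.

```python
# Price bounds in USD per 1M tokens, copied from the Pricing Range listing.
INPUT_PRICE_RANGE = (0.14, 2.00)
OUTPUT_PRICE_RANGE = (0.28, 2.00)

TOKENS_PER_UNIT = 1_000_000


def estimate_cost_usd(input_tokens, output_tokens):
    """Return a (low, high) cost estimate in USD.

    Assumption: cost scales linearly with token count and no
    per-request fees or discounts apply.
    """
    low = (input_tokens * INPUT_PRICE_RANGE[0]
           + output_tokens * OUTPUT_PRICE_RANGE[0]) / TOKENS_PER_UNIT
    high = (input_tokens * INPUT_PRICE_RANGE[1]
            + output_tokens * OUTPUT_PRICE_RANGE[1]) / TOKENS_PER_UNIT
    return low, high


# Example: a request with 10,000 input tokens and 2,000 output tokens.
low, high = estimate_cost_usd(10_000, 2_000)
print(f"Estimated cost: ${low:.5f} to ${high:.5f}")
```

For this example the estimate spans roughly $0.002 to $0.024, so the choice of provider can change the cost by about an order of magnitude.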

Timeline

Announced: May 8, 2024

Released: May 8, 2024

Specifications

License & Family

License

deepseek

Benchmark Performance Overview

Performance metrics and category breakdown

Overall Performance

15 benchmarks

Average Score

71.1%

Best Score

95.1%

High Performers (80%+)

Performance Metrics

Max Context Window

16.4K

Avg Throughput

87.7 tok/s

Avg Latency

1ms

Top Categories

roleplay

90.2%

math

84.9%

general

68.4%

code

66.1%
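
The category figures look like plain means of the per-benchmark scores in each category. That is an inference from the numbers, not something the page states. The Python sketch below checks it against the 10 benchmark rows this page lists (out of 15), using a hypothetical helper `category_means`. Only categories whose benchmarks are all listed can be expected to match.

```python
from collections import defaultdict

# (benchmark, category, normalized score in %) for the 10 rows listed
# in "All Benchmark Results"; the remaining 5 benchmarks are not shown.
ROWS = [
    ("GSM8k", "math", 95.1),
    ("MT-Bench", "roleplay", 90.2),
    ("HumanEval", "code", 89.0),
    ("BBH", "general", 84.3),
    ("MMLU", "general", 80.4),
    ("AlignBench", "general", 80.4),
    ("DS-FIM-Eval", "code", 78.3),
    ("Arena Hard", "general", 76.2),
    ("MATH", "math", 74.7),
    ("HumanEval-Mul", "code", 73.8),
]


def category_means(rows):
    """Group scores by category and return each mean, rounded to 1 decimal."""
    by_category = defaultdict(list)
    for _name, category, score in rows:
        by_category[category].append(score)
    return {
        category: round(sum(scores) / len(scores), 1)
        for category, scores in by_category.items()
    }


means = category_means(ROWS)
print(means["math"])      # mean of GSM8k and MATH
print(means["roleplay"])  # MT-Bench only
```

The math and roleplay means reproduce the 84.9% and 90.2% shown above. The general and code means computed this way come out higher than the listed 68.4% and 66.1%, which is consistent with the unlisted benchmarks falling in those two categories.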

Benchmark Performance

Top benchmark scores with normalized values (0-100%)

Ranking Across Benchmarks

Position relative to other models on each benchmark

GSM8k

Rank #10 of 46

#7 Qwen2.5 72B Instruct: 95.8%
#8 Qwen2.5 32B Instruct: 95.9%
#9 Gemma 3 27B: 95.9%
#10 DeepSeek-V2.5: 95.1%
#11 Claude 3 Opus: 95.0%
#12 Nova Pro: 94.8%
#13 Qwen2.5 14B Instruct: 94.8%

MT-Bench

Rank #3 of 11

#1 Llama-3.3 Nemotron Super 49B v1: 91.7%
#2 Qwen2.5 72B Instruct: 93.5%
#3 DeepSeek-V2.5: 90.2%
#4 Qwen2.5 7B Instruct: 87.5%
#5 Mistral Large 2: 86.3%
#6 Qwen2 7B Instruct: 84.1%

HumanEval

Rank #15 of 62

#12 Nova Pro: 89.0%
#13 Llama 3.1 405B Instruct: 89.0%
#14 Gemini Diffusion: 89.6%
#15 DeepSeek-V2.5: 89.0%
#16 Mistral Small 3.1 24B Instruct: 88.4%
#17 Llama 3.3 70B Instruct: 88.4%
#18 Grok-2: 88.4%

BBH

Rank #4 of 8

#1 Qwen2.5 32B Instruct: 84.5%
#2 Nova Pro: 86.9%
#3 Qwen3 235B A22B: 88.9%
#4 DeepSeek-V2.5: 84.3%
#5 Nova Lite: 82.4%
#6 Qwen2 72B Instruct: 82.4%
#7 Nova Micro: 79.5%

MMLU

Rank #43 of 78

#40 Nova Lite: 80.5%
#41 Mistral Small 3.2 24B Instruct: 80.5%
#42 Mistral Small 3.1 24B Instruct: 80.6%
#43 DeepSeek-V2.5: 80.4%
#44 Llama 3.1 Nemotron 70B Instruct: 80.2%
#45 GPT-4.1 nano: 80.1%
#46 Qwen2.5 14B Instruct: 79.7%

All Benchmark Results for DeepSeek-V2.5

Complete list of benchmark scores with detailed information


Benchmark	Category	Modality	Score	Normalized	Source
GSM8k	math	text	0.95	95.1%	Self-reported
MT-Bench	roleplay	text	90.20	90.2%	Self-reported
HumanEval	code	text	0.89	89.0%	Self-reported
BBH	general	text	0.84	84.3%	Self-reported
MMLU	general	text	0.80	80.4%	Self-reported
AlignBench	general	text	0.80	80.4%	Self-reported
DS-FIM-Eval	code	text	0.78	78.3%	Self-reported
Arena Hard	general	text	0.76	76.2%	Self-reported
MATH	math	text	0.75	74.7%	Self-reported
HumanEval-Mul	code	text	0.74	73.8%	Self-reported

Showing 1 to 10 of 15 benchmarks
