
DeepSeek-V3

#1 HumanEval-Mul
#1 Aider-Polyglot Edit
#1 LongBench v2
(+3 more #1 rankings)

by DeepSeek

About

DeepSeek-V3 is a language model developed by DeepSeek. It achieves strong performance, with an average score of 67.2% across 20 benchmarks, and does particularly well on DROP (91.6%), CLUEWSC (90.9%), and MATH-500 (90.2%). It supports a 262K token context window for handling large documents and is available through one API provider. Released on December 25, 2024, it is DeepSeek's most recent flagship model at the time of this listing.
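Because the model is served through a single API provider, the usual integration path is an OpenAI-compatible client. The snippet below is a minimal sketch: the base URL and the deepseek-chat model identifier come from DeepSeek's public API documentation rather than from this page, so verify them against the provider's docs.

```python
# Minimal sketch of querying DeepSeek-V3 via an OpenAI-compatible endpoint.
# Assumptions: base_url and model name follow DeepSeek's published API docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",               # placeholder, set your own key
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",  # assumed identifier that maps to DeepSeek-V3
    messages=[{"role": "user", "content": "Summarize the attached report."}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```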

Pricing
Input (per 1M tokens): $0.27
Output (per 1M tokens): $1.10
Providers: 1
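At these rates, per-request cost is easy to estimate. The sketch below is illustrative arithmetic using the listed prices; the token counts are made up, not measured.

```python
# Cost estimate from the listed rates: $0.27 per 1M input tokens,
# $1.10 per 1M output tokens (single provider, so no range to consider).
INPUT_RATE = 0.27 / 1_000_000   # USD per input token
OUTPUT_RATE = 1.10 / 1_000_000  # USD per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 50K-token prompt with a 2K-token completion.
print(f"${estimate_cost(50_000, 2_000):.4f}")  # ≈ $0.0157
```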
Timeline
Announced: Dec 25, 2024
Released: Dec 25, 2024
Specifications
Training Tokens: 14.8T

License & Family
License: MIT + Model License (commercial use allowed)
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

20 benchmarks
Average Score: 67.2%
Best Score: 91.6%
High Performers (80%+): 8

Performance Metrics

Max Context Window: 262.1K tokens
Avg Throughput: 100.0 tok/s
Avg Latency: 1 ms
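These figures give a rough wall-clock estimate for a completion. The sketch below assumes generation time scales linearly with output length and treats the listed 1 ms average latency as time to first token; it ignores network overhead and provider-side queuing.

```python
# Back-of-the-envelope generation time from the listed metrics:
# ~1 ms average latency and ~100 tok/s average throughput.
LATENCY_S = 0.001        # listed average latency, in seconds
THROUGHPUT_TPS = 100.0   # listed average throughput, tokens per second

def estimate_seconds(output_tokens: int) -> float:
    """Estimated seconds to generate `output_tokens` tokens."""
    return LATENCY_S + output_tokens / THROUGHPUT_TPS

print(f"{estimate_seconds(500):.2f} s")  # ~5.00 s for a 500-token completion
```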

Top Categories

math: 90.2%
code: 73.2%
general: 65.1%
long_context: 48.7%
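The overall average (67.2%) and the category figures above are aggregates of the per-benchmark scores listed at the bottom of the page. The sketch below shows the presumed aggregation, a plain mean of normalized scores grouped by category; it uses only a few of the 20 benchmarks, so it will not reproduce the headline numbers exactly.

```python
# Recompute overall and per-category averages from normalized benchmark scores.
# The scores here are a small subset copied from the results table below.
from collections import defaultdict
from statistics import mean

scores = {  # benchmark: (category, normalized score)
    "DROP": ("general", 0.916),
    "MMLU": ("general", 0.885),
    "MATH-500": ("math", 0.902),
    "HumanEval-Mul": ("code", 0.826),
}

by_category = defaultdict(list)
for category, score in scores.values():
    by_category[category].append(score)

print(f"overall: {mean(s for _, s in scores.values()):.1%}")
for category, values in sorted(by_category.items()):
    print(f"{category}: {mean(values):.1%}")
```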
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

DROP

Rank #2 of 28
#1 DeepSeek-R1: 92.2%
#2 DeepSeek-V3: 91.6%
#3 Claude 3.5 Sonnet: 87.1%
#4 Claude 3.5 Sonnet: 87.1%
#5 GPT-4 Turbo: 86.0%

CLUEWSC

Rank #3 of 3
#1 Kimi-k1.5: 91.4%
#2 DeepSeek-R1: 92.8%
#3 DeepSeek-V3: 90.9%

MATH-500

Rank #17 of 22
#14 QwQ-32B-Preview: 90.6%
#15 QwQ-32B: 90.6%
#16 DeepSeek R1 Distill Qwen 7B: 92.8%
#17 DeepSeek-V3: 90.2%
#18 o1-mini: 90.0%
#19 DeepSeek R1 Distill Llama 8B: 89.1%
#20 DeepSeek R1 Distill Qwen 1.5B: 83.9%

MMLU-Redux

Rank #5 of 13
#2 Kimi K2 Instruct: 92.7%
#3 DeepSeek-R1: 92.9%
#4 Qwen3-235B-A22B-Instruct-2507: 93.1%
#5 DeepSeek-V3: 89.1%
#6 Qwen3 235B A22B: 87.4%
#7 Qwen2.5 72B Instruct: 86.8%
#8 Qwen2.5 32B Instruct: 83.9%

MMLU

Rank #11 of 78
#8 GPT-4o: 88.7%
#9 Kimi K2 Instruct: 89.5%
#10 GPT-4.1: 90.2%
#11 DeepSeek-V3: 88.5%
#12 Qwen3 235B A22B: 87.8%
#13 Kimi K2 Base: 87.8%
#14 GPT-4.1 mini: 87.5%
All Benchmark Results for DeepSeek-V3
Complete list of benchmark scores with detailed information
Benchmark            Category  Modality  Normalized  Score   Source
DROP                 general   text      0.92        91.6%   Self-reported
CLUEWSC              general   text      0.91        90.9%   Self-reported
MATH-500             math      text      0.90        90.2%   Self-reported
MMLU-Redux           general   text      0.89        89.1%   Self-reported
MMLU                 general   text      0.89        88.5%   Self-reported
C-Eval               code      text      0.86        86.5%   Self-reported
IFEval               code      text      0.86        86.1%   Self-reported
HumanEval-Mul        code      text      0.83        82.6%   Self-reported
Aider-Polyglot Edit  general   text      0.80        79.7%   Self-reported
MMLU-Pro             general   text      0.76        75.9%   Self-reported

Showing 1 to 10 of 20 benchmarks.