DeepSeek-V3
Zero-eval
#1 HumanEval-Mul
#1 Aider-Polyglot Edit
#1 LongBench v2
+3 more
by DeepSeek
About
DeepSeek-V3 is a language model developed by DeepSeek. It achieves strong performance, averaging 67.2% across 20 benchmarks, with particularly high scores on DROP (91.6%), CLUEWSC (90.9%), and MATH-500 (90.2%). It supports a 262K-token context window for handling large documents and is available through 1 API provider. It was announced and released in December 2024.
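To get a feel for what a 262K-token window holds, the sketch below uses the common ~4-characters-per-token heuristic (an approximation; actual tokenizer counts vary by language and content) to check whether an input fits. The exact 262,144-token figure is an assumption inferred from the listed 262.1K.

```python
# Rough check of whether a document fits DeepSeek-V3's context window.
# Assumes ~4 characters per token (a common heuristic; real tokenizers vary)
# and a 262,144-token window (assumed from the listed 262.1K).
CONTEXT_WINDOW = 262_144
CHARS_PER_TOKEN = 4  # heuristic, not the model's actual tokenizer

def fits_in_context(text_chars: int, reserved_output: int = 4_096) -> bool:
    """Return True if an input of `text_chars` characters likely fits,
    leaving `reserved_output` tokens of headroom for the reply."""
    estimated_tokens = text_chars / CHARS_PER_TOKEN
    return estimated_tokens + reserved_output <= CONTEXT_WINDOW

print(fits_in_context(100_000))    # ~25K tokens: fits comfortably
print(fits_in_context(2_000_000))  # ~500K tokens: too large
```

A real integration should count tokens with the provider's tokenizer rather than this character heuristic.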
Pricing Range
Input (per 1M tokens): $0.27
Output (per 1M tokens): $1.10
Providers: 1
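At the listed rates ($0.27 per 1M input tokens, $1.10 per 1M output tokens), estimating a request's cost is simple arithmetic; a minimal sketch:

```python
# Cost estimate at the listed rates: $0.27 per 1M input tokens,
# $1.10 per 1M output tokens.
INPUT_PER_M = 0.27
OUTPUT_PER_M = 1.10

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-million-token rates."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# 1M tokens in and 1M out costs $0.27 + $1.10 = $1.37.
print(f"${request_cost(1_000_000, 1_000_000):.2f}")
```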
Timeline
Announced: Dec 25, 2024
Released: Dec 25, 2024
Specifications
Training Tokens: 14.8T
License & Family
License
MIT + Model License (Commercial use allowed)
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
20 benchmarks
Average Score
67.2%
Best Score
91.6%
High Performers (80%+)
8
Performance Metrics
Max Context Window
262.1K
Avg Throughput
100.0 tok/s
Avg Latency
1ms
Top Categories
math
90.2%
code
73.2%
general
65.1%
long_context
48.7%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
DROP
Rank #2 of 28
#1 DeepSeek-R1
92.2%
#2 DeepSeek-V3
91.6%
#3 Claude 3.5 Sonnet
87.1%
#4 Claude 3.5 Sonnet
87.1%
#5 GPT-4 Turbo
86.0%
CLUEWSC
Rank #3 of 3
#1 DeepSeek-R1
92.8%
#2 Kimi-k1.5
91.4%
#3 DeepSeek-V3
90.9%
MATH-500
Rank #17 of 22
#14 DeepSeek R1 Distill Qwen 7B
92.8%
#15 QwQ-32B-Preview
90.6%
#16 QwQ-32B
90.6%
#17 DeepSeek-V3
90.2%
#18 o1-mini
90.0%
#19 DeepSeek R1 Distill Llama 8B
89.1%
#20 DeepSeek R1 Distill Qwen 1.5B
83.9%
MMLU-Redux
Rank #5 of 13
#2 Qwen3-235B-A22B-Instruct-2507
93.1%
#3 DeepSeek-R1
92.9%
#4 Kimi K2 Instruct
92.7%
#5 DeepSeek-V3
89.1%
#6 Qwen3 235B A22B
87.4%
#7 Qwen2.5 72B Instruct
86.8%
#8 Qwen2.5 32B Instruct
83.9%
MMLU
Rank #11 of 78
#8 GPT-4.1
90.2%
#9 Kimi K2 Instruct
89.5%
#10 GPT-4o
88.7%
#11 DeepSeek-V3
88.5%
#12 Qwen3 235B A22B
87.8%
#13 Kimi K2 Base
87.8%
#14 GPT-4.1 mini
87.5%
All Benchmark Results for DeepSeek-V3
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Normalized | Score | Source
DROP | general | text | 0.92 | 91.6% | Self-reported
CLUEWSC | general | text | 0.91 | 90.9% | Self-reported
MATH-500 | math | text | 0.90 | 90.2% | Self-reported
MMLU-Redux | general | text | 0.89 | 89.1% | Self-reported
MMLU | general | text | 0.89 | 88.5% | Self-reported
C-Eval | code | text | 0.86 | 86.5% | Self-reported
IFEval | code | text | 0.86 | 86.1% | Self-reported
HumanEval-Mul | code | text | 0.83 | 82.6% | Self-reported
Aider-Polyglot Edit | general | text | 0.80 | 79.7% | Self-reported
MMLU-Pro | general | text | 0.76 | 75.9% | Self-reported
Showing 1 to 10 of 20 benchmarks
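As a quick sanity check on the table above, the snippet below averages the ten scores shown; it comes out around 86.1%, well above the 67.2% overall average, because this page lists only the top 10 of the model's 20 benchmarks.

```python
# Average of the ten benchmark scores shown on this page (top 10 of 20).
scores = {
    "DROP": 91.6, "CLUEWSC": 90.9, "MATH-500": 90.2,
    "MMLU-Redux": 89.1, "MMLU": 88.5, "C-Eval": 86.5,
    "IFEval": 86.1, "HumanEval-Mul": 82.6,
    "Aider-Polyglot Edit": 79.7, "MMLU-Pro": 75.9,
}
top10_avg = sum(scores.values()) / len(scores)
print(f"{top10_avg:.2f}%")  # higher than the 67.2% average over all 20 benchmarks
```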