
DeepSeek-R1
Zero-eval
#1 CLUEWSC · #1 DROP · #1 AlpacaEval 2.0 (+7 more)
by DeepSeek
About
DeepSeek-R1 is a language model developed by DeepSeek. It achieves an average score of 74.1% across 20 benchmarks, with particularly strong results on MATH-500 (97.3%), MMLU-Redux (92.9%), and CLUEWSC (92.8%). It supports a 262K-token context window for handling large documents and is available through 4 API providers. Released in January 2025, it is DeepSeek's latest flagship model, built on DeepSeek-V3.
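The model is served through several API providers, which typically expose an OpenAI-compatible endpoint. Below is a minimal sketch of a call under that assumption; the base URL and the `deepseek-reasoner` model ID are assumptions to verify against your provider's documentation.

```python
# Minimal sketch: querying DeepSeek-R1 via an OpenAI-compatible endpoint.
# Assumptions: the base URL and the "deepseek-reasoner" model ID below;
# substitute whatever your chosen provider actually documents.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",  # assumed endpoint, provider-specific
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # assumed model ID for DeepSeek-R1
    messages=[{"role": "user", "content": "Explain why sqrt(2) is irrational."}],
)
print(response.choices[0].message.content)
```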
Pricing Range
Input (per 1M tokens): $0.55 - $8.00
Output (per 1M tokens): $2.19 - $8.00
Providers: 4
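Since input and output tokens are priced separately and rates differ several-fold between providers, a per-request estimate is easy to compute. A back-of-envelope sketch; the token counts below are illustrative, not measurements:

```python
# Back-of-envelope request cost from per-1M-token prices (USD).
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request given separate input/output prices per 1M tokens."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Cheapest listed rates ($0.55 in / $2.19 out) vs. the most expensive ($8.00 / $8.00),
# for a hypothetical 2,000-token prompt with an 8,000-token response:
print(request_cost(2_000, 8_000, 0.55, 2.19))  # ~$0.019
print(request_cost(2_000, 8_000, 8.00, 8.00))  # ~$0.080
```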
Timeline
Announced: Jan 20, 2025
Released: Jan 20, 2025
Specifications
Training Tokens: 14.8T
License & Family
License: MIT License
Base Model: DeepSeek-V3
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
20 benchmarks
Average Score: 74.1%
Best Score: 97.3%
High Performers (80%+): 11
Performance Metrics
Max Context Window: 262.1K
Avg Throughput: 4.0 tok/s
Avg Latency: 0 ms
Top Categories
math: 97.3%
code: 82.1%
general: 75.3%
reasoning: 1.3%
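The averages above look like unweighted means of the normalized benchmark scores; the sketch below reproduces that aggregation on a few scores from the table at the end of the page (assumption: simple averaging, not a documented formula).

```python
# Illustrative aggregation, assuming each figure is a plain unweighted mean
# of normalized scores in [0, 1]. Values are from the benchmark table below;
# the full 74.1% average also includes lower-scoring benchmarks not shown here.
scores = {
    "MATH-500": 0.973,
    "MMLU-Redux": 0.929,
    "CLUEWSC": 0.928,
    "Arena Hard": 0.923,
    "DROP": 0.922,
}
average = sum(scores.values()) / len(scores)
print(f"average over {len(scores)} benchmarks: {average:.1%}")  # 93.5% for this subset
```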
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
MATH-500
Rank #2 of 22
#1 Kimi K2 Instruct: 97.4%
#2 DeepSeek-R1: 97.3%
#3 Llama 3.1 Nemotron Ultra 253B v1: 97.0%
#4 Llama-3.3 Nemotron Super 49B v1: 96.6%
#5 Claude 3.7 Sonnet: 96.2%
MMLU-Redux
Rank #3 of 13
#1 DeepSeek-R1-0528: 93.4%
#2 Qwen3-235B-A22B-Instruct-2507: 93.1%
#3 DeepSeek-R1: 92.9%
#4 Kimi K2 Instruct: 92.7%
#5 DeepSeek-V3: 89.1%
#6 Qwen3 235B A22B: 87.4%
CLUEWSC
Rank #1 of 3
#1 DeepSeek-R1: 92.8%
#2 Kimi-k1.5: 91.4%
#3 DeepSeek-V3: 90.9%
Arena Hard
Rank #3 of 22
#1 Qwen3 235B A22B: 95.6%
#2 Qwen3 32B: 93.8%
#3 DeepSeek-R1: 92.3%
#4 Qwen3 30B A3B: 91.0%
#5 Llama-3.3 Nemotron Super 49B v1: 88.3%
#6 Mistral Small 3 24B Instruct: 87.6%
DROP
Rank #1 of 28
#1 DeepSeek-R1: 92.2%
#2 DeepSeek-V3: 91.6%
#3 Claude 3.5 Sonnet: 87.1%
All Benchmark Results for DeepSeek-R1
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Score | Normalized | Source
MATH-500 | math | text | 0.97 | 97.3% | Self-reported
MMLU-Redux | general | text | 0.93 | 92.9% | Self-reported
CLUEWSC | general | text | 0.93 | 92.8% | Self-reported
Arena Hard | general | text | 0.92 | 92.3% | Self-reported
DROP | general | text | 0.92 | 92.2% | Self-reported
C-Eval | code | text | 0.92 | 91.8% | Self-reported
MMLU | general | text | 0.91 | 90.8% | Self-reported
AlpacaEval 2.0 | code | text | 0.88 | 87.6% | Self-reported
MMLU-Pro | general | text | 0.84 | 84.0% | Self-reported
IFEval | code | text | 0.83 | 83.3% | Self-reported
Showing the top 10 of 20 benchmarks