| Model | Description | Released | License | Benchmark 1 | Benchmark 2 | Benchmark 3 | Benchmark 4 | Benchmark 5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DeepSeek-R1-0528 | An upgraded version of DeepSeek-R1 with significantly improved reasoning. Post-training uses increased compute and algorithmic optimizations, yielding strong performance on mathematics, programming, and general logic tasks. | May 28, 2025 | - | 57.6% | 71.6% | - | 73.3% | - |
| DeepSeek-R1 | First-generation reasoning model built atop DeepSeek-V3 (671B total parameters, 37B activated per token). Uses large-scale reinforcement learning (RL) to strengthen chain-of-thought reasoning, delivering strong performance on math, code, and multi-step reasoning tasks. | Jan 20, 2025 | - | 49.2% | 53.3% | - | 65.9% | - |
| DeepSeek-V3 | Mixture-of-Experts (MoE) language model with 671B total parameters (37B activated per token). Features Multi-head Latent Attention (MLA), auxiliary-loss-free load balancing, and multi-token prediction training; pre-trained on 14.8T tokens, with strong results on reasoning, math, and code tasks. | Dec 25, 2024 | MIT + Model License (commercial use allowed) | 42.0% | 49.6% | - | 37.6% | - |
| DeepSeek-V2.5 | Merges DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating general and coding abilities. Better aligned with human preferences and improved in writing and instruction following. | May 8, 2024 | - | 16.8% | - | 89.0% | - | - |
| DeepSeek-V3-0324 | March 2025 checkpoint update of DeepSeek-V3, keeping the same 671B-parameter MoE architecture (37B activated per token). | Mar 25, 2025 | MIT + Model License (commercial use allowed) | - | - | - | 49.2% | - |
| DeepSeek-R1-Distill-Qwen-7B | Dense 7B Qwen-based model distilled from DeepSeek-R1, transferring R1's reasoning patterns to a much smaller checkpoint. | Jan 20, 2025 | - | - | - | - | 37.6% | - |
| DeepSeek-R1-Distill-Qwen-1.5B | Dense 1.5B Qwen-based model distilled from DeepSeek-R1, transferring R1's reasoning patterns to a much smaller checkpoint. | Jan 20, 2025 | - | - | - | - | 16.9% | - |
| DeepSeek-R1-Distill-Qwen-14B | Dense 14B Qwen-based model distilled from DeepSeek-R1, transferring R1's reasoning patterns to a smaller checkpoint. | Jan 20, 2025 | - | - | - | - | 53.1% | - |
| DeepSeek-R1-Distill-Qwen-32B | Dense 32B Qwen-based model distilled from DeepSeek-R1, transferring R1's reasoning patterns to a smaller checkpoint. | Jan 20, 2025 | - | - | - | - | 57.2% | - |
| DeepSeek-R1-Zero | Trained via large-scale RL without supervised fine-tuning (SFT) as a preliminary step; exhibits remarkable emergent reasoning behaviors, but suffers from endless repetition, poor readability, and language mixing. DeepSeek-R1 addresses these issues by adding cold-start data before RL, reaching performance comparable to OpenAI o1 on math, code, and reasoning tasks. | Jan 20, 2025 | - | - | - | - | 50.0% | - |
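The "671B total / 37B activated per token" figures quoted for the V3/R1 family mean that only a small slice of the MoE parameters participates in any single forward pass. A quick sketch of that arithmetic, using only the numbers from the table above:

```python
# Activated-parameter fraction for a Mixture-of-Experts (MoE) model.
# Figures from the table: DeepSeek-V3/R1 have 671B total parameters,
# of which 37B are activated for each token.
TOTAL_PARAMS_B = 671
ACTIVE_PARAMS_B = 37

fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"Activated per token: {fraction:.1%} of total parameters")
# Roughly 5.5%: per-token compute cost is closer to that of a ~37B
# dense model, even though the full parameter count is 671B.
```

This is why the distill models in the table (dense 1.5B-32B checkpoints) can be a practical trade-off: they give up the MoE capacity but avoid hosting 671B parameters at all.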