DeepSeek

Major Contributor
About

Chinese AI company

Portfolio Stats
Total Models: 15
Multimodal: 3
Benchmarks Run: 146
Avg Performance: 68.6%
Latest Release
DeepSeek-R1-0528
Released: May 28, 2025
Release Timeline
Recent model releases by year:
2025: 10 models
2024: 5 models
Performance Overview
Top models and benchmark performance

Benchmark Categories

math: 15 benchmarks, avg 84.9%
vision: 12 benchmarks, avg 73.6%
general: 88 benchmarks, avg 68.6%
roleplay: 4 benchmarks, avg 67.5%
code: 23 benchmarks, avg 63.1%

Model Statistics

Multimodal Ratio: 20%
Models with Providers: 7
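These statistics read as simple aggregates over the per-model records: 3 multimodal models out of 15 gives the 20% ratio, and the average performance is the mean over all 146 benchmark runs. A minimal sketch of that aggregation in Python, using a hypothetical ModelRecord layout and field names that are not taken from the site:

```python
from dataclasses import dataclass

# Hypothetical record layout; the site's own schema is not published.
@dataclass
class ModelRecord:
    name: str
    multimodal: bool
    benchmark_scores: list[float]  # one percentage per benchmark run

def portfolio_stats(models: list[ModelRecord]) -> dict[str, float]:
    """Aggregate the portfolio-level numbers from individual model records."""
    all_scores = [s for m in models for s in m.benchmark_scores]
    return {
        "total_models": len(models),
        "multimodal_ratio": 100 * sum(m.multimodal for m in models) / len(models),
        "benchmarks_run": len(all_scores),
        "avg_performance": sum(all_scores) / len(all_scores),
    }

# 3 multimodal models out of 15 -> multimodal_ratio == 20.0
```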

All Models

Complete portfolio of 15 models.

DeepSeek-R1-0528
An upgraded version of DeepSeek-R1 with significantly improved reasoning capabilities. It leverages increased compute and algorithmic optimizations during post-training, and performs strongly across mathematics, programming, and general logic tasks.
Released: May 28, 2025
License: MIT License
Benchmark scores: 57.6%, 71.6%, -, 73.3%, -
DeepSeek-R1
DeepSeek-R1 is the first-generation reasoning model built atop DeepSeek-V3 (671B total parameters, 37B activated per token). It incorporates large-scale reinforcement learning (RL) to enhance its chain-of-thought and reasoning capabilities, delivering strong performance in math, code, and multi-step reasoning tasks.
Released: Jan 20, 2025
License: MIT License
Benchmark scores: 49.2%, 53.3%, -, 65.9%, -
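R1-style reasoning models emit an explicit chain of thought before the final answer, so downstream code typically separates the two. A minimal sketch, assuming the <think>...</think> wrapping used by the open-weight R1 checkpoints (an assumption, not stated in this listing) and a hypothetical split_reasoning helper:

```python
import re

# Assumption: the completion wraps its reasoning in <think>...</think>
# before the final answer; verify against the actual chat template.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(output: str) -> tuple[str, str]:
    """Return (reasoning, final_answer) from a raw completion string."""
    match = THINK_BLOCK.search(output)
    if not match:
        return "", output.strip()
    return match.group(1).strip(), output[match.end():].strip()

reasoning, answer = split_reasoning("<think>4 * 2 = 8.</think>The result is 8.")
print(answer)  # -> The result is 8.
```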
DeepSeek-V3
A Mixture-of-Experts (MoE) language model with 671B total parameters (37B activated per token). It features Multi-head Latent Attention (MLA), auxiliary-loss-free load balancing, and multi-token prediction training, and was pre-trained on 14.8T tokens, with strong performance on reasoning, math, and code tasks.
Released: Dec 25, 2024
License: MIT + Model License (commercial use allowed)
Benchmark scores: 42.0%, 49.6%, -, 37.6%, -
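The 671B-total / 37B-activated split comes from Mixture-of-Experts routing: a gating network scores every expert for each token, but only the top-k experts actually run. A generic top-k routing sketch in NumPy, purely illustrative; it does not reproduce DeepSeek's MLA attention, auxiliary-loss-free balancing, or multi-token prediction:

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token's hidden state through the top-k experts of a toy MoE layer.

    x:       (d,) token hidden state
    gate_w:  (d, n_experts) router weights
    experts: list of callables, each mapping (d,) -> (d,)
    Only k experts run per token, which is how a model with a huge total
    parameter count keeps per-token compute small.
    """
    logits = x @ gate_w                      # router score for every expert
    top = np.argsort(logits)[-k:]            # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [lambda t, W=rng.normal(size=(d, d)): t @ W for _ in range(n_experts)]
y = moe_forward(rng.normal(size=d), rng.normal(size=(d, n_experts)), experts)
print(y.shape)  # (8,)
```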
DeepSeek-V2.5
DeepSeek-V2.5 is an upgraded version that merges DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating general and coding abilities. It aligns better with human preferences and has been optimized in several areas, including writing and instruction following.
Released: May 8, 2024
License: DeepSeek Model License
Benchmark scores: 16.8%, -, 89.0%, -, -
DeepSeek-V3 0324
An updated March 2025 checkpoint of DeepSeek-V3 with the same Mixture-of-Experts (MoE) architecture: 671B total parameters (37B activated per token), Multi-head Latent Attention (MLA), auxiliary-loss-free load balancing, and multi-token prediction training, pre-trained on 14.8T tokens with strong performance on reasoning, math, and code tasks.
Released: Mar 25, 2025
License: MIT + Model License (commercial use allowed)
Benchmark scores: -, -, -, 49.2%, -
DeepSeek R1 Distill Qwen 7B
A dense 7B reasoning model distilled from DeepSeek-R1: a Qwen base model fine-tuned on reasoning data generated by DeepSeek-R1, transferring much of R1's chain-of-thought capability to a far smaller model.
Released: Jan 20, 2025
License: MIT
Benchmark scores: -, -, -, 37.6%, -
DeepSeek R1 Distill Qwen 1.5B
A dense 1.5B reasoning model distilled from DeepSeek-R1: a Qwen base model fine-tuned on reasoning data generated by DeepSeek-R1, transferring much of R1's chain-of-thought capability to a far smaller model.
Released: Jan 20, 2025
License: MIT
Benchmark scores: -, -, -, 16.9%, -
DeepSeek R1 Distill Qwen 14B
A dense 14B reasoning model distilled from DeepSeek-R1: a Qwen base model fine-tuned on reasoning data generated by DeepSeek-R1, transferring much of R1's chain-of-thought capability to a far smaller model.
Released: Jan 20, 2025
License: MIT
Benchmark scores: -, -, -, 53.1%, -
DeepSeek R1 Distill Qwen 32B
A dense 32B reasoning model distilled from DeepSeek-R1: a Qwen base model fine-tuned on reasoning data generated by DeepSeek-R1, transferring much of R1's chain-of-thought capability to a far smaller model.
Released: Jan 20, 2025
License: MIT
Benchmark scores: -, -, -, 57.2%, -
DeepSeek R1 Zero
DeepSeek-R1-Zero is a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step. Through RL alone it developed numerous powerful reasoning behaviors, but it suffers from issues such as endless repetition, poor readability, and language mixing. DeepSeek-R1, which adds cold-start data before RL, was introduced to address these issues and further improve reasoning, reaching performance comparable to OpenAI o1 across math, code, and reasoning tasks.
Released: Jan 20, 2025
License: MIT
Benchmark scores: -, -, -, 50.0%, -
Showing 1 to 10 of 15 models