DeepSeek-R1-0528: An upgraded version of DeepSeek R1 with significantly improved reasoning capabilities. This model leverages increased computational resources and algorithmic optimization mechanisms during post-training, demonstrating outstanding performance across mathematics, programming, and general logic tasks. | | May 28, 2025 | | 57.6% | 71.6% | - | 73.3% | - | |
DeepSeek-R1: The first-generation reasoning model built atop DeepSeek-V3 (671B total parameters, 37B activated per token). It incorporates large-scale reinforcement learning (RL) to enhance its chain-of-thought and reasoning capabilities, delivering strong performance in math, code, and multi-step reasoning tasks. | | Jan 20, 2025 | | 49.2% | 53.3% | - | 65.9% | - | |
DeepSeek-V2.5: An upgraded model that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating general and coding abilities. It aligns better with human preferences and has been optimized in several areas, including writing and instruction following. | | May 8, 2024 | | 16.8% | - | 89.0% | - | - | |
Llama 3.2 3B Instruct: A large language model that supports a 128K-token context length and is state-of-the-art in its class for on-device use cases such as summarization, instruction following, and rewriting, running locally at the edge. | | Sep 25, 2024 | Llama 3.2 Community License | - | - | - | - | - | |
Gemma 3 4B: A 4-billion-parameter vision-language model from Google that handles text and image input and generates text output. It features a 128K context window, multilingual support, and open weights, and is suitable for question answering, summarization, reasoning, and image understanding tasks. | | Mar 12, 2025 | | - | - | 71.3% | 12.6% | 63.2% | |
Gemma 3 12B: A 12-billion-parameter vision-language model from Google that handles text and image input and generates text output. It features a 128K context window, multilingual support, and open weights, and is suitable for question answering, summarization, reasoning, and image understanding tasks. | | Mar 12, 2025 | | - | - | 85.4% | 24.6% | 73.0% | |
Phi-4-multimodal-instruct: A lightweight (5.57B-parameter) open multimodal foundation model that leverages research and datasets from the Phi-3.5 and Phi-4 models. It processes text, image, and audio inputs to generate text outputs, supporting a 128K-token context length, and has been enhanced via SFT, DPO, and RLHF for instruction following and safety. | | Feb 1, 2025 | | - | - | - | - | - | |
Llama 3.2 11B Vision Instruct: An instruction-tuned multimodal large language model optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. It accepts text and images as input and generates text as output. | | Sep 25, 2024 | Llama 3.2 Community License | - | - | - | - | - | |
Llama 3.1 8B Instruct: A multilingual large language model optimized for dialogue use cases. It features a 128K context length, state-of-the-art tool use, and strong reasoning capabilities. | | Jul 23, 2024 | Llama 3.1 Community License | - | - | 72.6% | - | - | |
Mistral Small 3 24B Instruct: A 24B-parameter LLM licensed under Apache 2.0, focused on low-latency, high-efficiency instruction following while maintaining performance comparable to larger models. It provides quick, accurate responses for conversational agents, function calling, and domain-specific fine-tuning. Suitable for local inference when quantized, it rivals models 2–3× its size while using significantly fewer compute resources. | | Jan 30, 2025 | Apache 2.0 | - | - | 84.8% | - | - | |
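
Several rows above emphasize local or on-device deployment (Llama 3.2 3B Instruct, Mistral Small 3 24B Instruct when quantized). A minimal sketch of what that looks like in practice, assuming the Hugging Face `transformers` runtime and the `meta-llama/Llama-3.2-3B-Instruct` Hub checkpoint, neither of which is prescribed by the table:

```python
# Minimal local-inference sketch (assumptions: Hugging Face transformers as the runtime,
# "meta-llama/Llama-3.2-3B-Instruct" as the checkpoint; the checkpoint is gated and
# requires accepting the Llama 3.2 Community License listed above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-3B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 3B model within laptop-class memory
    device_map="auto",           # place weights on GPU/CPU automatically
)

messages = [
    {"role": "user", "content": "Rewrite in one sentence: the meeting moved to Friday at 10am."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Swapping in another open-weight model from the table is a matter of changing `model_id`, subject to the license terms listed for each entry.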