| Model | Description | Released | License | SWE-bench Verified | Aider-Polyglot | HumanEval | LiveCodeBench | MBPP |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DeepSeek-R1 | First-generation reasoning model built atop DeepSeek-V3 (671B total parameters, 37B activated per token). Uses large-scale reinforcement learning (RL) to strengthen chain-of-thought and reasoning, delivering strong performance on math, code, and multi-step reasoning tasks. | Jan 20, 2025 | - | 49.2% | 53.3% | - | 65.9% | - |
| Llama 4 Scout | Natively multimodal model that processes both text and images. Mixture-of-experts (MoE) architecture with 17B activated parameters (109B total) across 16 experts, supporting conversational interaction, image analysis, and code generation, with a 10-million-token context window. | Apr 5, 2025 | Llama 4 Community License Agreement | - | - | - | 32.8% | 67.8% |
| Llama 3.2 11B Vision Instruct | Instruction-tuned multimodal model optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. Accepts text and images as input and generates text as output. | Sep 25, 2024 | Llama 3.2 Community License | - | - | - | - | - |
| Qwen3 235B A22B | Large language model from Alibaba with an MoE architecture: 235B total parameters, 22B activated. Achieves results competitive with other top-tier models on benchmarks covering coding, math, and general capabilities. | Apr 29, 2025 | - | - | - | - | 70.7% | 81.4% |
| Llama 3.1 8B Instruct | Multilingual model optimized for dialogue use cases, with a 128K-token context length, state-of-the-art tool use, and strong reasoning capabilities. | Jul 23, 2024 | Llama 3.1 Community License | - | - | 72.6% | - | - |
| Llama 4 Maverick | Natively multimodal model that processes both text and images. MoE architecture with 17B activated parameters across 128 experts, supporting conversational interaction, image analysis, and code generation, with a 1-million-token context window. | Apr 5, 2025 | Llama 4 Community License Agreement | - | - | - | 43.4% | 77.6% |
| Qwen2.5 7B Instruct | Instruction-tuned 7B model that excels at following instructions, generating long texts (over 8K tokens), understanding structured data, and producing structured outputs such as JSON. Enhanced mathematics and coding capabilities, with multilingual support across 29+ languages including Chinese, English, French, and Spanish. | Sep 19, 2024 | - | - | - | 84.8% | 28.7% | 79.2% |
| Llama 3.3 70B Instruct | Pretrained and instruction-tuned 70B multilingual model optimized for dialogue across multiple languages, outperforming many open-source and closed chat models on common industry benchmarks. Supports a 128K-token context length and is designed for commercial and research use. | Dec 6, 2024 | Llama 3.3 Community License Agreement | - | - | 88.4% | - | - |
| Llama 3.1 70B Instruct | Large language model optimized for multilingual dialogue use cases, outperforming many open-source and closed chat models on common industry benchmarks. | Jul 23, 2024 | Llama 3.1 Community License | - | - | 80.5% | - | - |
| QwQ-32B-Preview | Experimental research model focused on advancing AI reasoning, particularly strong in mathematics and programming. Capable of deep introspection and self-questioning, with known limitations in language mixing and recursive reasoning patterns. | Nov 28, 2024 | - | - | - | - | 50.0% | - |
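
Several rows above pair a large total parameter count with a much smaller activated count; that gap is the defining property of mixture-of-experts models, where only a few experts run for each token. A minimal sketch of the arithmetic, using the figures from the table (Llama 4 Maverick's 400B total is taken from Meta's announcement, not from the table itself):

```python
# Fraction of parameters activated per token for the MoE models above.
# Figures are in billions of parameters; Llama 4 Maverick's 400B total
# comes from Meta's model card rather than the table.
moe_models = {
    "DeepSeek-R1":      (671, 37),
    "Llama 4 Scout":    (109, 17),
    "Llama 4 Maverick": (400, 17),
    "Qwen3 235B A22B":  (235, 22),
}

for name, (total_b, active_b) in moe_models.items():
    print(f"{name}: {active_b}B of {total_b}B active "
          f"({active_b / total_b:.1%} per token)")
```

DeepSeek-R1, for instance, activates only about 5.5% of its weights per token, which is why its per-token inference cost is closer to that of a 37B dense model than a 671B one.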
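
To make the table concrete, here is a minimal sketch of querying one of the listed models locally with Hugging Face transformers. It exercises the structured-output behavior the Qwen2.5 7B Instruct row describes; it assumes a GPU with enough memory, and the repo id `Qwen/Qwen2.5-7B-Instruct` as published by the Qwen team.

```python
# Minimal sketch: chatting with Qwen2.5-7B-Instruct via transformers,
# asking for JSON output as described in the table row above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "Reply with valid JSON only."},
    {"role": "user", "content": 'List two MoE models as {"models": [...]}.'},
]
# apply_chat_template wraps the conversation in Qwen's chat markup and
# appends the assistant prompt so generation starts at the reply.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Strip the prompt tokens and decode only the newly generated reply.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The same pattern works for the other open-weight models in the table by swapping the repo id, subject to each model's license and hardware requirements.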