**DeepSeek-R1-0528**: An upgraded version of DeepSeek R1 with significantly improved reasoning capabilities. It applies additional compute and algorithmic optimizations during post-training and performs strongly on mathematics, programming, and general logic tasks. | | May 28, 2025 | | 57.6% | 71.6% | - | 73.3% | - | |
**Qwen3 30B A3B**: A smaller Mixture-of-Experts (MoE) model from Alibaba's Qwen3 series, with 30.5 billion total parameters and 3.3 billion activated parameters. It offers hybrid thinking/non-thinking modes, support for 119 languages, and enhanced agent capabilities, and aims to outperform earlier models such as QwQ-32B while activating significantly fewer parameters. | | Apr 29, 2025 | | - | - | - | 62.6% | - | |
**Qwen3 32B**: A dense large language model from Alibaba's Qwen3 series with 32.8 billion parameters, a 128K-token context window, support for 119 languages, and hybrid thinking modes for switching between deep reasoning and fast responses (a usage sketch for this switch follows the table). It performs strongly on reasoning, instruction-following, and agent tasks. | | Apr 29, 2025 | | - | - | - | 65.7% | - | |
**Llama 4 Scout**: A natively multimodal model that processes both text and images. It uses a mixture-of-experts (MoE) architecture with 17 billion activated parameters (109B total) spread across 16 experts, supports multimodal tasks such as conversational interaction, image analysis, and code generation, and offers a 10 million token context window. | | Apr 5, 2025 | Llama 4 Community License Agreement | - | - | - | 32.8% | 67.8% | |
**Gemma 3 27B**: A 27-billion-parameter vision-language model from Google that takes text and image input and generates text output. It offers a 128K context window, multilingual support, and open weights, and suits complex question answering, summarization, reasoning, and image understanding tasks. | | Mar 12, 2025 | | - | - | 87.8% | 29.7% | 74.4% | |
**Llama 4 Maverick**: A natively multimodal model that processes both text and images. It uses a mixture-of-experts (MoE) architecture with 17 billion activated parameters (400B total) spread across 128 experts, supports multimodal tasks such as conversational interaction, image analysis, and code generation, and offers a 1 million token context window. | | Apr 5, 2025 | Llama 4 Community License Agreement | - | - | - | 43.4% | 77.6% | |
**Qwen3 235B A22B**: A large language model from Alibaba with a Mixture-of-Experts (MoE) architecture: 235 billion total parameters and 22 billion activated parameters. It achieves competitive results against other top-tier models on benchmarks of coding, math, general capabilities, and more. | | Apr 29, 2025 | | - | - | - | 70.7% | 81.4% | |
**Kimi K2 Instruct**: Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters (a minimal routing sketch follows the table). Trained with the MuonClip optimizer, it performs exceptionally on frontier knowledge, reasoning, and coding tasks and is heavily optimized for agentic capabilities. The Instruct variant is post-trained for drop-in, general-purpose chat and agentic use without long thinking. | | Jul 11, 2025 | | - | 60.0% | 93.3% | - | - | |
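
Several rows above contrast a model's total parameter count with its much smaller activated parameter count. The toy sketch below illustrates the top-k expert routing behind that distinction: each token is sent to only a few experts, so only their weights participate in that token's forward pass. All sizes, weights, and the routing function here are made-up placeholders for illustration, not any listed model's actual configuration.

```python
# Toy top-k Mixture-of-Experts routing (NumPy only).
# Hypothetical sizes; real MoE models use far larger expert MLPs, learned
# routers, and load-balancing losses that are omitted here.
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 64, 256      # toy hidden sizes (placeholder values)
num_experts, top_k = 8, 2    # route each token to 2 of 8 experts

# Each "expert" is a small two-layer MLP with its own weights.
experts = [
    {"w1": rng.standard_normal((d_model, d_ff)) * 0.02,
     "w2": rng.standard_normal((d_ff, d_model)) * 0.02}
    for _ in range(num_experts)
]
router = rng.standard_normal((d_model, num_experts)) * 0.02  # token -> expert scores

def moe_forward(x: np.ndarray) -> np.ndarray:
    """x: (tokens, d_model). Only top_k experts run per token."""
    logits = x @ router                                       # (tokens, num_experts)
    chosen = np.argsort(logits, axis=-1)[:, -top_k:]          # indices of the top_k experts
    # Softmax over the selected experts' scores gives mixing weights.
    sel = np.take_along_axis(logits, chosen, axis=-1)
    weights = np.exp(sel - sel.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):                               # per-token dispatch, clarity over speed
        for slot in range(top_k):
            e = experts[chosen[t, slot]]
            h = np.maximum(x[t] @ e["w1"], 0.0)               # ReLU MLP expert
            out[t] += weights[t, slot] * (h @ e["w2"])
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_forward(tokens).shape)                              # (4, 64)

# Parameter accounting: total expert weights vs. weights touched per token.
per_expert = d_model * d_ff + d_ff * d_model
print(f"total expert params: {num_experts * per_expert:,}, "
      f"activated per token: {top_k * per_expert:,}")
```

With 8 experts and top-2 routing, only a quarter of the expert weights run per token. That is roughly the accounting behind figures like "235B total / 22B activated" above, though real models also carry shared, always-active weights such as attention layers.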
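
The Qwen3 rows mention hybrid thinking/non-thinking modes. Below is a hedged usage sketch of how the public Qwen3 model cards expose that switch through the Hugging Face chat template; the `enable_thinking` argument and the checkpoint name are taken from those cards, so verify both against the card for the checkpoint you actually use.

```python
# Sketch: toggling Qwen3 "thinking" vs. "non-thinking" mode via transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-32B"  # any Qwen3 checkpoint from the table above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Is 9.11 larger than 9.9?"}]

# enable_thinking=True lets the model emit a <think>...</think> reasoning block
# before the final answer; False switches to fast, direct responses.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```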