GPT-5GPT-5 is our flagship model for coding, reasoning, and agentic tasks across domains. The best model for coding and agentic tasks with higher reasoning capabilities and medium speed. | | Aug 7, 2025 | | 74.9% | 88.0% | 93.4% | - | - | |
Claude 3.7 SonnetThe most intelligent Claude model and the first hybrid reasoning model on the market. Claude 3.7 Sonnet can produce near-instant responses or extended, step-by-step thinking that is made visible to the user. Shows particularly strong improvements in coding and front-end web development. | | Feb 24, 2025 | | 70.3% | - | - | - | - | |
Gemini 2.5 FlashA thinking model designed for a balance between price and performance. It builds upon Gemini 2.0 Flash with upgraded reasoning, hybrid thinking control, multimodal capabilities (text, image, video, audio input), and a 1M token input context window. | | May 20, 2025 | | 60.4% | 61.9% | - | - | - | |
GPT-4oGPT-4o ('o' for 'omni') is a multimodal AI model that accepts text, audio, image, and video inputs, and generates text, audio, and image outputs. It matches GPT-4 Turbo performance on text and code, with improvements in non-English languages, vision, and audio understanding. | | Aug 6, 2024 | | 33.2% | 30.7% | - | - | - | |
GPT-4.1 miniGPT-4.1 mini provides a balance between intelligence, speed, and cost. It's a significant leap in small model performance, even beating GPT-4o in many benchmarks while reducing latency and cost. | | Apr 14, 2025 | | 23.6% | 34.7% | - | - | - | |
GPT-5 nanoGPT-5 nano is our fastest, cheapest version of GPT-5. It's great for summarization and classification tasks with average reasoning capabilities and very fast speed. | | Aug 7, 2025 | | - | - | - | - | - | |
GPT OSS 20BThe gpt-oss-20b model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks, while running efficiently on a single 80 GB GPU. The gpt-oss-20b model delivers similar results to OpenAI o3‑mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure. Both models also perform strongly on tool use, few-shot function calling, CoT reasoning (as seen in results on the Tau-Bench agentic evaluation suite) and HealthBench (even outperforming proprietary models like OpenAI o1 and GPT‑4o). | | Aug 5, 2025 | | - | - | - | - | - | |
GPT OSS 120BThe gpt-oss-120b model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks, while running efficiently on a single 80 GB GPU. The gpt-oss-20b model delivers similar results to OpenAI o3‑mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure. Both models also perform strongly on tool use, few-shot function calling, CoT reasoning (as seen in results on the Tau-Bench agentic evaluation suite) and HealthBench (even outperforming proprietary models like OpenAI o1 and GPT‑4o). | | Aug 5, 2025 | | - | - | - | - | - | |
GPT-5 miniA faster, more cost-efficient version of GPT-5 for well-defined tasks. Great for well-defined tasks and precise prompts with high reasoning capabilities at reduced cost. | | Aug 7, 2025 | | - | - | - | - | - | |
Grok-4Grok 4, announced by xAI in summer 2025, represents a major leap in AI capabilities, described as 'the smartest AI in the world.' Built on version 6 of xAI's foundation model, it uses 100x more training compute than Grok 2 and 10x more reinforcement learning compute than Grok 3. The model achieves PhD-level performance across all academic disciplines simultaneously, scoring perfect on standardized tests like the SAT and near-perfect on graduate exams like the GRE. Unlike Grok 3, tool usage is built into the training process rather than relying on generalization. Trained using 200,000 GPUs, Grok 4 excels at complex reasoning, mathematical problem-solving, and coding tasks, though it has acknowledged weaknesses in multimodal capabilities that are being addressed in the next version. | | Jul 9, 2025 | | - | - | - | 79.0% | - | |