Model Comparison
Compare the top 5 language models by average benchmark performance
Note: This comparison shows the top 5 models ranked by average benchmark score; support for selecting specific models to compare is planned for a future update.
Feature | Grok-3 Mini | Mistral Large 2 | Grok-3 | Claude 3.5 Sonnet | o1-pro |
---|---|---|---|---|---|
Organization | xAI | Mistral AI | xAI | Anthropic | OpenAI |
Release Date | 2025-02-17 | 2024-07-24 | 2025-02-17 | 2024-06-21 | 2024-12-17 |
License | Proprietary | Mistral Research License | Proprietary | Proprietary | Proprietary |
Average Score | 87.8% | 87.6% | 85.7% | 84.1% | 82.5% |
AIME 2024 | 95.8% | - | 93.3% | - | 86.0% |
AIME 2025 | 90.8% | - | 93.3% | - | - |
GPQA | 84.0% | - | 84.6% | 59.4% | 79.0% |
LiveCodeBench | 80.4% | - | 79.4% | - | - |
GSM8k | - | 93.0% | - | 96.4% | - |
HumanEval | - | 92.0% | - | 92.0% | - |
MMLU | - | 84.0% | - | 90.4% | - |
MMLU French | - | 82.8% | - | - | - |
MT-Bench | - | 86.3% | - | - | - |
MMMU | - | - | - | - | 78.0% |
BIG-Bench Hard | - | - | - | 93.1% | - |
DROP | - | - | - | 87.1% | - |
MATH | - | - | - | 71.1% | - |
MGSM | - | - | - | 91.6% | - |
MMLU-Pro | - | - | - | 76.1% | - |
Min Input Price (USD per 1M tokens) | $0.30 | $2.00 | $3.00 | $3.00 | - |
Min Output Price (USD per 1M tokens) | $0.50 | $6.00 | $15.00 | $15.00 | - |
Understanding the Comparison
- Average Score: the mean of a model's normalized scores across the benchmarks it was evaluated on; "-" marks benchmarks a model was not run on (see the first sketch below)
- License: whether the license terms allow commercial use or impose restrictions (e.g., the Mistral Research License is research-only)
- Pricing: the minimum price across all providers offering the model, when available (see the second sketch below)
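For concreteness, here is a minimal sketch of how the Average Score column can be computed. The `scores` mapping and `average_score` helper are hypothetical illustrations, seeded with values from the table above; each model is averaged only over the benchmarks it actually has scores for.

```python
from statistics import mean

# Benchmark scores (in %) copied from the comparison table above.
# A model that was not run on a benchmark simply has no entry for it.
scores = {
    "Grok-3 Mini": {"AIME 2024": 95.8, "AIME 2025": 90.8,
                    "GPQA": 84.0, "LiveCodeBench": 80.4},
    "o1-pro": {"AIME 2024": 86.0, "GPQA": 79.0},
}

def average_score(benchmarks: dict[str, float]) -> float:
    """Mean of the normalized (percentage) scores a model actually has."""
    return mean(benchmarks.values())

for model, benchmarks in scores.items():
    print(f"{model}: {average_score(benchmarks):.1f}%")
# Grok-3 Mini: 87.8%
# o1-pro: 82.5%
```

Averaging only over available benchmarks is what lets models evaluated on different benchmark suites be ranked on a single scale, though it also means the averages are not strictly comparable across models.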
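The pricing rows can be derived the same way: take the cheapest offer among the providers serving each model. The `offers` mapping below is a hypothetical placeholder (the Bedrock prices in particular are illustrative, not live quotes).

```python
# Hypothetical per-provider offers: (input, output) USD per 1M tokens.
offers = {
    "Mistral Large 2": {"Mistral AI": (2.00, 6.00), "Bedrock": (3.00, 9.00)},
}

def min_prices(provider_offers: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Cheapest (input, output) price across all providers of a model."""
    return (min(p[0] for p in provider_offers.values()),
            min(p[1] for p in provider_offers.values()))

cheapest_in, cheapest_out = min_prices(offers["Mistral Large 2"])
print(f"Mistral Large 2: ${cheapest_in:.2f} in / ${cheapest_out:.2f} out per 1M tokens")
# Mistral Large 2: $2.00 in / $6.00 out per 1M tokens
```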