Model Comparison

Compare the top 5 language models by average benchmark performance

Note: This comparison shows the top 5 models ranked by average benchmark score. The ability to select specific models to compare will be added in a future update.
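As a rough illustration, ranking by average benchmark score amounts to a simple sort. The scores below mirror the Average Score row in the table; the code itself is a sketch, not the comparison tool's actual implementation.

```python
# Sketch: pick the top 5 models by average benchmark score.
# The scores mirror the Average Score row in the table; the code is illustrative only.
average_scores = {
    "Grok-3 Mini": 0.878,
    "Mistral Large 2": 0.876,
    "Grok-3": 0.857,
    "Claude 3.5 Sonnet": 0.841,
    "o1-pro": 0.825,
}

top_5 = sorted(average_scores.items(), key=lambda kv: kv[1], reverse=True)[:5]
for rank, (model, score) in enumerate(top_5, start=1):
    print(f"{rank}. {model}: {score:.1%}")
```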

| Feature | Grok-3 Mini | Mistral Large 2 | Grok-3 | Claude 3.5 Sonnet | o1-pro |
| --- | --- | --- | --- | --- | --- |
| Organization | xAI | Mistral AI | xAI | Anthropic | OpenAI |
| Release Date | 2025-02-17 | 2024-07-24 | 2025-02-17 | 2024-06-21 | 2024-12-17 |
| License | Proprietary | Mistral Research License | Proprietary | Proprietary | Proprietary |
| Multimodal | - | - | - | - | - |
| Average Score | 87.8% | 87.6% | 85.7% | 84.1% | 82.5% |
| AIME 2024 | 95.8% | - | 93.3% | - | 86.0% |
| AIME 2025 | 90.8% | - | 93.3% | - | - |
| GPQA | 84.0% | - | 84.6% | 59.4% | 79.0% |
| LiveCodeBench | 80.4% | - | 79.4% | - | - |
| GSM8k | - | 93.0% | - | 96.4% | - |
| HumanEval | - | 92.0% | - | 92.0% | - |
| MMLU | - | 84.0% | - | 90.4% | - |
| MMLU French | - | 82.8% | - | - | - |
| MT-Bench | - | 86.3% | - | - | - |
| MMMU | - | - | 78.0% | - | - |
| BIG-Bench Hard | - | - | - | 93.1% | - |
| DROP | - | - | - | 87.1% | - |
| MATH | - | - | - | 71.1% | - |
| MGSM | - | - | - | 91.6% | - |
| MMLU-Pro | - | - | - | 76.1% | - |
| Min Input Price (USD per 1M tokens) | $0.30 | $2.00 | $3.00 | $3.00 | - |
| Min Output Price (USD per 1M tokens) | $0.50 | $6.00 | $15.00 | $15.00 | - |

Providers included in the price comparison: xAI, Google, Mistral AI, Bedrock.

Understanding the Comparison

  • Average Score: The mean normalized score across all benchmarks (a short sketch of this calculation follows this list)
  • License: Indicates whether commercial use is allowed or restricted (the Mistral Research License, for example, restricts commercial use)
  • Pricing: The minimum price across all listed providers, in USD per 1M tokens, when available
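For concreteness, here is a minimal sketch of how the Average Score and Pricing values could be derived. The function names and the sample inputs are illustrative assumptions, not the comparison tool's actual code or data.

```python
# Illustrative sketch (not the comparison tool's actual implementation).
def average_score(benchmark_scores: dict[str, float]) -> float:
    """Mean of the normalized (0-1) scores across the benchmarks a model reports."""
    normalized = [score / 100.0 for score in benchmark_scores.values()]
    return sum(normalized) / len(normalized)

def min_price(provider_prices_usd: dict[str, float]) -> float | None:
    """Minimum USD price per 1M tokens across providers, or None if none is listed."""
    return min(provider_prices_usd.values()) if provider_prices_usd else None

# Hypothetical inputs for a single model (scores as percentages, prices in USD per 1M tokens).
scores = {"GPQA": 84.6, "LiveCodeBench": 79.4, "AIME 2025": 93.3}
input_prices = {"xAI": 3.00}

print(f"Average score: {average_score(scores):.1%}")       # -> Average score: 85.8%
print(f"Min input price: ${min_price(input_prices):.2f}")  # -> Min input price: $3.00
```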