Model Comparison

Compare the top 5 language models by average benchmark performance

Note: This comparison shows the top 5 models ranked by average benchmark score. The ability to select specific models to compare will be added in a future update.
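As a rough illustration, ranking by average benchmark score amounts to a simple sort. The scores below mirror the Average Score row in the table; the code itself is a sketch, not the comparison tool's actual implementation.

```python
# Sketch: pick the top 5 models by average benchmark score.
# The scores mirror the Average Score row in the table; the code is illustrative only.
average_scores = {
    "Grok-3 Mini": 0.878,
    "Mistral Large 2": 0.876,
    "Grok-3": 0.857,
    "Claude 3.5 Sonnet": 0.841,
    "o1-pro": 0.825,
}

top_5 = sorted(average_scores.items(), key=lambda kv: kv[1], reverse=True)[:5]
for rank, (model, score) in enumerate(top_5, start=1):
    print(f"{rank}. {model}: {score:.1%}")
```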

| Feature | Grok-3 Mini | Mistral Large 2 | Grok-3 | Claude 3.5 Sonnet | o1-pro |
| --- | --- | --- | --- | --- | --- |
| Organization | xAI | Mistral AI | xAI | Anthropic | OpenAI |
| Release Date | 2025-02-17 | 2024-07-24 | 2025-02-17 | 2024-06-21 | 2024-12-17 |
| License | Proprietary | Mistral Research License | Proprietary | Proprietary | Proprietary |
| Multimodal | - | - | - | - | - |
| Average Score | 87.8% | 87.6% | 85.7% | 84.1% | 82.5% |
| AIME 2024 | 95.8% | - | 93.3% | - | 86.0% |
| AIME 2025 | 90.8% | - | 93.3% | - | - |
| GPQA | 84.0% | - | 84.6% | 59.4% | 79.0% |
| LiveCodeBench | 80.4% | - | 79.4% | - | - |
| GSM8k | - | 93.0% | - | 96.4% | - |
| HumanEval | - | 92.0% | - | 92.0% | - |
| MMLU | - | 84.0% | - | 90.4% | - |
| MMLU French | - | 82.8% | - | - | - |
| MT-Bench | - | 86.3% | - | - | - |
| MMMU | - | - | 78.0% | - | - |
| BIG-Bench Hard | - | - | - | 93.1% | - |
| DROP | - | - | - | 87.1% | - |
| MATH | - | - | - | 71.1% | - |
| MGSM | - | - | - | 91.6% | - |
| MMLU-Pro | - | - | - | 76.1% | - |
| Min Input Price (USD per 1M tokens) | $0.30 | $2.00 | $3.00 | $3.00 | - |
| Min Output Price (USD per 1M tokens) | $0.50 | $6.00 | $15.00 | $15.00 | - |

Providers included in the price comparison: xAI, Google, Mistral AI, Bedrock.

Understanding the Comparison

  • Average Score: The mean normalized score across all benchmarks (a short sketch of this calculation follows this list)
  • License: Indicates whether commercial use is allowed or restricted (the Mistral Research License, for example, restricts commercial use)
  • Pricing: The minimum price across all listed providers, in USD per 1M tokens, when available
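For concreteness, here is a minimal sketch of how the Average Score and Pricing values could be derived. The function names and the sample inputs are illustrative assumptions, not the comparison tool's actual code or data.

```python
# Illustrative sketch (not the comparison tool's actual implementation).
def average_score(benchmark_scores: dict[str, float]) -> float:
    """Mean of the normalized (0-1) scores across the benchmarks a model reports."""
    normalized = [score / 100.0 for score in benchmark_scores.values()]
    return sum(normalized) / len(normalized)

def min_price(provider_prices_usd: dict[str, float]) -> float | None:
    """Minimum USD price per 1M tokens across providers, or None if none is listed."""
    return min(provider_prices_usd.values()) if provider_prices_usd else None

# Hypothetical inputs for a single model (scores as percentages, prices in USD per 1M tokens).
scores = {"GPQA": 84.6, "LiveCodeBench": 79.4, "AIME 2025": 93.3}
input_prices = {"xAI": 3.00}

print(f"Average score: {average_score(scores):.1%}")       # -> Average score: 85.8%
print(f"Min input price: ${min_price(input_prices):.2f}")  # -> Min input price: $3.00
```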