Grok-4

Multimodal · Zero-eval
#1 ARC-AGI v2 · #2 HMMT25 · #2 GPQA

by xAI
About

Grok-4 is a multimodal language model developed by xAI. It averages 63.1% across the 7 benchmarks tracked here, with its strongest results on AIME 2025 (91.7%), HMMT25 (90.0%), and GPQA (87.5%). It supports a 264K-token context window for handling large documents and is available through 2 API providers. As a multimodal model, it accepts both text and image inputs. Released in July 2025, it is xAI's most recent flagship model.
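For reference, here is a minimal sketch of querying the model. It assumes an OpenAI-compatible provider endpoint (xAI's own API follows this convention at https://api.x.ai/v1) and the model ID `grok-4`; the environment variable name is hypothetical, and other providers will differ.

```python
# Minimal sketch: calling Grok-4 through an OpenAI-compatible endpoint.
# Assumptions: xAI's https://api.x.ai/v1 base URL, model ID "grok-4",
# and an XAI_API_KEY environment variable (hypothetical name).
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

response = client.chat.completions.create(
    model="grok-4",
    messages=[{"role": "user", "content": "In one sentence, what is GPQA?"}],
)
print(response.choices[0].message.content)
```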

Pricing Range
Input (per 1M tokens): $3.00 – $3.00
Output (per 1M tokens): $15.00 – $15.00
Providers: 2
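The per-token arithmetic is straightforward; a sketch with hypothetical token counts:

```python
# Sketch: estimating one request's cost from the rates above.
# The rates come from the pricing table; the token counts are made up.
INPUT_PER_M = 3.00    # USD per 1M input tokens
OUTPUT_PER_M = 15.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1e6

# e.g. a 200K-token document plus a 2K-token answer:
print(f"${request_cost(200_000, 2_000):.2f}")  # $0.63
```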
Timeline
Announced: Jul 9, 2025
Released: Jul 9, 2025
Knowledge Cutoff: Dec 31, 2024
Specifications
Capabilities: Multimodal
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

Benchmarks: 7
Average Score: 63.1%
Best Score: 91.7%
High Performers (80%+): 3

Performance Metrics

Max Context Window: 264.0K tokens
Avg Throughput: 100.0 tok/s
Avg Latency: 1 ms
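Taken at face value, these two numbers give a rough end-to-end estimate for a generation. The sketch below assumes latency is time-to-first-token and throughput is the steady decode rate, which is a simplification of real serving behavior.

```python
# Sketch: rough generation-time estimate from the metrics above.
# Assumes latency = time to first token and throughput = steady
# decode rate; actual provider behavior varies with load.
LATENCY_S = 0.001        # 1 ms avg latency, as listed
THROUGHPUT_TPS = 100.0   # 100 tok/s avg throughput, as listed

def est_generation_seconds(output_tokens: int) -> float:
    return LATENCY_S + output_tokens / THROUGHPUT_TPS

print(f"{est_generation_seconds(1_000):.2f} s")  # ~10.00 s for 1K tokens
```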

Top Categories

code: 79.0%
general: 69.3%
reasoning: 15.9%
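These figures are straight means of the normalized scores in the results table at the bottom of the page; a short sketch reproducing them:

```python
# Sketch: reproducing the overall average and category scores from
# the per-benchmark results listed under "All Benchmark Results".
from statistics import mean

scores = {  # benchmark: (category, normalized score in %)
    "AIME 2025":            ("general",   91.7),
    "HMMT25":               ("general",   90.0),
    "GPQA":                 ("general",   87.5),
    "LiveCodeBench":        ("code",      79.0),
    "Humanity's Last Exam": ("general",   40.0),
    "USAMO25":              ("general",   37.5),
    "ARC-AGI v2":           ("reasoning", 15.9),
}

print(f"average: {mean(s for _, s in scores.values()):.1f}%")  # 63.1%
for cat in ("code", "general", "reasoning"):
    vals = [s for c, s in scores.values() if c == cat]
    print(f"{cat}: {mean(vals):.1f}%")  # 79.0%, 69.3%, 15.9%
```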
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

AIME 2025

Rank #5 of 36
#2 o4-mini: 92.7%
#3 Grok-3: 93.3%
#4 GPT-5: 94.6%
#5 Grok-4: 91.7%
#6 GPT-5 mini: 91.1%
#7 Grok-3 Mini: 90.8%
#8 Gemini 2.5 Pro Preview 06-05: 88.0%

HMMT25

Rank #2 of 3
#1 Grok-4 Heavy: 96.7%
#2 Grok-4: 90.0%
#3 Qwen3-235B-A22B-Instruct-2507: 55.4%

GPQA

Rank #2 of 115
#1 Grok-4 Heavy: 88.4%
#2 Grok-4: 87.5%
#3 Gemini 2.5 Pro Preview 06-05: 86.4%
#4 GPT-5: 85.7%
#5 Claude 3.7 Sonnet: 84.8%

LiveCodeBench

Rank #4 of 44
#1 Grok-3: 79.4%
#2 Grok-4 Heavy: 79.4%
#3 Grok-3 Mini: 80.4%
#4 Grok-4: 79.0%
#5 DeepSeek-R1-0528: 73.3%
#6 Qwen3 235B A22B: 70.7%
#7 Gemini 2.5 Pro Preview 06-05: 69.0%

Humanity's Last Exam

Rank #2 of 16
#1 Grok-4 Heavy: 50.7%
#2 Grok-4: 40.0%
#3 GPT-5: 24.8%
#4 Gemini 2.5 Pro Preview 06-05: 21.6%
#5 o3: 20.2%
All Benchmark Results for Grok-4
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Score | Normalized | Source
AIME 2025 | general | text | 0.92 | 91.7% | Self-reported
HMMT25 | general | text | 0.90 | 90.0% | Self-reported
GPQA | general | text | 0.88 | 87.5% | Self-reported
LiveCodeBench | code | text | 0.79 | 79.0% | Self-reported
Humanity's Last Exam | general | text | 0.40 | 40.0% | Self-reported
USAMO25 | general | text | 0.38 | 37.5% | Self-reported
ARC-AGI v2 | reasoning | text | 0.16 | 15.9% | Self-reported