
Grok-4
Multimodal
Zero-eval
#1 ARC-AGI v2
#2 HMMT25
#2 GPQA
by xAI
About
Grok-4 is a multimodal language model developed by xAI. It achieves strong performance with an average score of 63.1% across 7 benchmarks, and it excels particularly on AIME 2025 (91.7%), HMMT25 (90.0%), and GPQA (87.5%). It supports a 264K-token context window for handling large documents and is available through 2 API providers. As a multimodal model, it can process and understand text, images, and other input formats. Released in July 2025, it is xAI's latest advancement in AI technology.
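The page does not name the 2 API providers or their request formats, so the sketch below is only an illustration of consuming the model programmatically. It assumes xAI's own OpenAI-compatible chat-completions endpoint at https://api.x.ai/v1, a model identifier of "grok-4", and an XAI_API_KEY environment variable; all three are assumptions, not details stated on this page.

```python
# Minimal sketch, assuming an OpenAI-compatible endpoint at api.x.ai and the
# model identifier "grok-4" (assumptions, not confirmed by this page).
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],   # assumed environment variable
    base_url="https://api.x.ai/v1",      # assumed OpenAI-compatible base URL
)

response = client.chat.completions.create(
    model="grok-4",                      # assumed model identifier
    messages=[
        {"role": "user", "content": "Summarize the key risks in this report."},
    ],
)
print(response.choices[0].message.content)
```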
Pricing Range
Input (per 1M tokens): $3.00
Output (per 1M tokens): $15.00
Providers: 2
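At these rates, per-request cost is tokens divided by one million, times the listed price. For example, a hypothetical request with 50,000 input tokens and 2,000 output tokens would cost 0.05 × $3.00 + 0.002 × $15.00 = $0.18. The snippet below just encodes that arithmetic; the token counts are made up for illustration.

```python
# Cost estimate from the listed per-1M-token prices ($3.00 input, $15.00 output).
# The request sizes below are hypothetical, chosen only to illustrate the math.
INPUT_PRICE_PER_M = 3.00    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 15.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request at the listed rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: 50K input tokens + 2K output tokens -> $0.15 + $0.03 = $0.18
print(f"${request_cost(50_000, 2_000):.2f}")
```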
Timeline
Announced: Jul 9, 2025
Released: Jul 9, 2025
Knowledge Cutoff: Dec 31, 2024
Specifications
Capabilities: Multimodal
License & Family
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance (7 benchmarks)
Average Score: 63.1%
Best Score: 91.7%
High Performers (80%+): 3
Performance Metrics
Max Context Window: 264.0K
Avg Throughput: 100.0 tok/s
Avg Latency: 1ms
Top Categories
code: 79.0%
general: 69.3%
reasoning: 15.9%
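These summary figures follow directly from the seven individual scores listed in the results table below (91.7, 90.0, 87.5, 79.0, 40.0, 37.5, 15.9): their mean is ≈63.1%, the best is 91.7%, three exceed 80%, and averaging within each category reproduces the Top Categories breakdown. A small check, with the benchmark-to-category mapping taken from that table:

```python
# Reproduce the summary statistics from the individual benchmark scores
# listed in "All Benchmark Results for Grok-4" below.
scores = {
    "AIME 2025": (91.7, "general"),
    "HMMT25": (90.0, "general"),
    "GPQA": (87.5, "general"),
    "LiveCodeBench": (79.0, "code"),
    "Humanity's Last Exam": (40.0, "general"),
    "USAMO25": (37.5, "general"),
    "ARC-AGI v2": (15.9, "reasoning"),
}

values = [score for score, _ in scores.values()]
print(f"Average score: {sum(values) / len(values):.1f}%")          # 63.1%
print(f"Best score: {max(values):.1f}%")                           # 91.7%
print(f"High performers (80%+): {sum(v >= 80 for v in values)}")   # 3

by_category: dict[str, list[float]] = {}
for score, category in scores.values():
    by_category.setdefault(category, []).append(score)
for category, vals in sorted(by_category.items()):
    # code 79.0%, general 69.3%, reasoning 15.9%
    print(f"{category}: {sum(vals) / len(vals):.1f}%")
```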
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
AIME 2025
Rank #5 of 36
#2 o4-mini: 92.7%
#3 Grok-3: 93.3%
#4 GPT-5: 94.6%
#5 Grok-4: 91.7%
#6 GPT-5 mini: 91.1%
#7 Grok-3 Mini: 90.8%
#8 Gemini 2.5 Pro Preview 06-05: 88.0%
HMMT25
Rank #2 of 3
#1 Grok-4 Heavy: 96.7%
#2 Grok-4: 90.0%
#3 Qwen3-235B-A22B-Instruct-2507: 55.4%
GPQA
Rank #2 of 115
#1 Grok-4 Heavy: 88.4%
#2 Grok-4: 87.5%
#3 Gemini 2.5 Pro Preview 06-05: 86.4%
#4 GPT-5: 85.7%
#5 Claude 3.7 Sonnet: 84.8%
LiveCodeBench
Rank #4 of 44
#1 Grok-3: 79.4%
#2 Grok-4 Heavy: 79.4%
#3 Grok-3 Mini: 80.4%
#4 Grok-4: 79.0%
#5 DeepSeek-R1-0528: 73.3%
#6 Qwen3 235B A22B: 70.7%
#7 Gemini 2.5 Pro Preview 06-05: 69.0%
Humanity's Last Exam
Rank #2 of 16
#1 Grok-4 Heavy: 50.7%
#2 Grok-4: 40.0%
#3 GPT-5: 24.8%
#4 Gemini 2.5 Pro Preview 06-05: 21.6%
#5 o3: 20.2%
All Benchmark Results for Grok-4
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Normalized Score | Score | Source
AIME 2025 | general | text | 0.92 | 91.7% | Self-reported
HMMT25 | general | text | 0.90 | 90.0% | Self-reported
GPQA | general | text | 0.88 | 87.5% | Self-reported
LiveCodeBench | code | text | 0.79 | 79.0% | Self-reported
Humanity's Last Exam | general | text | 0.40 | 40.0% | Self-reported
USAMO25 | general | text | 0.38 | 37.5% | Self-reported
ARC-AGI v2 | reasoning | text | 0.16 | 15.9% | Self-reported
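For anyone consuming this table programmatically, a small parsing sketch follows. The record field names are my own choice for illustration; they are not defined anywhere on this page.

```python
# Parse the pipe-delimited benchmark rows above into structured records.
# Field names are illustrative, not defined by this page.
from dataclasses import dataclass

@dataclass
class BenchmarkResult:
    benchmark: str
    category: str
    modality: str
    normalized: float
    score_pct: float
    source: str

rows = """
AIME 2025 | general | text | 0.92 | 91.7% | Self-reported
HMMT25 | general | text | 0.90 | 90.0% | Self-reported
GPQA | general | text | 0.88 | 87.5% | Self-reported
LiveCodeBench | code | text | 0.79 | 79.0% | Self-reported
Humanity's Last Exam | general | text | 0.40 | 40.0% | Self-reported
USAMO25 | general | text | 0.38 | 37.5% | Self-reported
ARC-AGI v2 | reasoning | text | 0.16 | 15.9% | Self-reported
""".strip().splitlines()

results = []
for row in rows:
    name, category, modality, normalized, score, source = [f.strip() for f in row.split("|")]
    results.append(BenchmarkResult(name, category, modality,
                                   float(normalized), float(score.rstrip("%")), source))

print(results[0])  # BenchmarkResult(benchmark='AIME 2025', category='general', ...)
```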
Resources