Grok-4

Multimodal · Zero-eval
#1 ARC-AGI v2 · #2 HMMT25 · #2 GPQA

by xAI
About

Grok-4 is a multimodal language model developed by xAI. It averages 63.1% across the 7 benchmarks tracked here, with its strongest results on AIME 2025 (91.7%), HMMT25 (90.0%), and GPQA (87.5%). It supports a 264K-token context window for handling large documents and is available through 2 API providers. As a multimodal model, it accepts both text and image inputs. Released in July 2025, it is xAI's most recent flagship model.
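For reference, here is a minimal sketch of querying the model. It assumes an OpenAI-compatible provider endpoint (xAI's own API follows this convention at https://api.x.ai/v1) and the model ID `grok-4`; the environment variable name is hypothetical, and other providers will differ.

```python
# Minimal sketch: calling Grok-4 through an OpenAI-compatible endpoint.
# Assumptions: xAI's https://api.x.ai/v1 base URL, model ID "grok-4",
# and an XAI_API_KEY environment variable (hypothetical name).
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

response = client.chat.completions.create(
    model="grok-4",
    messages=[{"role": "user", "content": "In one sentence, what is GPQA?"}],
)
print(response.choices[0].message.content)
```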

Pricing Range
Input (per 1M tokens): $3.00 – $3.00
Output (per 1M tokens): $15.00 – $15.00
Providers: 2
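The per-token arithmetic is straightforward; a sketch with hypothetical token counts:

```python
# Sketch: estimating one request's cost from the rates above.
# The rates come from the pricing table; the token counts are made up.
INPUT_PER_M = 3.00    # USD per 1M input tokens
OUTPUT_PER_M = 15.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1e6

# e.g. a 200K-token document plus a 2K-token answer:
print(f"${request_cost(200_000, 2_000):.2f}")  # $0.63
```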
Timeline
Announced: Jul 9, 2025
Released: Jul 9, 2025
Knowledge Cutoff: Dec 31, 2024
Specifications
Capabilities: Multimodal
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

Benchmarks: 7
Average Score: 63.1%
Best Score: 91.7%
High Performers (80%+): 3

Performance Metrics

Max Context Window: 264.0K tokens
Avg Throughput: 100.0 tok/s
Avg Latency: 1 ms
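Taken at face value, these two numbers give a rough end-to-end estimate for a generation. The sketch below assumes latency is time-to-first-token and throughput is the steady decode rate, which is a simplification of real serving behavior.

```python
# Sketch: rough generation-time estimate from the metrics above.
# Assumes latency = time to first token and throughput = steady
# decode rate; actual provider behavior varies with load.
LATENCY_S = 0.001        # 1 ms avg latency, as listed
THROUGHPUT_TPS = 100.0   # 100 tok/s avg throughput, as listed

def est_generation_seconds(output_tokens: int) -> float:
    return LATENCY_S + output_tokens / THROUGHPUT_TPS

print(f"{est_generation_seconds(1_000):.2f} s")  # ~10.00 s for 1K tokens
```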

Top Categories

code: 79.0%
general: 69.3%
reasoning: 15.9%
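These figures are straight means of the normalized scores in the results table at the bottom of the page; a short sketch reproducing them:

```python
# Sketch: reproducing the overall average and category scores from
# the per-benchmark results listed under "All Benchmark Results".
from statistics import mean

scores = {  # benchmark: (category, normalized score in %)
    "AIME 2025":            ("general",   91.7),
    "HMMT25":               ("general",   90.0),
    "GPQA":                 ("general",   87.5),
    "LiveCodeBench":        ("code",      79.0),
    "Humanity's Last Exam": ("general",   40.0),
    "USAMO25":              ("general",   37.5),
    "ARC-AGI v2":           ("reasoning", 15.9),
}

print(f"average: {mean(s for _, s in scores.values()):.1f}%")  # 63.1%
for cat in ("code", "general", "reasoning"):
    vals = [s for c, s in scores.values() if c == cat]
    print(f"{cat}: {mean(vals):.1f}%")  # 79.0%, 69.3%, 15.9%
```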
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

AIME 2025

Rank #5 of 36
#2 o4-mini: 92.7%
#3 Grok-3: 93.3%
#4 GPT-5: 94.6%
#5 Grok-4: 91.7%
#6 GPT-5 mini: 91.1%
#7 Grok-3 Mini: 90.8%
#8 Gemini 2.5 Pro Preview 06-05: 88.0%

HMMT25

Rank #2 of 3
#1 Grok-4 Heavy: 96.7%
#2 Grok-4: 90.0%
#3 Qwen3-235B-A22B-Instruct-2507: 55.4%

GPQA

Rank #2 of 115
#1 Grok-4 Heavy: 88.4%
#2 Grok-4: 87.5%
#3 Gemini 2.5 Pro Preview 06-05: 86.4%
#4 GPT-5: 85.7%
#5 Claude 3.7 Sonnet: 84.8%

LiveCodeBench

Rank #4 of 44
#1 Grok-3: 79.4%
#2 Grok-4 Heavy: 79.4%
#3 Grok-3 Mini: 80.4%
#4 Grok-4: 79.0%
#5 DeepSeek-R1-0528: 73.3%
#6 Qwen3 235B A22B: 70.7%
#7 Gemini 2.5 Pro Preview 06-05: 69.0%

Humanity's Last Exam

Rank #2 of 16
#1 Grok-4 Heavy: 50.7%
#2 Grok-4: 40.0%
#3 GPT-5: 24.8%
#4 Gemini 2.5 Pro Preview 06-05: 21.6%
#5 o3: 20.2%
All Benchmark Results for Grok-4
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Score | Normalized | Source
AIME 2025 | general | text | 0.92 | 91.7% | Self-reported
HMMT25 | general | text | 0.90 | 90.0% | Self-reported
GPQA | general | text | 0.88 | 87.5% | Self-reported
LiveCodeBench | code | text | 0.79 | 79.0% | Self-reported
Humanity's Last Exam | general | text | 0.40 | 40.0% | Self-reported
USAMO25 | general | text | 0.38 | 37.5% | Self-reported
ARC-AGI v2 | reasoning | text | 0.16 | 15.9% | Self-reported