Grok-3

Multimodal

Zero-eval

#3 AIME 2024

#3 AIME 2025

#3 LiveCodeBench

by xAI

About

Grok-3 is a multimodal language model developed by xAI. It averages 85.7% across 5 benchmarks, with its strongest results on AIME 2024 (93.3%), AIME 2025 (93.3%), and GPQA (84.6%). Its best category is general tasks, where it averages 90.4%. It supports a 136K-token context window for handling large documents and is available through 1 API provider. As a multimodal model, it can process text, images, and other input formats. It was released on Feb 17, 2025.

Pricing Range

Input (per 1M tokens): $3.00 - $3.00

Output (per 1M tokens): $15.00 - $15.00

Providers: 1
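
The per-token prices above can be turned into a rough per-request cost estimate. The following is a minimal sketch, assuming the listed prices ($3.00 input, $15.00 output per 1M tokens) apply uniformly to all tokens; the function name `estimate_cost_usd` and the example token counts are illustrative and not part of any xAI API.

```python
# Prices per 1M tokens, copied from the pricing range listed above.
INPUT_USD_PER_MILLION = 3.00
OUTPUT_USD_PER_MILLION = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Illustrative request: a 100,000-token prompt and a 2,000-token reply.
    print(f"${estimate_cost_usd(100_000, 2_000):.2f}")
```

Because output tokens cost five times as much as input tokens at these prices, long generations tend to dominate the bill even when the prompt is large.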

Timeline

Announced: Feb 17, 2025

Released: Feb 17, 2025

Knowledge Cutoff: Nov 17, 2024

Specifications

Capabilities

Multimodal

License & Family

License

Proprietary

Benchmark Performance Overview

Performance metrics and category breakdown

Overall Performance

5 benchmarks

Average Score

85.7%

Best Score

93.3%

High Performers (80%+)

Performance Metrics

Max Context Window

136.0K

Avg Throughput

100.0 tok/s

Avg Latency

1ms
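
The throughput and latency figures above allow a back-of-the-envelope estimate of how long a response takes to generate. This is only a sketch: it assumes the listed latency is a fixed delay before the first token and that throughput stays constant at the listed average, neither of which the page confirms. The helper name `estimate_response_seconds` is hypothetical.

```python
# Averages copied from the performance metrics above.
AVG_THROUGHPUT_TOK_PER_S = 100.0
AVG_LATENCY_S = 0.001  # 1 ms, assumed to be the delay before the first token


def estimate_response_seconds(output_tokens: int) -> float:
    """Estimate wall-clock generation time in seconds (hypothetical helper)."""
    return AVG_LATENCY_S + output_tokens / AVG_THROUGHPUT_TOK_PER_S


if __name__ == "__main__":
    # Illustrative reply length of 500 tokens.
    print(f"{estimate_response_seconds(500):.3f} s")
```

Under these assumptions the generation time is dominated by the token count rather than by the latency term.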

Top Categories

general

90.4%

code

79.4%

vision

78.0%

Benchmark Performance

Top benchmark scores with normalized values (0-100%)

Ranking Across Benchmarks

Position relative to other models on each benchmark

AIME 2024

Rank #3 of 41

#1 o4-mini: 93.4%
#2 Grok-3 Mini: 95.8%
#3 Grok-3: 93.3%
#4 Gemini 2.5 Pro: 92.0%
#5 o3: 91.6%
#6 DeepSeek-R1-0528: 91.4%

AIME 2025

Rank #3 of 36

#1 GPT-5: 94.6%
#2 Grok-4 Heavy: 100.0%
#3 Grok-3: 93.3%
#4 o4-mini: 92.7%
#5 Grok-4: 91.7%
#6 GPT-5 mini: 91.1%

GPQA

Rank #6 of 115

#3 Claude 3.7 Sonnet: 84.8%
#4 GPT-5: 85.7%
#5 Gemini 2.5 Pro Preview 06-05: 86.4%
#6 Grok-3: 84.6%
#7 Grok-3 Mini: 84.0%
#8 o3: 83.3%
#9 Gemini 2.5 Pro: 83.0%

LiveCodeBench

Rank #3 of 44

#1 Grok-4 Heavy: 79.4%
#2 Grok-3 Mini: 80.4%
#3 Grok-3: 79.4%
#4 Grok-4: 79.0%
#5 DeepSeek-R1-0528: 73.3%
#6 Qwen3 235B A22B: 70.7%

MMMU

Rank #7 of 52

#4 Gemini 2.5 Pro: 79.6%
#5 Gemini 2.5 Flash: 79.7%
#6 o4-mini: 81.6%
#7 Grok-3: 78.0%
#8 o1: 77.6%
#9 Gemini 2.0 Flash Thinking: 75.4%
#10 GPT-4.5: 75.2%

All Benchmark Results for Grok-3

Complete list of benchmark scores with detailed information


Benchmark	Category	Modality	Normalized	Score	Source
AIME 2024	general	text	0.93	93.3%	Self-reported
AIME 2025	general	text	0.93	93.3%	Self-reported
GPQA	general	text	0.85	84.6%	Self-reported
LiveCodeBench	code	text	0.79	79.4%	Self-reported
MMMU	vision	multimodal	0.78	78.0%	Self-reported
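
The summary figures reported earlier on this page (85.7% overall, 90.4% for general, 79.4% for code, 78.0% for vision) can be reproduced from the five rows above. The sketch below assumes they are plain unweighted means rounded to one decimal place, which matches the listed values; the names `RESULTS`, `overall_average`, `category_averages`, and `high_performers` are illustrative.

```python
from collections import defaultdict

# (benchmark, category, score in percent), copied from the results table.
RESULTS = [
    ("AIME 2024", "general", 93.3),
    ("AIME 2025", "general", 93.3),
    ("GPQA", "general", 84.6),
    ("LiveCodeBench", "code", 79.4),
    ("MMMU", "vision", 78.0),
]


def overall_average(results):
    """Unweighted mean of all scores, rounded to one decimal place."""
    scores = [score for _, _, score in results]
    return round(sum(scores) / len(scores), 1)


def category_averages(results):
    """Unweighted mean score per category, rounded to one decimal place."""
    by_category = defaultdict(list)
    for _, category, score in results:
        by_category[category].append(score)
    return {cat: round(sum(s) / len(s), 1) for cat, s in by_category.items()}


def high_performers(results, threshold=80.0):
    """Benchmarks whose score meets or exceeds the threshold."""
    return [name for name, _, score in results if score >= threshold]


if __name__ == "__main__":
    print(overall_average(RESULTS))    # 85.7
    print(category_averages(RESULTS))  # {'general': 90.4, 'code': 79.4, 'vision': 78.0}
    print(high_performers(RESULTS))    # ['AIME 2024', 'AIME 2025', 'GPQA']
```

Since all five scores are self-reported, the averages inherit that caveat.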

Resources

API Reference