Grok-1.5
by xAI

About

Grok-1.5 is a language model developed by xAI. Across the 9 benchmarks tracked here it averages 63.9%, with its strongest results on GSM8k (90.0%), DocVQA (85.6%), and MMLU (81.3%). Announced and released on March 28, 2024, it is the successor to Grok-1.

Timeline
Announced: Mar 28, 2024
Released: Mar 28, 2024

Specifications
License & Family
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (9 benchmarks)

Average Score: 63.9%
Best Score: 90.0%
High Performers (80%+): 3
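
The three headline figures follow directly from the nine per-benchmark percentages reported lower on the page. A minimal sketch of that arithmetic (assuming a plain unweighted mean, which matches the numbers shown):

    # Reported percentage scores for Grok-1.5 on the nine benchmarks listed below.
    scores = {
        "GSM8k": 90.0, "DocVQA": 85.6, "MMLU": 81.3, "HumanEval": 74.1, "MMMU": 53.6,
        "MathVista": 52.8, "MMLU-Pro": 51.0, "MATH": 50.6, "GPQA": 35.9,
    }

    average = sum(scores.values()) / len(scores)                    # 63.9% average score
    best = max(scores.values())                                     # 90.0% best score
    high_performers = sum(1 for s in scores.values() if s >= 80.0)  # 3 benchmarks at 80%+

    print(f"Average: {average:.1f}%  Best: {best:.1f}%  High performers (80%+): {high_performers}")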

Top Categories

code: 74.1%
vision: 69.6%
math: 64.5%
general: 56.1%
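
The category breakdown appears to be the unweighted mean of each category's benchmark scores, using the category labels from the results table at the bottom of the page; a short sketch reproducing those four figures:

    from collections import defaultdict

    # Category assignments and percentage scores, as reported in the results table below.
    benchmarks = {
        "GSM8k": ("math", 90.0),        "DocVQA": ("vision", 85.6),
        "MMLU": ("general", 81.3),      "HumanEval": ("code", 74.1),
        "MMMU": ("vision", 53.6),       "MathVista": ("math", 52.8),
        "MMLU-Pro": ("general", 51.0),  "MATH": ("math", 50.6),
        "GPQA": ("general", 35.9),
    }

    by_category = defaultdict(list)
    for _, (category, score) in benchmarks.items():
        by_category[category].append(score)

    # Unweighted mean per category, highest first.
    for category, vals in sorted(by_category.items(), key=lambda kv: -sum(kv[1]) / len(kv[1])):
        print(f"{category}: {sum(vals) / len(vals):.1f}%")
    # code: 74.1%  vision: 69.6%  math: 64.5%  general: 56.1%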
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

GSM8k (Rank #26 of 46)
#23 Gemini 1.5 Pro: 90.8%
#24 Qwen2.5-Coder 32B Instruct: 91.1%
#25 Qwen2 72B Instruct: 91.1%
#26 Grok-1.5: 90.0%
#27 Gemma 3 4B: 89.2%
#28 Claude 3 Haiku: 88.9%
#29 Qwen2.5-Omni-7B: 88.7%

DocVQA (Rank #25 of 26)
#22 Grok-1.5V: 85.6%
#23 Gemma 3 27B: 86.6%
#24 Gemma 3 12B: 87.1%
#25 Grok-1.5: 85.6%
#26 Gemma 3 4B: 75.8%

MMLU (Rank #36 of 78)
#33 GPT-4o mini: 82.0%
#34 Qwen2 72B Instruct: 82.3%
#35 Qwen2.5 32B Instruct: 83.3%
#36 Grok-1.5: 81.3%
#37 Jamba 1.5 Large: 81.2%
#38 Mistral Small 3.1 24B Base: 81.0%
#39 Mistral Small 3 24B Base: 80.7%

HumanEval (Rank #48 of 62)
#45 Gemini 1.5 Flash: 74.3%
#46 Gemma 3n E4B Instructed LiteRT Preview: 75.0%
#47 Gemma 3n E4B Instructed: 75.0%
#48 Grok-1.5: 74.1%
#49 Claude 3 Sonnet: 73.0%
#50 Llama 3.1 8B Instruct: 72.6%
#51 Pixtral-12B: 72.0%

MMMU (Rank #44 of 52)
#41 Grok-1.5V: 53.6%
#42 Gemini 1.5 Flash 8B: 53.7%
#43 Phi-4-multimodal-instruct: 55.1%
#44 Grok-1.5: 53.6%
#45 Pixtral-12B: 52.5%
#46 DeepSeek VL2: 51.1%
#47 Llama 3.2 11B Instruct: 50.7%
All Benchmark Results for Grok-1.5
Complete list of benchmark scores with detailed information
Benchmark    Category   Modality     Score    Source
GSM8k        math       text         90.0%    Self-reported
DocVQA       vision     multimodal   85.6%    Self-reported
MMLU         general    text         81.3%    Self-reported
HumanEval    code       text         74.1%    Self-reported
MMMU         vision     multimodal   53.6%    Self-reported
MathVista    math       text         52.8%    Self-reported
MMLU-Pro     general    text         51.0%    Self-reported
MATH         math       text         50.6%    Self-reported
GPQA         general    text         35.9%    Self-reported