
Grok-1.5
by xAI
About
Grok-1.5 is a language model developed by xAI. It achieves strong performance, with an average score of 63.9% across 9 benchmarks, and scores highest on GSM8k (90.0%), DocVQA (85.6%), and MMLU (81.3%). Released in 2024, it was xAI's latest model at the time.
Timeline
Announced: Mar 28, 2024
Released: Mar 28, 2024
Specifications
License & Family
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance (9 benchmarks)
Average Score: 63.9%
Best Score: 90.0%
High Performers (80%+): 3
Top Categories
code: 74.1%
vision: 69.6%
math: 64.5%
general: 56.1%
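The category figures above appear to be unweighted means of the per-benchmark scores listed in the "All Benchmark Results" table at the bottom of this page. A minimal Python sketch reproducing them; the grouping logic is an assumption about how the site aggregates, not its actual code:

```python
from collections import defaultdict

# (benchmark, category, normalized score %) — copied from the
# "All Benchmark Results" table below
SCORES = [
    ("GSM8k", "math", 90.0),
    ("DocVQA", "vision", 85.6),
    ("MMLU", "general", 81.3),
    ("HumanEval", "code", 74.1),
    ("MMMU", "vision", 53.6),
    ("MathVista", "math", 52.8),
    ("MMLU-Pro", "general", 51.0),
    ("MATH", "math", 50.6),
    ("GPQA", "general", 35.9),
]

by_category = defaultdict(list)
for _, category, score in SCORES:
    by_category[category].append(score)

# Expected output: code 74.1%, vision 69.6%, math 64.5%, general 56.1%
for category, scores in by_category.items():
    print(f"{category}: {sum(scores) / len(scores):.1f}%")
```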
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
GSM8k
Rank #26 of 46
#23 Gemini 1.5 Pro: 90.8%
#24 Qwen2.5-Coder 32B Instruct: 91.1%
#25 Qwen2 72B Instruct: 91.1%
#26 Grok-1.5: 90.0%
#27 Gemma 3 4B: 89.2%
#28 Claude 3 Haiku: 88.9%
#29 Qwen2.5-Omni-7B: 88.7%
DocVQA
Rank #25 of 26
#22 Grok-1.5V: 85.6%
#23 Gemma 3 27B: 86.6%
#24 Gemma 3 12B: 87.1%
#25 Grok-1.5: 85.6%
#26 Gemma 3 4B: 75.8%
MMLU
Rank #36 of 78
#33 GPT-4o mini: 82.0%
#34 Qwen2 72B Instruct: 82.3%
#35 Qwen2.5 32B Instruct: 83.3%
#36 Grok-1.5: 81.3%
#37 Jamba 1.5 Large: 81.2%
#38 Mistral Small 3.1 24B Base: 81.0%
#39 Mistral Small 3 24B Base: 80.7%
HumanEval
Rank #48 of 62
#45 Gemini 1.5 Flash: 74.3%
#46 Gemma 3n E4B Instructed LiteRT Preview: 75.0%
#47 Gemma 3n E4B Instructed: 75.0%
#48 Grok-1.5: 74.1%
#49 Claude 3 Sonnet: 73.0%
#50 Llama 3.1 8B Instruct: 72.6%
#51 Pixtral-12B: 72.0%
MMMU
Rank #44 of 52
#41 Grok-1.5V: 53.6%
#42 Gemini 1.5 Flash 8B: 53.7%
#43 Phi-4-multimodal-instruct: 55.1%
#44 Grok-1.5: 53.6%
#45 Pixtral-12B: 52.5%
#46 DeepSeek VL2: 51.1%
#47 Llama 3.2 11B Instruct: 50.7%
All Benchmark Results for Grok-1.5
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Score | Normalized | Source
GSM8k | math | text | 0.90 | 90.0% | Self-reported
DocVQA | vision | multimodal | 0.86 | 85.6% | Self-reported
MMLU | general | text | 0.81 | 81.3% | Self-reported
HumanEval | code | text | 0.74 | 74.1% | Self-reported
MMMU | vision | multimodal | 0.54 | 53.6% | Self-reported
MathVista | math | text | 0.53 | 52.8% | Self-reported
MMLU-Pro | general | text | 0.51 | 51.0% | Self-reported
MATH | math | text | 0.51 | 50.6% | Self-reported
GPQA | general | text | 0.36 | 35.9% | Self-reported
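As a sanity check, the headline figures in the overview (average 63.9%, best 90.0%, three scores at 80% or above) follow directly from the nine normalized scores in this table. A minimal sketch, assuming a plain unweighted mean over the self-reported scores:

```python
# Normalized scores (%) from the table above
scores = [90.0, 85.6, 81.3, 74.1, 53.6, 52.8, 51.0, 50.6, 35.9]

average = sum(scores) / len(scores)               # 63.877... -> 63.9%
best = max(scores)                                # 90.0% (GSM8k)
high_performers = sum(s >= 80.0 for s in scores)  # 3 (GSM8k, DocVQA, MMLU)

print(f"Average: {average:.1f}%, Best: {best:.1f}%, "
      f"High performers (80%+): {high_performers}")
```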