
Grok-1.5V
Multimodal
Zero-eval
#3 RealWorldQA
by xAI
About
Grok-1.5V is a multimodal language model developed by xAI. Across the 7 benchmarks listed here it averages 71.9%, with its strongest results on AI2D (88.3%), DocVQA (85.6%), and TextVQA (78.1%). By category, it scores highest on general tasks (77.7% average), followed by vision (72.4%) and math (52.8%). As a multimodal model, it can process both text and images. It was announced and released on April 12, 2024. All scores on this page are self-reported.
Timeline
Announced: Apr 12, 2024
Released: Apr 12, 2024
Specifications
Capabilities: Multimodal
License & Family
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance (7 benchmarks)
Average Score: 71.9%
Best Score: 88.3%
High Performers (80%+): 2
Top Categories
general: 77.7%
vision: 72.4%
math: 52.8%
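The overview figures can be reproduced from the seven per-benchmark scores listed under All Benchmark Results. The following is a minimal sketch, assuming each figure is a plain unweighted mean rounded to one decimal place; the function name `summarize` and the `high_threshold` parameter are illustrative and not part of the site.

```python
from collections import defaultdict

# (benchmark, category, score in percent), copied from the results listed on this page.
SCORES = [
    ("AI2D", "general", 88.3),
    ("DocVQA", "vision", 85.6),
    ("TextVQA", "vision", 78.1),
    ("ChartQA", "general", 76.1),
    ("RealWorldQA", "general", 68.7),
    ("MMMU", "vision", 53.6),
    ("MathVista", "math", 52.8),
]


def summarize(scores, high_threshold=80.0):
    """Compute overview statistics (assumed: unweighted means, one decimal)."""
    values = [score for _, _, score in scores]

    # Group scores by category so each category gets its own mean.
    by_category = defaultdict(list)
    for _, category, score in scores:
        by_category[category].append(score)

    return {
        "count": len(values),
        "average": round(sum(values) / len(values), 1),
        "best": max(values),
        "high_performers": sum(v >= high_threshold for v in values),
        "category_average": {
            cat: round(sum(vals) / len(vals), 1)
            for cat, vals in by_category.items()
        },
    }


summary = summarize(SCORES)
print(summary)
```

Under these assumptions the computed values agree with the figures stated above: an average of 71.9% over 7 benchmarks, a best score of 88.3%, two scores at or above 80%, and category means of 77.7% (general), 72.4% (vision), and 52.8% (math).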
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
AI2D
Rank #8 of 17
#5 Llama 3.2 90B Instruct: 92.3%
#6 Llama 3.2 11B Instruct: 91.1%
#7 Qwen2.5 VL 72B Instruct: 88.4%
#8 Grok-1.5V: 88.3%
#9 Gemma 3 27B: 84.5%
#10 Gemma 3 12B: 84.2%
#11 Qwen2.5-Omni-7B: 83.2%
DocVQA
Rank #24 of 26
#21 Llama 3.2 11B Instruct: 88.4%
#22 Gemma 3 12B: 87.1%
#23 Gemma 3 27B: 86.6%
#24 Grok-1.5V: 85.6%
#25 Grok-1.5: 85.6%
#26 Gemma 3 4B: 75.8%
TextVQA
Rank #9 of 15
#6 Nova Pro: 81.5%
#7 DeepSeek VL2 Tiny: 80.7%
#8 Nova Lite: 80.2%
#9 Grok-1.5V: 78.1%
#10 Phi-4-multimodal-instruct: 75.6%
#11 Llama 3.2 90B Instruct: 73.5%
#12 Phi-3.5-vision-instruct: 72.0%
ChartQA
Rank #22 of 24
#19 Phi-4-multimodal-instruct: 81.4%
#20 DeepSeek VL2 Tiny: 81.0%
#21 Gemma 3 27B: 78.0%
#22 Grok-1.5V: 76.1%
#23 Gemma 3 12B: 75.7%
#24 Gemma 3 4B: 68.8%
RealWorldQA
Rank #3 of 6
#1 Qwen2-VL-72B-Instruct: 77.8%
#2 Qwen2.5-Omni-7B: 70.3%
#3 Grok-1.5V: 68.7%
#4 DeepSeek VL2: 68.4%
#5 DeepSeek VL2 Small: 65.4%
#6 DeepSeek VL2 Tiny: 64.2%
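A rank on a benchmark is a position in a score-sorted list. The sketch below assumes ranks are assigned by sorting scores in descending order, with rank 1 as the highest score and no special handling of ties; the site's actual ranking rule is not documented here, and `rank_models` is an illustrative name. It uses the six RealWorldQA entries, since that list is shown in full.

```python
# RealWorldQA scores as listed on this page: (model name, score in percent).
REALWORLDQA = [
    ("Qwen2.5-Omni-7B", 70.3),
    ("Qwen2-VL-72B-Instruct", 77.8),
    ("Grok-1.5V", 68.7),
    ("DeepSeek VL2", 68.4),
    ("DeepSeek VL2 Small", 65.4),
    ("DeepSeek VL2 Tiny", 64.2),
]


def rank_models(entries):
    """Return (rank, model, score) tuples, highest score first, ranks from 1.

    Assumption: ties are not given a shared rank; sorted() is stable, so tied
    entries keep their input order.
    """
    ordered = sorted(entries, key=lambda entry: entry[1], reverse=True)
    return [
        (rank, model, score)
        for rank, (model, score) in enumerate(ordered, start=1)
    ]


ranking = rank_models(REALWORLDQA)
for rank, model, score in ranking:
    print(f"#{rank} {model}: {score}%")
```

Under this assumption Grok-1.5V lands at position 3 of 6 on RealWorldQA, which matches the rank stated for it on this page.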
All Benchmark Results for Grok-1.5V
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Score (0-1) | Normalized (%) | Source
AI2D | general | text | 0.88 | 88.3% | Self-reported
DocVQA | vision | multimodal | 0.86 | 85.6% | Self-reported
TextVQA | vision | multimodal | 0.78 | 78.1% | Self-reported
ChartQA | general | multimodal | 0.76 | 76.1% | Self-reported
RealWorldQA | general | text | 0.69 | 68.7% | Self-reported
MMMU | vision | multimodal | 0.54 | 53.6% | Self-reported
MathVista | math | text | 0.53 | 52.8% | Self-reported
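Each result row carries two score values, one on a 0-1 scale and one as a percentage. A minimal sketch of how they appear to relate, assuming the 0-1 value is simply the percentage divided by 100 and rounded to two decimals (an inference from the listed numbers, not a documented rule; `to_unit_score` is an illustrative name):

```python
# Percent scores and the 0-1 values shown next to them in the results list.
LISTED = {
    "AI2D": (88.3, 0.88),
    "DocVQA": (85.6, 0.86),
    "TextVQA": (78.1, 0.78),
    "ChartQA": (76.1, 0.76),
    "RealWorldQA": (68.7, 0.69),
    "MMMU": (53.6, 0.54),
    "MathVista": (52.8, 0.53),
}


def to_unit_score(percent):
    """Convert a 0-100 percent score to a 0-1 score rounded to two decimals."""
    return round(percent / 100, 2)


# Collect any benchmark whose listed 0-1 value differs from the derived one.
mismatches = {
    name: (percent, unit)
    for name, (percent, unit) in LISTED.items()
    if to_unit_score(percent) != unit
}
print(mismatches or "all rows consistent")
```

For the seven rows on this page the derived values match the listed ones, so the 0-1 column adds no information beyond the percentage.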