Grok-1.5V

Multimodal
Zero-eval
#3 RealWorldQA

by xAI

About

Grok-1.5V is a multimodal language model developed by xAI. It achieves strong performance, with an average score of 71.9% across 7 benchmarks, and scores highest on AI2D (88.3%), DocVQA (85.6%), and TextVQA (78.1%). Its strongest category is general tasks, with an average of 77.7%. As a multimodal model, it can process and understand text, images, and other input formats. Released in 2024, it represents xAI's latest advancement in AI technology.
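The headline averages above can be reproduced from the individual scores listed under "All Benchmark Results" below. This is an illustrative sketch using the category groupings as reported on this page, not the site's actual aggregation code:

```python
# Benchmark scores for Grok-1.5V as listed in "All Benchmark Results",
# paired with the category each benchmark is filed under on this page.
scores = {
    "AI2D": (88.3, "general"),
    "DocVQA": (85.6, "vision"),
    "TextVQA": (78.1, "vision"),
    "ChartQA": (76.1, "general"),
    "RealWorldQA": (68.7, "general"),
    "MMMU": (53.6, "vision"),
    "MathVista": (52.8, "math"),
}

# Overall average across all 7 benchmarks.
overall = sum(s for s, _ in scores.values()) / len(scores)
print(f"overall: {overall:.1f}%")  # 71.9%

# Per-category averages (general, vision, math).
categories = {}
for score, cat in scores.values():
    categories.setdefault(cat, []).append(score)
for cat, vals in categories.items():
    print(f"{cat}: {sum(vals) / len(vals):.1f}%")
# general: 77.7%, vision: 72.4%, math: 52.8%
```

Both the 71.9% overall figure and the category breakdown shown later on this page follow directly from a simple unweighted mean of the self-reported scores.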

Timeline
Announced: Apr 12, 2024
Released: Apr 12, 2024
Specifications
Capabilities: Multimodal
License & Family
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

7 benchmarks
Average Score: 71.9%
Best Score: 88.3%
High Performers (80%+): 2

Top Categories

general: 77.7%
vision: 72.4%
math: 52.8%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

AI2D

Rank #8 of 17
#5 Qwen2.5 VL 72B Instruct: 88.4%
#6 Llama 3.2 11B Instruct: 91.1%
#7 Llama 3.2 90B Instruct: 92.3%
#8 Grok-1.5V: 88.3%
#9 Gemma 3 27B: 84.5%
#10 Gemma 3 12B: 84.2%
#11 Qwen2.5-Omni-7B: 83.2%

DocVQA

Rank #24 of 26
#21 Gemma 3 27B: 86.6%
#22 Gemma 3 12B: 87.1%
#23 Llama 3.2 11B Instruct: 88.4%
#24 Grok-1.5V: 85.6%
#25 Grok-1.5: 85.6%
#26 Gemma 3 4B: 75.8%

TextVQA

Rank #9 of 15
#6 Nova Lite: 80.2%
#7 DeepSeek VL2 Tiny: 80.7%
#8 Nova Pro: 81.5%
#9 Grok-1.5V: 78.1%
#10 Phi-4-multimodal-instruct: 75.6%
#11 Llama 3.2 90B Instruct: 73.5%
#12 Phi-3.5-vision-instruct: 72.0%

ChartQA

Rank #22 of 24
#19 Gemma 3 27B: 78.0%
#20 DeepSeek VL2 Tiny: 81.0%
#21 Phi-4-multimodal-instruct: 81.4%
#22 Grok-1.5V: 76.1%
#23 Gemma 3 12B: 75.7%
#24 Gemma 3 4B: 68.8%

RealWorldQA

Rank #3 of 6
#1 Qwen2.5-Omni-7B: 70.3%
#2 Qwen2-VL-72B-Instruct: 77.8%
#3 Grok-1.5V: 68.7%
#4 DeepSeek VL2: 68.4%
#5 DeepSeek VL2 Small: 65.4%
#6 DeepSeek VL2 Tiny: 64.2%
All Benchmark Results for Grok-1.5V
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Score | Normalized | Source
AI2D | general | text | 0.88 | 88.3% | Self-reported
DocVQA | vision | multimodal | 0.86 | 85.6% | Self-reported
TextVQA | vision | multimodal | 0.78 | 78.1% | Self-reported
ChartQA | general | multimodal | 0.76 | 76.1% | Self-reported
RealWorldQA | general | text | 0.69 | 68.7% | Self-reported
MMMU | vision | multimodal | 0.54 | 53.6% | Self-reported
MathVista | math | text | 0.53 | 52.8% | Self-reported