Grok-1.5
by xAI

About

Grok-1.5 is a language model developed by xAI. Across the 9 benchmarks tracked here it averages 63.9%, with its strongest results on GSM8k (90.0%), DocVQA (85.6%), and MMLU (81.3%). Announced and released on March 28, 2024, it is the successor to Grok-1.

Timeline
Announced: Mar 28, 2024
Released: Mar 28, 2024

Specifications
License & Family
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (9 benchmarks)

Average Score: 63.9%
Best Score: 90.0%
High Performers (80%+): 3
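
The three headline figures follow directly from the nine per-benchmark percentages reported lower on the page. A minimal sketch of that arithmetic (assuming a plain unweighted mean, which matches the numbers shown):

    # Reported percentage scores for Grok-1.5 on the nine benchmarks listed below.
    scores = {
        "GSM8k": 90.0, "DocVQA": 85.6, "MMLU": 81.3, "HumanEval": 74.1, "MMMU": 53.6,
        "MathVista": 52.8, "MMLU-Pro": 51.0, "MATH": 50.6, "GPQA": 35.9,
    }

    average = sum(scores.values()) / len(scores)                    # 63.9% average score
    best = max(scores.values())                                     # 90.0% best score
    high_performers = sum(1 for s in scores.values() if s >= 80.0)  # 3 benchmarks at 80%+

    print(f"Average: {average:.1f}%  Best: {best:.1f}%  High performers (80%+): {high_performers}")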

Top Categories

code: 74.1%
vision: 69.6%
math: 64.5%
general: 56.1%
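
The category breakdown appears to be the unweighted mean of each category's benchmark scores, using the category labels from the results table at the bottom of the page; a short sketch reproducing those four figures:

    from collections import defaultdict

    # Category assignments and percentage scores, as reported in the results table below.
    benchmarks = {
        "GSM8k": ("math", 90.0),        "DocVQA": ("vision", 85.6),
        "MMLU": ("general", 81.3),      "HumanEval": ("code", 74.1),
        "MMMU": ("vision", 53.6),       "MathVista": ("math", 52.8),
        "MMLU-Pro": ("general", 51.0),  "MATH": ("math", 50.6),
        "GPQA": ("general", 35.9),
    }

    by_category = defaultdict(list)
    for _, (category, score) in benchmarks.items():
        by_category[category].append(score)

    # Unweighted mean per category, highest first.
    for category, vals in sorted(by_category.items(), key=lambda kv: -sum(kv[1]) / len(kv[1])):
        print(f"{category}: {sum(vals) / len(vals):.1f}%")
    # code: 74.1%  vision: 69.6%  math: 64.5%  general: 56.1%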
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

GSM8k (Rank #26 of 46)
#23 Gemini 1.5 Pro: 90.8%
#24 Qwen2.5-Coder 32B Instruct: 91.1%
#25 Qwen2 72B Instruct: 91.1%
#26 Grok-1.5: 90.0%
#27 Gemma 3 4B: 89.2%
#28 Claude 3 Haiku: 88.9%
#29 Qwen2.5-Omni-7B: 88.7%

DocVQA (Rank #25 of 26)
#22 Grok-1.5V: 85.6%
#23 Gemma 3 27B: 86.6%
#24 Gemma 3 12B: 87.1%
#25 Grok-1.5: 85.6%
#26 Gemma 3 4B: 75.8%

MMLU (Rank #36 of 78)
#33 GPT-4o mini: 82.0%
#34 Qwen2 72B Instruct: 82.3%
#35 Qwen2.5 32B Instruct: 83.3%
#36 Grok-1.5: 81.3%
#37 Jamba 1.5 Large: 81.2%
#38 Mistral Small 3.1 24B Base: 81.0%
#39 Mistral Small 3 24B Base: 80.7%

HumanEval (Rank #48 of 62)
#45 Gemini 1.5 Flash: 74.3%
#46 Gemma 3n E4B Instructed LiteRT Preview: 75.0%
#47 Gemma 3n E4B Instructed: 75.0%
#48 Grok-1.5: 74.1%
#49 Claude 3 Sonnet: 73.0%
#50 Llama 3.1 8B Instruct: 72.6%
#51 Pixtral-12B: 72.0%

MMMU (Rank #44 of 52)
#41 Grok-1.5V: 53.6%
#42 Gemini 1.5 Flash 8B: 53.7%
#43 Phi-4-multimodal-instruct: 55.1%
#44 Grok-1.5: 53.6%
#45 Pixtral-12B: 52.5%
#46 DeepSeek VL2: 51.1%
#47 Llama 3.2 11B Instruct: 50.7%
All Benchmark Results for Grok-1.5
Complete list of benchmark scores with detailed information
Benchmark    Category   Modality     Score    Source
GSM8k        math       text         90.0%    Self-reported
DocVQA       vision     multimodal   85.6%    Self-reported
MMLU         general    text         81.3%    Self-reported
HumanEval    code       text         74.1%    Self-reported
MMMU         vision     multimodal   53.6%    Self-reported
MathVista    math       text         52.8%    Self-reported
MMLU-Pro     general    text         51.0%    Self-reported
MATH         math       text         50.6%    Self-reported
GPQA         general    text         35.9%    Self-reported