Grok-2 mini

Multimodal
Zero-eval

by xAI

About

Grok-2 mini is a multimodal language model developed by xAI. It achieves an average score of 74.0% across 8 benchmarks, with its strongest results on DocVQA (93.2%), MMLU (86.2%), and HumanEval (85.7%). As a multimodal model, it can process and understand both text and image inputs. It was announced and released on August 13, 2024.
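
As a sanity check on the headline number, the short Python sketch below (illustrative only, not from xAI or this listing) recomputes the 74.0% average from the eight self-reported scores in the results table at the bottom of this page.

```python
# Self-reported Grok-2 mini scores (percent), taken from the results table below.
scores = {
    "DocVQA": 93.2,
    "MMLU": 86.2,
    "HumanEval": 85.7,
    "MATH": 73.0,
    "MMLU-Pro": 72.0,
    "MathVista": 68.1,
    "MMMU": 63.2,
    "GPQA": 51.0,
}

average = sum(scores.values()) / len(scores)
# ~74.0%, matching the "Average Score" shown in the overview (to rounding).
print(f"Average across {len(scores)} benchmarks: {average:.1f}%")
```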

Timeline
Announced: Aug 13, 2024
Released: Aug 13, 2024
Specifications
Capabilities: Multimodal
License & Family
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (8 benchmarks)
Average Score: 74.0%
Best Score: 93.2%
High Performers (80%+): 3

Top Categories
code: 85.7%
vision: 78.2%
math: 70.5%
general: 69.7%
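
The category figures above are consistent with a plain average of the per-benchmark scores grouped by category, using the categories from the results table at the bottom of the page. A minimal sketch, assuming exactly that grouping:

```python
from collections import defaultdict

# (benchmark, category, score in percent), as listed in the results table below.
results = [
    ("DocVQA", "vision", 93.2),
    ("MMLU", "general", 86.2),
    ("HumanEval", "code", 85.7),
    ("MATH", "math", 73.0),
    ("MMLU-Pro", "general", 72.0),
    ("MathVista", "math", 68.1),
    ("MMMU", "vision", 63.2),
    ("GPQA", "general", 51.0),
]

by_category = defaultdict(list)
for _, category, score in results:
    by_category[category].append(score)

# Expected (to rounding): code 85.7, vision 78.2, math 70.5, general 69.7.
for category, vals in sorted(by_category.items(), key=lambda kv: -sum(kv[1]) / len(kv[1])):
    print(f"{category}: {sum(vals) / len(vals):.1f}%")
```
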
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

DocVQA
Rank #13 of 26
#10 Pixtral Large: 93.3%
#11 DeepSeek VL2: 93.3%
#12 Nova Pro: 93.5%
#13 Grok-2 mini: 93.2%
#14 Phi-4-multimodal-instruct: 93.2%
#15 GPT-4o: 92.8%
#16 Nova Lite: 92.4%

MMLU
Rank #22 of 78
#19 GPT-4: 86.4%
#20 GPT-4 Turbo: 86.5%
#21 Claude 3 Opus: 86.8%
#22 Grok-2 mini: 86.2%
#23 Llama 3.2 90B Instruct: 86.0%
#24 Llama 3.3 70B Instruct: 86.0%
#25 Gemini 1.5 Pro: 85.9%

HumanEval
Rank #29 of 62
#26 Qwen2 72B Instruct: 86.0%
#27 Qwen2.5 72B Instruct: 86.6%
#28 GPT-4 Turbo: 87.1%
#29 Grok-2 mini: 85.7%
#30 Nova Lite: 85.4%
#31 Gemma 3 12B: 85.4%
#32 Claude 3 Opus: 84.9%

MATH
Rank #26 of 63
#23 Nova Lite: 73.3%
#24 Llama 3.1 405B Instruct: 73.8%
#25 DeepSeek-V2.5: 74.7%
#26 Grok-2 mini: 73.0%
#27 GPT-4 Turbo: 72.6%
#28 Qwen3 235B A22B: 71.8%
#29 Qwen2.5-Omni-7B: 71.5%

MMLU-Pro
Rank #19 of 60
#16 GPT-4o: 72.6%
#17 Llama 3.1 405B Instruct: 73.3%
#18 Llama 4 Scout: 74.3%
#19 Grok-2 mini: 72.0%
#20 Gemini 2.0 Flash-Lite: 71.6%
#21 Qwen2.5 72B Instruct: 71.1%
#22 Phi 4: 70.4%
All Benchmark Results for Grok-2 mini
Complete list of benchmark scores with detailed information
Benchmark (category, modality) | Score | Normalized | Source
DocVQA (vision, multimodal) | 93.2% | 0.93 | Self-reported
MMLU (general, text) | 86.2% | 0.86 | Self-reported
HumanEval (code, text) | 85.7% | 0.86 | Self-reported
MATH (math, text) | 73.0% | 0.73 | Self-reported
MMLU-Pro (general, text) | 72.0% | 0.72 | Self-reported
MathVista (math, text) | 68.1% | 0.68 | Self-reported
MMMU (vision, multimodal) | 63.2% | 0.63 | Self-reported
GPQA (general, text) | 51.0% | 0.51 | Self-reported
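
For reuse, the table above can be kept as structured records; the sketch below (field names are illustrative, not an official schema) rederives the Best Score and High Performers (80%+) figures shown in the overview.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkResult:
    name: str
    category: str
    modality: str
    score: float  # normalized 0-1, from the percentage column
    source: str

RESULTS = [
    BenchmarkResult("DocVQA", "vision", "multimodal", 0.932, "Self-reported"),
    BenchmarkResult("MMLU", "general", "text", 0.862, "Self-reported"),
    BenchmarkResult("HumanEval", "code", "text", 0.857, "Self-reported"),
    BenchmarkResult("MATH", "math", "text", 0.730, "Self-reported"),
    BenchmarkResult("MMLU-Pro", "general", "text", 0.720, "Self-reported"),
    BenchmarkResult("MathVista", "math", "text", 0.681, "Self-reported"),
    BenchmarkResult("MMMU", "vision", "multimodal", 0.632, "Self-reported"),
    BenchmarkResult("GPQA", "general", "text", 0.510, "Self-reported"),
]

best = max(RESULTS, key=lambda r: r.score)
high_performers = [r for r in RESULTS if r.score >= 0.80]

print(f"Best score: {best.name} at {best.score:.1%}")      # DocVQA at 93.2%
print(f"High performers (80%+): {len(high_performers)}")   # 3
```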