Grok-2 mini

Multimodal
Zero-eval

by xAI

About

Grok-2 mini is a multimodal language model developed by xAI. It achieves an average score of 74.0% across 8 benchmarks, with its strongest results on DocVQA (93.2%), MMLU (86.2%), and HumanEval (85.7%). As a multimodal model, it can process and understand both text and image inputs. It was announced and released on August 13, 2024.
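
As a sanity check on the headline number, the short Python sketch below (illustrative only, not from xAI or this listing) recomputes the 74.0% average from the eight self-reported scores in the results table at the bottom of this page.

```python
# Self-reported Grok-2 mini scores (percent), taken from the results table below.
scores = {
    "DocVQA": 93.2,
    "MMLU": 86.2,
    "HumanEval": 85.7,
    "MATH": 73.0,
    "MMLU-Pro": 72.0,
    "MathVista": 68.1,
    "MMMU": 63.2,
    "GPQA": 51.0,
}

average = sum(scores.values()) / len(scores)
# ~74.0%, matching the "Average Score" shown in the overview (to rounding).
print(f"Average across {len(scores)} benchmarks: {average:.1f}%")
```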

Timeline
Announced: Aug 13, 2024
Released: Aug 13, 2024
Specifications
Capabilities: Multimodal
License & Family
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (8 benchmarks)
Average Score: 74.0%
Best Score: 93.2%
High Performers (80%+): 3

Top Categories
code: 85.7%
vision: 78.2%
math: 70.5%
general: 69.7%
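
The category figures above are consistent with a plain average of the per-benchmark scores grouped by category, using the categories from the results table at the bottom of the page. A minimal sketch, assuming exactly that grouping:

```python
from collections import defaultdict

# (benchmark, category, score in percent), as listed in the results table below.
results = [
    ("DocVQA", "vision", 93.2),
    ("MMLU", "general", 86.2),
    ("HumanEval", "code", 85.7),
    ("MATH", "math", 73.0),
    ("MMLU-Pro", "general", 72.0),
    ("MathVista", "math", 68.1),
    ("MMMU", "vision", 63.2),
    ("GPQA", "general", 51.0),
]

by_category = defaultdict(list)
for _, category, score in results:
    by_category[category].append(score)

# Expected (to rounding): code 85.7, vision 78.2, math 70.5, general 69.7.
for category, vals in sorted(by_category.items(), key=lambda kv: -sum(kv[1]) / len(kv[1])):
    print(f"{category}: {sum(vals) / len(vals):.1f}%")
```
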
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

DocVQA
Rank #13 of 26
#10 Pixtral Large: 93.3%
#11 DeepSeek VL2: 93.3%
#12 Nova Pro: 93.5%
#13 Grok-2 mini: 93.2%
#14 Phi-4-multimodal-instruct: 93.2%
#15 GPT-4o: 92.8%
#16 Nova Lite: 92.4%

MMLU
Rank #22 of 78
#19 GPT-4: 86.4%
#20 GPT-4 Turbo: 86.5%
#21 Claude 3 Opus: 86.8%
#22 Grok-2 mini: 86.2%
#23 Llama 3.2 90B Instruct: 86.0%
#24 Llama 3.3 70B Instruct: 86.0%
#25 Gemini 1.5 Pro: 85.9%

HumanEval
Rank #29 of 62
#26 Qwen2 72B Instruct: 86.0%
#27 Qwen2.5 72B Instruct: 86.6%
#28 GPT-4 Turbo: 87.1%
#29 Grok-2 mini: 85.7%
#30 Nova Lite: 85.4%
#31 Gemma 3 12B: 85.4%
#32 Claude 3 Opus: 84.9%

MATH
Rank #26 of 63
#23 Nova Lite: 73.3%
#24 Llama 3.1 405B Instruct: 73.8%
#25 DeepSeek-V2.5: 74.7%
#26 Grok-2 mini: 73.0%
#27 GPT-4 Turbo: 72.6%
#28 Qwen3 235B A22B: 71.8%
#29 Qwen2.5-Omni-7B: 71.5%

MMLU-Pro
Rank #19 of 60
#16 GPT-4o: 72.6%
#17 Llama 3.1 405B Instruct: 73.3%
#18 Llama 4 Scout: 74.3%
#19 Grok-2 mini: 72.0%
#20 Gemini 2.0 Flash-Lite: 71.6%
#21 Qwen2.5 72B Instruct: 71.1%
#22 Phi 4: 70.4%
All Benchmark Results for Grok-2 mini
Complete list of benchmark scores with detailed information
Benchmark (category, modality) | Score | Normalized | Source
DocVQA (vision, multimodal) | 93.2% | 0.93 | Self-reported
MMLU (general, text) | 86.2% | 0.86 | Self-reported
HumanEval (code, text) | 85.7% | 0.86 | Self-reported
MATH (math, text) | 73.0% | 0.73 | Self-reported
MMLU-Pro (general, text) | 72.0% | 0.72 | Self-reported
MathVista (math, text) | 68.1% | 0.68 | Self-reported
MMMU (vision, multimodal) | 63.2% | 0.63 | Self-reported
GPQA (general, text) | 51.0% | 0.51 | Self-reported
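
For reuse, the table above can be kept as structured records; the sketch below (field names are illustrative, not an official schema) rederives the Best Score and High Performers (80%+) figures shown in the overview.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkResult:
    name: str
    category: str
    modality: str
    score: float  # normalized 0-1, from the percentage column
    source: str

RESULTS = [
    BenchmarkResult("DocVQA", "vision", "multimodal", 0.932, "Self-reported"),
    BenchmarkResult("MMLU", "general", "text", 0.862, "Self-reported"),
    BenchmarkResult("HumanEval", "code", "text", 0.857, "Self-reported"),
    BenchmarkResult("MATH", "math", "text", 0.730, "Self-reported"),
    BenchmarkResult("MMLU-Pro", "general", "text", 0.720, "Self-reported"),
    BenchmarkResult("MathVista", "math", "text", 0.681, "Self-reported"),
    BenchmarkResult("MMMU", "vision", "multimodal", 0.632, "Self-reported"),
    BenchmarkResult("GPQA", "general", "text", 0.510, "Self-reported"),
]

best = max(RESULTS, key=lambda r: r.score)
high_performers = [r for r in RESULTS if r.score >= 0.80]

print(f"Best score: {best.name} at {best.score:.1%}")      # DocVQA at 93.2%
print(f"High performers (80%+): {len(high_performers)}")   # 3
```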