
Grok-2 mini
Multimodal | Zero-eval | by xAI
About
Grok-2 mini is a multimodal language model developed by xAI. It achieves strong performance, with an average score of 74.0% across 8 benchmarks, and does particularly well on DocVQA (93.2%), MMLU (86.2%), and HumanEval (85.7%). As a multimodal model, it can process and understand both text and image inputs. Released in 2024, it represents xAI's latest advancement in AI technology.
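The headline average can be reproduced from the eight self-reported scores listed in the results table at the bottom of this page; a minimal check in Python (the dict literal just restates that table):

```python
# Self-reported benchmark scores from the results table below (percent).
scores = {
    "DocVQA": 93.2, "MMLU": 86.2, "HumanEval": 85.7, "MATH": 73.0,
    "MMLU-Pro": 72.0, "MathVista": 68.1, "MMMU": 63.2, "GPQA": 51.0,
}

average = sum(scores.values()) / len(scores)
print(average)  # ~74.05, reported as 74.0% on this page
```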
Timeline
Announced: Aug 13, 2024
Released: Aug 13, 2024
Specifications
Capabilities: Multimodal
License & Family
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance (8 benchmarks)
Average Score: 74.0%
Best Score: 93.2%
High Performers (80%+): 3

Top Categories
code: 85.7%
vision: 78.2%
math: 70.5%
general: 69.7%
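These category figures match unweighted means of the per-benchmark scores in the results table below; a quick sketch, assuming that grouping:

```python
from statistics import mean

# Benchmark -> (category, score), restating the results table below.
results = {
    "DocVQA": ("vision", 93.2),  "MMMU": ("vision", 63.2),
    "HumanEval": ("code", 85.7),
    "MATH": ("math", 73.0),      "MathVista": ("math", 68.1),
    "MMLU": ("general", 86.2),   "MMLU-Pro": ("general", 72.0),
    "GPQA": ("general", 51.0),
}

# Group scores by category, then average each group.
by_category: dict[str, list[float]] = {}
for category, score in results.values():
    by_category.setdefault(category, []).append(score)

for category, scores in by_category.items():
    print(f"{category}: {mean(scores):.1f}%")
# code: 85.7%, vision: 78.2%, math: 70.5%, general: 69.7%
```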
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
DocVQA
Rank #13 of 26
#10 Nova Pro: 93.5%
#11 DeepSeek VL2: 93.3%
#12 Pixtral Large: 93.3%
#13 Grok-2 mini: 93.2%
#14 Phi-4-multimodal-instruct: 93.2%
#15 GPT-4o: 92.8%
#16 Nova Lite: 92.4%
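The rank positions follow from sorting every evaluated model by score in descending order. A sketch using the DocVQA neighborhood above (only these seven of the 26 models are included here, and how the site breaks ties between equal scores is an assumption):

```python
# (model, DocVQA score) pairs from the list above; the full leaderboard has 26.
docvqa = [
    ("Grok-2 mini", 93.2), ("Nova Pro", 93.5), ("DeepSeek VL2", 93.3),
    ("Pixtral Large", 93.3), ("Phi-4-multimodal-instruct", 93.2),
    ("GPT-4o", 92.8), ("Nova Lite", 92.4),
]

# Higher score -> better rank; ranks start at #10 because the nine
# higher-ranked models are omitted from this excerpt.
for rank, (model, score) in enumerate(
        sorted(docvqa, key=lambda ms: ms[1], reverse=True), start=10):
    print(f"#{rank} {model}: {score}%")
```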
MMLU
Rank #22 of 78
#19 Claude 3 Opus: 86.8%
#20 GPT-4 Turbo: 86.5%
#21 GPT-4: 86.4%
#22 Grok-2 mini: 86.2%
#23 Llama 3.2 90B Instruct: 86.0%
#24 Llama 3.3 70B Instruct: 86.0%
#25 Gemini 1.5 Pro: 85.9%
HumanEval
Rank #29 of 62
#26 GPT-4 Turbo: 87.1%
#27 Qwen2.5 72B Instruct: 86.6%
#28 Qwen2 72B Instruct: 86.0%
#29 Grok-2 mini: 85.7%
#30 Nova Lite: 85.4%
#31 Gemma 3 12B: 85.4%
#32 Claude 3 Opus: 84.9%
MATH
Rank #26 of 63
#23 DeepSeek-V2.5: 74.7%
#24 Llama 3.1 405B Instruct: 73.8%
#25 Nova Lite: 73.3%
#26 Grok-2 mini: 73.0%
#27 GPT-4 Turbo: 72.6%
#28 Qwen3 235B A22B: 71.8%
#29 Qwen2.5-Omni-7B: 71.5%
MMLU-Pro
Rank #19 of 60
#16 Llama 4 Scout: 74.3%
#17 Llama 3.1 405B Instruct: 73.3%
#18 GPT-4o: 72.6%
#19 Grok-2 mini: 72.0%
#20 Gemini 2.0 Flash-Lite: 71.6%
#21 Qwen2.5 72B Instruct: 71.1%
#22 Phi 4: 70.4%
All Benchmark Results for Grok-2 mini
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
DocVQA | vision | multimodal | 0.93 | 93.2% | Self-reported
MMLU | general | text | 0.86 | 86.2% | Self-reported
HumanEval | code | text | 0.86 | 85.7% | Self-reported
MATH | math | text | 0.73 | 73.0% | Self-reported
MMLU-Pro | general | text | 0.72 | 72.0% | Self-reported
MathVista | math | text | 0.68 | 68.1% | Self-reported
MMMU | vision | multimodal | 0.63 | 63.2% | Self-reported
GPQA | general | text | 0.51 | 51.0% | Self-reported
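For downstream use, each row reduces to a small record; a parsing sketch assuming the pipe-separated layout above (the BenchmarkResult type and its field names are illustrative, not part of any published schema). The Raw Score column appears to be just the normalized percentage rescaled to 0-1 and rounded to two decimals.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkResult:  # illustrative record type
    benchmark: str      # e.g. "DocVQA"
    category: str       # vision / code / math / general
    modality: str       # text or multimodal
    raw: float          # score on a 0-1 scale
    normalized: float   # the same score as a percentage (0-100)
    source: str         # "Self-reported" for every Grok-2 mini row

def parse_row(line: str) -> BenchmarkResult:
    # Split on the pipe delimiter and strip surrounding whitespace.
    name, category, modality, raw, norm, source = (
        field.strip() for field in line.split("|"))
    return BenchmarkResult(name, category, modality,
                           float(raw), float(norm.rstrip("%")), source)

row = parse_row("DocVQA | vision | multimodal | 0.93 | 93.2% | Self-reported")
print(row.normalized)  # 93.2
```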