
Qwen2.5 VL 72B Instruct
Multimodal
Zero-eval
by Alibaba
About
Qwen2.5 VL 72B Instruct is a multimodal language model developed by Alibaba. It averages 66.9% across 30 benchmarks, with its highest scores on DocVQA (96.4%), Android Control Low_EM (93.7%), and ChartQA (89.5%). As a multimodal model, it accepts images and other non-text inputs in addition to text. It was released in January 2025.
Timeline
Announced: Jan 26, 2025
Released: Jan 26, 2025
Specifications
Capabilities: Multimodal
License & Family
License: tongyi-qianwen
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance (30 benchmarks)
Average Score: 66.9%
Best Score: 96.4%
High Performers (80%+): 8
Top Categories
general: 69.6%
vision: 58.6%
math: 56.5%
Ranking Across Benchmarks
Position relative to other models on each benchmark
DocVQA
Rank #1 of 26
#1 Qwen2.5 VL 72B Instruct: 96.4%
#2 Qwen2.5 VL 7B Instruct: 95.7%
#3 Qwen2.5-Omni-7B: 95.2%
#4 Claude 3.5 Sonnet: 95.2%
Android Control Low_EM
Rank #1 of 3
#1 Qwen2.5 VL 72B Instruct: 93.7%
#2 Qwen2.5 VL 32B Instruct: 93.3%
#3 Qwen2.5 VL 7B Instruct: 91.4%
ChartQA
Rank #3 of 24
#1 Claude 3.5 Sonnet: 90.8%
#2 Llama 4 Maverick: 90.0%
#3 Qwen2.5 VL 72B Instruct: 89.5%
#4 Nova Pro: 89.2%
#5 Llama 4 Scout: 88.8%
#6 Qwen2-VL-72B-Instruct: 88.3%
OCRBench
Rank #1 of 7
#1 Qwen2.5 VL 72B Instruct: 88.5%
#2 Qwen2-VL-72B-Instruct: 87.7%
#3 Qwen2.5 VL 7B Instruct: 86.4%
#4 Phi-4-multimodal-instruct: 84.4%
AI2D
Rank #7 of 17
#4 Mistral Small 3.2 24B Instruct: 92.9%
#5 Llama 3.2 90B Instruct: 92.3%
#6 Llama 3.2 11B Instruct: 91.1%
#7 Qwen2.5 VL 72B Instruct: 88.4%
#8 Grok-1.5V: 88.3%
#9 Gemma 3 27B: 84.5%
#10 Gemma 3 12B: 84.2%
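The positions in these rankings follow from the scores. The sketch below reproduces one ranking, assuming the site orders models by descending score and keeps the listed order for ties. The ChartQA entries are copied from the list above, and `rank_models` and `position` are hypothetical helper names, not part of any site API. Because only six of the 24 ChartQA entries are shown, the sketch ranks only those six; they cover positions #1 to #6, so Qwen2.5 VL 72B Instruct's computed position matches its listed overall rank.

```python
# Hypothetical sketch: derive leaderboard positions by sorting scores.
# Assumption: models are ordered by descending score; ties keep listed order.
chartqa = {
    "Llama 4 Maverick": 90.0,
    "Claude 3.5 Sonnet": 90.8,
    "Qwen2.5 VL 72B Instruct": 89.5,
    "Nova Pro": 89.2,
    "Llama 4 Scout": 88.8,
    "Qwen2-VL-72B-Instruct": 88.3,
}

def rank_models(scores):
    """Return model names from highest to lowest score.

    sorted() is stable even with reverse=True, so tied scores
    keep the order in which they appear in the dict.
    """
    return sorted(scores, key=lambda name: scores[name], reverse=True)

def position(scores, model):
    """1-based position of `model` within the given scores only."""
    return rank_models(scores).index(model) + 1

ordered = rank_models(chartqa)
for i, name in enumerate(ordered, start=1):
    print(f"#{i} {name}: {chartqa[name]:.1f}%")
```

Sorting by the score alone places Claude 3.5 Sonnet (90.8%) ahead of Llama 4 Maverick (90.0%).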
All Benchmark Results for Qwen2.5 VL 72B Instruct
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Normalized (0-1) | Score | Source |
DocVQA | vision | multimodal | 0.96 | 96.4% | Self-reported |
Android Control Low_EM | general | text | 0.94 | 93.7% | Self-reported |
ChartQA | general | multimodal | 0.90 | 89.5% | Self-reported |
OCRBench | general | text | 0.89 | 88.5% | Self-reported |
AI2D | general | text | 0.88 | 88.4% | Self-reported |
MMBench | general | text | 0.88 | 88.0% | Self-reported |
ScreenSpot | general | text | 0.87 | 87.1% | Self-reported |
AITZ_EM | general | text | 0.83 | 83.2% | Self-reported |
CC-OCR | general | text | 0.80 | 79.8% | Self-reported |
EgoSchema | general | text | 0.76 | 76.2% | Self-reported |
Top 10 of 30 benchmarks shown, sorted by score.
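The "High Performers (80%+)" figure can be checked against the table above. The sketch below assumes the full 30-benchmark list is sorted by score, with the ten listed rows first. Under that assumption, the 20 unlisted benchmarks all score at most 76.2%, so counting the listed scores at or above 80% gives the complete count. `count_high_performers` and `THRESHOLD` are illustrative names, not from the site.

```python
# Scores (%) for the 10 benchmarks listed in the table, in the order shown.
top10 = {
    "DocVQA": 96.4,
    "Android Control Low_EM": 93.7,
    "ChartQA": 89.5,
    "OCRBench": 88.5,
    "AI2D": 88.4,
    "MMBench": 88.0,
    "ScreenSpot": 87.1,
    "AITZ_EM": 83.2,
    "CC-OCR": 79.8,
    "EgoSchema": 76.2,
}

THRESHOLD = 80.0  # "80%+" read as an inclusive cutoff (score >= 80)

def count_high_performers(scores, threshold=THRESHOLD):
    """Count benchmarks whose score meets or exceeds the threshold."""
    return sum(1 for s in scores.values() if s >= threshold)

values = list(top10.values())
# Assumption: the full list is sorted in descending order, so the
# 20 unlisted benchmarks score no higher than the last listed value.
assert values == sorted(values, reverse=True)

print(count_high_performers(top10))  # 8
print(max(values))                   # 96.4, the "Best Score"
```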