Alibaba

Qwen2.5 VL 72B Instruct

Multimodal
Zero-eval
#1 DocVQA
#1 Android Control Low_EM
#1 OCRBench

About

Qwen2.5 VL 72B Instruct is a multimodal language model developed by Alibaba. It averages 66.9% across 30 benchmarks, with its strongest results on DocVQA (96.4%), Android Control Low_EM (93.7%), and ChartQA (89.5%). As a multimodal model, it can process text, images, and other input formats. It was announced and released on January 26, 2025.

Timeline
Announced: Jan 26, 2025
Released: Jan 26, 2025
Specifications
Capabilities: Multimodal
License & Family
License: tongyi-qianwen
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

Benchmarks: 30
Average Score: 66.9%
Best Score: 96.4%
High Performers (80%+): 8

Top Categories

general: 69.6%
vision: 58.6%
math: 56.5%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

DocVQA

Rank #1 of 26
#1 Qwen2.5 VL 72B Instruct: 96.4%
#2 Qwen2.5 VL 7B Instruct: 95.7%
#3 Qwen2.5-Omni-7B: 95.2%
#4 Claude 3.5 Sonnet: 95.2%

Android Control Low_EM

Rank #1 of 3
#1 Qwen2.5 VL 72B Instruct: 93.7%
#2 Qwen2.5 VL 32B Instruct: 93.3%
#3 Qwen2.5 VL 7B Instruct: 91.4%

ChartQA

Rank #3 of 24
#1 Llama 4 Maverick: 90.0%
#2 Claude 3.5 Sonnet: 90.8%
#3 Qwen2.5 VL 72B Instruct: 89.5%
#4 Nova Pro: 89.2%
#5 Llama 4 Scout: 88.8%
#6 Qwen2-VL-72B-Instruct: 88.3%

OCRBench

Rank #1 of 7
#1 Qwen2.5 VL 72B Instruct: 88.5%
#2 Qwen2-VL-72B-Instruct: 87.7%
#3 Qwen2.5 VL 7B Instruct: 86.4%
#4 Phi-4-multimodal-instruct: 84.4%

AI2D

Rank #7 of 17
#4 Llama 3.2 11B Instruct: 91.1%
#5 Llama 3.2 90B Instruct: 92.3%
#6 Mistral Small 3.2 24B Instruct: 92.9%
#7 Qwen2.5 VL 72B Instruct: 88.4%
#8 Grok-1.5V: 88.3%
#9 Gemma 3 27B: 84.5%
#10 Gemma 3 12B: 84.2%
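
Each listing above pairs a rank with a score, so one quick sanity check is whether the scores fall as the rank number rises. The Python sketch below copies the DocVQA and ChartQA entries from this page and tests that property. `is_descending` and `rerank` are hypothetical helper names, not part of any leaderboard API, and the check assumes that a higher score should mean a better rank, which the page does not state explicitly.

```python
# (listed rank, model, score in %) copied from the listings on this page.
docvqa = [
    (1, "Qwen2.5 VL 72B Instruct", 96.4),
    (2, "Qwen2.5 VL 7B Instruct", 95.7),
    (3, "Qwen2.5-Omni-7B", 95.2),
    (4, "Claude 3.5 Sonnet", 95.2),
]
chartqa = [
    (1, "Llama 4 Maverick", 90.0),
    (2, "Claude 3.5 Sonnet", 90.8),
    (3, "Qwen2.5 VL 72B Instruct", 89.5),
    (4, "Nova Pro", 89.2),
    (5, "Llama 4 Scout", 88.8),
    (6, "Qwen2-VL-72B-Instruct", 88.3),
]


def is_descending(entries):
    """Return True if scores never increase going down the listing."""
    scores = [score for _, _, score in entries]
    return all(a >= b for a, b in zip(scores, scores[1:]))


def rerank(entries):
    """Sort by score, highest first; ties keep their listed order
    because Python's sort is stable."""
    return sorted(entries, key=lambda entry: entry[2], reverse=True)


for name, entries in [("DocVQA", docvqa), ("ChartQA", chartqa)]:
    print(name, "listed in descending score order:", is_descending(entries))
```

On the values shown, the DocVQA listing is in descending order while the ChartQA listing is not (90.0% is listed above 90.8%). The page gives no reason for the difference.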
All Benchmark Results for Qwen2.5 VL 72B Instruct
Complete list of benchmark scores with detailed information
| Benchmark | Category | Modality | Normalized score | Score | Source |
|---|---|---|---|---|---|
| DocVQA | vision | multimodal | 0.96 | 96.4% | Self-reported |
| Android Control Low_EM | general | text | 0.94 | 93.7% | Self-reported |
| ChartQA | general | multimodal | 0.90 | 89.5% | Self-reported |
| OCRBench | general | text | 0.89 | 88.5% | Self-reported |
| AI2D | general | text | 0.88 | 88.4% | Self-reported |
| MMBench | general | text | 0.88 | 88.0% | Self-reported |
| ScreenSpot | general | text | 0.87 | 87.1% | Self-reported |
| AITZ_EM | general | text | 0.83 | 83.2% | Self-reported |
| CC-OCR | general | text | 0.80 | 79.8% | Self-reported |
| EgoSchema | general | text | 0.76 | 76.2% | Self-reported |
Showing 1 to 10 of 30 benchmarks
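
The overview figures (best score, count of results at 80% or above) can be cross-checked against the ten results shown here. The Python sketch below recomputes them from the listed percentages; `summarize` and `implied_mean_of_rest` are hypothetical helper names. The second helper assumes the reported 66.9% is a simple unweighted mean of all 30 percentage scores, which the page does not confirm, so its result is only a rough estimate.

```python
# Scores (%) for the ten benchmarks shown on this page; the other 20 are not listed.
listed_scores = {
    "DocVQA": 96.4,
    "Android Control Low_EM": 93.7,
    "ChartQA": 89.5,
    "OCRBench": 88.5,
    "AI2D": 88.4,
    "MMBench": 88.0,
    "ScreenSpot": 87.1,
    "AITZ_EM": 83.2,
    "CC-OCR": 79.8,
    "EgoSchema": 76.2,
}


def summarize(scores, threshold=80.0):
    """Count, mean, best score, and number of scores at or above threshold."""
    values = list(scores.values())
    return {
        "count": len(values),
        "mean": round(sum(values) / len(values), 2),
        "best": max(values),
        "high_performers": sum(1 for v in values if v >= threshold),
    }


def implied_mean_of_rest(overall_mean, total_count, shown_scores):
    """Mean of the unlisted scores, ASSUMING overall_mean is an
    unweighted average over all total_count percentage scores."""
    shown = list(shown_scores.values())
    rest_count = total_count - len(shown)
    return (overall_mean * total_count - sum(shown)) / rest_count


summary = summarize(listed_scores)
rest_mean = implied_mean_of_rest(66.9, 30, listed_scores)
print(summary)
print(f"Implied mean of the 20 unlisted benchmarks: {rest_mean:.1f}%")
```

The ten listed results account for all 8 scores at 80% or above (CC-OCR, at 79.8%, falls just short even though its normalized value displays as 0.80) and include the 96.4% best score. They average about 87.1%, well above the 66.9% overall figure, so the 20 unlisted benchmarks must score considerably lower on average.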