Qwen2.5 VL 7B Instruct

Tags: Multimodal, Zero-eval
Top ranks: #1 MobileMiniWob++_SR, #1 MLVU, #1 MMT-Bench, +21 more

by Alibaba

About

Qwen2.5 VL 7B Instruct is a multimodal language model developed by Alibaba. Across 32 benchmarks it averages 64.5%, with its highest scores on DocVQA (95.7%), MobileMiniWob++_SR (91.4%), and Android Control Low_EM (91.4%). As a multimodal model, it accepts text and images, among other input formats. It is released under the Apache 2.0 license, which permits commercial use. It was announced and released on Jan 26, 2025.

Timeline
Announced: Jan 26, 2025
Released: Jan 26, 2025

Specifications
Capabilities: Multimodal

License & Family
License: Apache 2.0

Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

Benchmarks: 32
Average Score: 64.5%
Best Score: 95.7%
High Performers (80%+): 10

Top Categories

general: 67.7%
roleplay: 63.6%
vision: 61.5%
math: 46.6%

[Chart: Benchmark Performance. Top benchmark scores with normalized values (0-100%).]

Ranking Across Benchmarks
Position relative to other models on each benchmark

DocVQA

Rank #2 of 26
#1 Qwen2.5 VL 72B Instruct: 96.4%
#2 Qwen2.5 VL 7B Instruct: 95.7%
#3 Qwen2.5-Omni-7B: 95.2%
#4 Claude 3.5 Sonnet: 95.2%
#5 Mistral Small 3.2 24B Instruct: 94.9%

MobileMiniWob++_SR

Rank #1 of 2
#1 Qwen2.5 VL 7B Instruct: 91.4%
#2 Qwen2.5 VL 72B Instruct: 68.0%

Android Control Low_EM

Rank #3 of 3
#1 Qwen2.5 VL 32B Instruct: 93.3%
#2 Qwen2.5 VL 72B Instruct: 93.7%
#3 Qwen2.5 VL 7B Instruct: 91.4%

ChartQA

Rank #9 of 24
#6 Mistral Small 3.2 24B Instruct: 87.4%
#7 Pixtral Large: 88.1%
#8 Qwen2-VL-72B-Instruct: 88.3%
#9 Qwen2.5 VL 7B Instruct: 87.3%
#10 Nova Lite: 86.8%
#11 DeepSeek VL2: 86.0%
#12 GPT-4o: 85.7%

OCRBench

Rank #3 of 7
#1 Qwen2-VL-72B-Instruct: 87.7%
#2 Qwen2.5 VL 72B Instruct: 88.5%
#3 Qwen2.5 VL 7B Instruct: 86.4%
#4 Phi-4-multimodal-instruct: 84.4%
#5 DeepSeek VL2 Small: 83.4%
#6 DeepSeek VL2: 81.1%

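
The ranking excerpts above can be sanity-checked by confirming that scores do not increase as the listed rank goes down. The sketch below is only an illustration: the `RANKINGS` dictionary is a hand transcription of the listings above, and `out_of_order` is a hypothetical helper, not part of any leaderboard tooling. Several excerpts (Android Control Low_EM, ChartQA, OCRBench) list a lower score ahead of a higher one. The source does not explain why, so the check only reports such pairs and does not try to correct them.

```python
# Ranking excerpts transcribed by hand from the listings above:
# (listed rank, model name, score in percent).
RANKINGS = {
    "DocVQA": [
        (1, "Qwen2.5 VL 72B Instruct", 96.4),
        (2, "Qwen2.5 VL 7B Instruct", 95.7),
        (3, "Qwen2.5-Omni-7B", 95.2),
        (4, "Claude 3.5 Sonnet", 95.2),
        (5, "Mistral Small 3.2 24B Instruct", 94.9),
    ],
    "MobileMiniWob++_SR": [
        (1, "Qwen2.5 VL 7B Instruct", 91.4),
        (2, "Qwen2.5 VL 72B Instruct", 68.0),
    ],
    "Android Control Low_EM": [
        (1, "Qwen2.5 VL 32B Instruct", 93.3),
        (2, "Qwen2.5 VL 72B Instruct", 93.7),
        (3, "Qwen2.5 VL 7B Instruct", 91.4),
    ],
    "ChartQA": [
        (6, "Mistral Small 3.2 24B Instruct", 87.4),
        (7, "Pixtral Large", 88.1),
        (8, "Qwen2-VL-72B-Instruct", 88.3),
        (9, "Qwen2.5 VL 7B Instruct", 87.3),
        (10, "Nova Lite", 86.8),
        (11, "DeepSeek VL2", 86.0),
        (12, "GPT-4o", 85.7),
    ],
    "OCRBench": [
        (1, "Qwen2-VL-72B-Instruct", 87.7),
        (2, "Qwen2.5 VL 72B Instruct", 88.5),
        (3, "Qwen2.5 VL 7B Instruct", 86.4),
        (4, "Phi-4-multimodal-instruct", 84.4),
        (5, "DeepSeek VL2 Small", 83.4),
        (6, "DeepSeek VL2", 81.1),
    ],
}


def out_of_order(entries):
    """Return adjacent (higher-listed, lower-listed) pairs whose scores are inverted."""
    ordered = sorted(entries, key=lambda e: e[0])  # sort by listed rank
    return [
        (upper, lower)
        for upper, lower in zip(ordered, ordered[1:])
        if lower[2] > upper[2]  # a strictly higher score appears further down
    ]


if __name__ == "__main__":
    for benchmark, entries in RANKINGS.items():
        for upper, lower in out_of_order(entries):
            print(
                f"{benchmark}: #{lower[0]} {lower[1]} ({lower[2]}%) "
                f"scores above #{upper[0]} {upper[1]} ({upper[2]}%)"
            )
```

Only adjacent entries are compared, which is enough to flag any listing that is not sorted by score. Ties, such as the two 95.2% entries under DocVQA, are not reported.
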
All Benchmark Results for Qwen2.5 VL 7B Instruct
Complete list of benchmark scores with detailed information

| Benchmark | Category | Modality | Normalized | Score | Source |
| --- | --- | --- | --- | --- | --- |
| DocVQA | vision | multimodal | 0.96 | 95.7% | Self-reported |
| MobileMiniWob++_SR | general | text | 0.91 | 91.4% | Self-reported |
| Android Control Low_EM | general | text | 0.91 | 91.4% | Self-reported |
| ChartQA | general | multimodal | 0.87 | 87.3% | Self-reported |
| OCRBench | general | text | 0.86 | 86.4% | Self-reported |
| TextVQA | vision | multimodal | 0.85 | 84.9% | Self-reported |
| ScreenSpot | general | text | 0.85 | 84.7% | Self-reported |
| MMBench | general | text | 0.84 | 84.3% | Self-reported |
| InfoVQA | vision | multimodal | 0.83 | 82.6% | Self-reported |
| AITZ_EM | general | text | 0.82 | 81.9% | Self-reported |

Showing 1 to 10 of 32 benchmarks
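
The headline figures (best score 95.7%, ten benchmarks at 80% or above, 64.5% average over 32 benchmarks) can be partly cross-checked against the ten results shown above. The following is a minimal sketch under stated assumptions: `ROWS` is a hand transcription of those ten results, `summarize` is a hypothetical helper, and the normalized column is assumed to be the percentage divided by 100 and rounded to two decimals. Because the reported average is itself rounded and 22 of the 32 results are not shown, the implied mean of the unlisted benchmarks is only an approximation.

```python
# The ten listed results: (benchmark, normalized score, score in percent).
ROWS = [
    ("DocVQA", 0.96, 95.7),
    ("MobileMiniWob++_SR", 0.91, 91.4),
    ("Android Control Low_EM", 0.91, 91.4),
    ("ChartQA", 0.87, 87.3),
    ("OCRBench", 0.86, 86.4),
    ("TextVQA", 0.85, 84.9),
    ("ScreenSpot", 0.85, 84.7),
    ("MMBench", 0.84, 84.3),
    ("InfoVQA", 0.83, 82.6),
    ("AITZ_EM", 0.82, 81.9),
]

TOTAL_BENCHMARKS = 32    # from the overview section
REPORTED_AVERAGE = 64.5  # percent, from the overview section (rounded)


def summarize(rows, total=TOTAL_BENCHMARKS, reported_average=REPORTED_AVERAGE):
    """Recompute summary figures from the listed rows only."""
    percents = [pct for _, _, pct in rows]
    unlisted = total - len(rows)
    return {
        "best": max(percents),
        "high_performers": sum(1 for p in percents if p >= 80.0),
        "listed_mean": sum(percents) / len(percents),
        # Assumption: normalized = percent / 100, rounded to two decimals.
        "normalized_consistent": all(
            abs(round(pct / 100, 2) - norm) < 1e-9 for _, norm, pct in rows
        ),
        # Mean the unlisted benchmarks would need for the reported overall
        # average to hold; approximate because that average is rounded.
        "implied_unlisted_mean": (reported_average * total - sum(percents)) / unlisted,
    }


if __name__ == "__main__":
    for key, value in summarize(ROWS).items():
        print(f"{key}: {value}")
```

All ten listed scores are at or above 80%, which matches the reported count of ten high performers and implies that every unlisted benchmark scores below 80%.
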