
Qwen2.5 VL 7B Instruct
Multimodal
Zero-eval
#1 on MobileMiniWob++_SR
#1 on MLVU
#1 on MMT-Bench
by Alibaba
About
Qwen2.5 VL 7B Instruct is a multimodal language model developed by Alibaba. It averages 64.5% across 32 benchmarks. Its highest scores are on DocVQA (95.7%), MobileMiniWob++_SR (91.4%), and Android Control Low_EM (91.4%). It accepts text, images, and video as input. It is released under the Apache 2.0 license, which permits commercial use. It was announced and released on January 26, 2025.
Timeline
Announced: Jan 26, 2025
Released: Jan 26, 2025
Specifications
Capabilities: Multimodal
License & Family
License: Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
Benchmarks evaluated: 32
Average score: 64.5%
Best score: 95.7%
High performers (80%+): 10
Top Categories
general: 67.7%
roleplay: 63.6%
vision: 61.5%
math: 46.6%
Ranking Across Benchmarks
Position relative to other models on each benchmark
DocVQA (rank #2 of 26)
#1 Qwen2.5 VL 72B Instruct: 96.4%
#2 Qwen2.5 VL 7B Instruct: 95.7%
#3 Qwen2.5-Omni-7B: 95.2%
#4 Claude 3.5 Sonnet: 95.2%
#5 Mistral Small 3.2 24B Instruct: 94.9%
MobileMiniWob++_SR (rank #1 of 2)
#1 Qwen2.5 VL 7B Instruct: 91.4%
#2 Qwen2.5 VL 72B Instruct: 68.0%
Android Control Low_EM (rank #3 of 3)
#1 Qwen2.5 VL 32B Instruct: 93.3%
#2 Qwen2.5 VL 72B Instruct: 93.7%
#3 Qwen2.5 VL 7B Instruct: 91.4%
ChartQA (rank #9 of 24)
#6 Mistral Small 3.2 24B Instruct: 87.4%
#7 Pixtral Large: 88.1%
#8 Qwen2-VL-72B-Instruct: 88.3%
#9 Qwen2.5 VL 7B Instruct: 87.3%
#10 Nova Lite: 86.8%
#11 DeepSeek VL2: 86.0%
#12 GPT-4o: 85.7%
OCRBench (rank #3 of 7)
#1 Qwen2-VL-72B-Instruct: 87.7%
#2 Qwen2.5 VL 72B Instruct: 88.5%
#3 Qwen2.5 VL 7B Instruct: 86.4%
#4 Phi-4-multimodal-instruct: 84.4%
#5 DeepSeek VL2 Small: 83.4%
#6 DeepSeek VL2: 81.1%
All Benchmark Results for Qwen2.5 VL 7B Instruct
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Normalized | Score | Source
DocVQA | vision | multimodal | 0.96 | 95.7% | Self-reported
MobileMiniWob++_SR | general | text | 0.91 | 91.4% | Self-reported
Android Control Low_EM | general | text | 0.91 | 91.4% | Self-reported
ChartQA | general | multimodal | 0.87 | 87.3% | Self-reported
OCRBench | general | text | 0.86 | 86.4% | Self-reported
TextVQA | vision | multimodal | 0.85 | 84.9% | Self-reported
ScreenSpot | general | text | 0.85 | 84.7% | Self-reported
MMBench | general | text | 0.84 | 84.3% | Self-reported
InfoVQA | vision | multimodal | 0.83 | 82.6% | Self-reported
AITZ_EM | general | text | 0.82 | 81.9% | Self-reported
Top 10 of 32 benchmarks shown.
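In the rows above, the decimal field next to each percentage (for example 0.96 beside DocVQA's 95.7%) appears to be the score divided by 100 and rounded to two decimals. The page does not document this, so treat it as an inference. The short Python sketch below checks that rule against the ten listed rows. It also counts the rows at or above 80%, which gives 10, matching the "High performers (80%+)" figure in the overview. The names `ROWS`, `normalize`, and `high_performers` are hypothetical helpers for illustration, not part of any published tool.

```python
# Top-10 rows as listed on this page:
# (benchmark, category, modality, normalized, score_percent)
ROWS = [
    ("DocVQA", "vision", "multimodal", 0.96, 95.7),
    ("MobileMiniWob++_SR", "general", "text", 0.91, 91.4),
    ("Android Control Low_EM", "general", "text", 0.91, 91.4),
    ("ChartQA", "general", "multimodal", 0.87, 87.3),
    ("OCRBench", "general", "text", 0.86, 86.4),
    ("TextVQA", "vision", "multimodal", 0.85, 84.9),
    ("ScreenSpot", "general", "text", 0.85, 84.7),
    ("MMBench", "general", "text", 0.84, 84.3),
    ("InfoVQA", "vision", "multimodal", 0.83, 82.6),
    ("AITZ_EM", "general", "text", 0.82, 81.9),
]


def normalize(score_percent: float) -> float:
    """Assumed rule (inferred, not documented): percent / 100, rounded to 2 decimals."""
    return round(score_percent / 100, 2)


def high_performers(rows, threshold: float = 80.0) -> list:
    """Return names of benchmarks whose percentage score is >= threshold."""
    return [name for name, *_, score in rows if score >= threshold]


if __name__ == "__main__":
    # Check the inferred normalization rule against every listed row.
    mismatches = [name for name, _, _, norm, score in ROWS if normalize(score) != norm]
    print("normalization mismatches:", mismatches)
    print("rows at or above 80%:", len(high_performers(ROWS)))
```

Because the list appears to be sorted by score in descending order, the 22 unlisted benchmarks would all score at or below 81.9%. A count of exactly 10 high performers is therefore consistent with the overview only if none of them reaches 80%.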