
Qwen2.5 VL 32B Instruct
Multimodal
#1 ScreenSpot
#1 InfoVQA
#1 Android Control High_EM
by Alibaba
About
Qwen2.5 VL 32B Instruct is a multimodal language model developed by Alibaba. It averages 63.6% across 28 benchmarks, with its highest scores on DocVQA (94.8%), Android Control Low_EM (93.3%), and HumanEval (91.5%). Code is its strongest category, with an average of 87.8%. The model accepts text, image, and video input. It is released under the Apache 2.0 license, which permits commercial use. It was announced and released on February 28, 2025.
Timeline
Announced: Feb 28, 2025
Released: Feb 28, 2025
Specifications
Capabilities: Multimodal
License & Family
License: Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
Benchmarks: 28
Average Score: 63.6%
Best Score: 94.8%
High Performers (80%+): 8
Top Categories
code: 87.8%
math: 65.1%
vision: 64.0%
general: 60.1%
Benchmark Performance (chart): top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
DocVQA
Rank #6 of 26
#3 Mistral Small 3.2 24B Instruct: 94.9%
#4 Claude 3.5 Sonnet: 95.2%
#5 Qwen2.5-Omni-7B: 95.2%
#6 Qwen2.5 VL 32B Instruct: 94.8%
#7 Llama 4 Maverick: 94.4%
#8 Llama 4 Scout: 94.4%
#9 Grok-2: 93.6%
Android Control Low_EM
Rank #2 of 3
#1 Qwen2.5 VL 72B Instruct: 93.7%
#2 Qwen2.5 VL 32B Instruct: 93.3%
#3 Qwen2.5 VL 7B Instruct: 91.4%
HumanEval
Rank #8 of 62
#5 Mistral Large 2: 92.0%
#6 Claude 3.5 Sonnet: 92.0%
#7 o1-mini: 92.4%
#8 Qwen2.5 VL 32B Instruct: 91.5%
#9 GPT-4o: 90.2%
#10 Granite 3.3 8B Instruct: 89.7%
#11 Granite 3.3 8B Base: 89.7%
ScreenSpot
Rank #1 of 3
#1 Qwen2.5 VL 32B Instruct: 88.5%
#2 Qwen2.5 VL 72B Instruct: 87.1%
#3 Qwen2.5 VL 7B Instruct: 84.7%
MBPP
Rank #6 of 31
#3 Qwen2.5 32B Instruct: 84.0%
#4 Llama 3.1 Nemotron Nano 8B V1: 84.6%
#5 Qwen2.5 72B Instruct: 88.2%
#6 Qwen2.5 VL 32B Instruct: 84.0%
#7 Qwen2.5-Coder 7B Instruct: 83.5%
#8 Qwen2.5 14B Instruct: 82.0%
#9 Qwen3 235B A22B: 81.4%
All Benchmark Results for Qwen2.5 VL 32B Instruct
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Normalized | Score | Source
DocVQA | vision | multimodal | 0.95 | 94.8% | Self-reported
Android Control Low_EM | general | text | 0.93 | 93.3% | Self-reported
HumanEval | code | text | 0.92 | 91.5% | Self-reported
ScreenSpot | general | text | 0.89 | 88.5% | Self-reported
MBPP | code | text | 0.84 | 84.0% | Self-reported
InfoVQA | vision | multimodal | 0.83 | 83.4% | Self-reported
AITZ_EM | general | text | 0.83 | 83.1% | Self-reported
MATH | math | text | 0.82 | 82.2% | Self-reported
MMLU | general | text | 0.78 | 78.4% | Self-reported
VideoMME w sub. | vision | video | 0.78 | 77.9% | Self-reported
The first 10 of 28 benchmarks are listed.