Qwen2.5 VL 32B Instruct

Multimodal
Zero-eval
#1 ScreenSpot
#1 InfoVQA
#1 Android Control High_EM

by Alibaba

About

Qwen2.5 VL 32B Instruct is a multimodal language model developed by Alibaba. It averages 63.6% across 28 benchmarks, with its highest scores on DocVQA (94.8%), Android Control Low_EM (93.3%), and HumanEval (91.5%). Code is its strongest category, with an average of 87.8%, ahead of math (65.1%), vision (64.0%), and general (60.1%). As a multimodal model, it accepts text, image, and video inputs. It is released under the Apache 2.0 license, which permits commercial use. It was announced and released on Feb 28, 2025.

Timeline
Announced: Feb 28, 2025
Released: Feb 28, 2025
Specifications
Capabilities: Multimodal
License & Family
License: Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

Benchmarks: 28
Average Score: 63.6%
Best Score: 94.8%
High Performers (80%+): 8
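
Two of the overview figures can be checked against the ten benchmark scores listed on this page. The sketch below is a minimal check, assuming the results list is sorted in descending order (so the 18 benchmarks not shown all fall below 77.9%) and that "80%+" means 80% or higher. The names `LISTED_SCORES` and `THRESHOLD` are illustrative, not part of any site API. The 63.6% average cannot be reproduced this way because it depends on all 28 scores.

```python
# Normalized scores (%) for the ten benchmarks listed on this page.
LISTED_SCORES = {
    "DocVQA": 94.8,
    "Android Control Low_EM": 93.3,
    "HumanEval": 91.5,
    "ScreenSpot": 88.5,
    "MBPP": 84.0,
    "InfoVQA": 83.4,
    "AITZ_EM": 83.1,
    "MATH": 82.2,
    "MMLU": 78.4,
    "VideoMME w sub.": 77.9,
}

# Assumption: "High Performers (80%+)" counts scores of 80% or higher.
THRESHOLD = 80.0

# Best score: the single highest normalized value.
best_name, best_score = max(LISTED_SCORES.items(), key=lambda item: item[1])

# High performers: every benchmark at or above the threshold.
high_performers = [name for name, pct in LISTED_SCORES.items() if pct >= THRESHOLD]

print(f"Best score: {best_name} at {best_score}%")
print(f"High performers: {len(high_performers)}")
```

Under those assumptions, the best score and the high-performer count agree with the overview (94.8% and 8).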

Top Categories

code: 87.8%
math: 65.1%
vision: 64.0%
general: 60.1%
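
The category figures look like unweighted means of the normalized scores in each category, though the page does not state the method. As a check, the two code benchmarks listed on this page, HumanEval (91.5%) and MBPP (84.0%), average to 87.75%, which rounds to the 87.8% shown for code. The sketch below assumes an unweighted mean and that these are the only two code benchmarks among the 28; `category_averages` is a hypothetical helper, not a site function.

```python
from collections import defaultdict
from statistics import mean

# (benchmark, category, normalized score in %) as listed on this page.
# Only the code category is checked here: the other categories include
# benchmarks that are not shown in this excerpt.
SCORES = [
    ("HumanEval", "code", 91.5),
    ("MBPP", "code", 84.0),
]


def category_averages(scores):
    """Group scores by category and return the unweighted mean of each."""
    grouped = defaultdict(list)
    for _name, category, pct in scores:
        grouped[category].append(pct)
    return {category: mean(values) for category, values in grouped.items()}


averages = category_averages(SCORES)
print(averages["code"])  # 87.75, displayed on the page as 87.8%
```

The math, vision, and general averages cannot be reproduced the same way from this page, since only 10 of the 28 benchmark scores are listed.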
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

DocVQA

Rank #6 of 26
#3 Mistral Small 3.2 24B Instruct: 94.9%
#4 Claude 3.5 Sonnet: 95.2%
#5 Qwen2.5-Omni-7B: 95.2%
#6 Qwen2.5 VL 32B Instruct: 94.8%
#7 Llama 4 Maverick: 94.4%
#8 Llama 4 Scout: 94.4%
#9 Grok-2: 93.6%

Android Control Low_EM

Rank #2 of 3
#1 Qwen2.5 VL 72B Instruct: 93.7%
#2 Qwen2.5 VL 32B Instruct: 93.3%
#3 Qwen2.5 VL 7B Instruct: 91.4%

HumanEval

Rank #8 of 62
#5 Mistral Large 2: 92.0%
#6 Claude 3.5 Sonnet: 92.0%
#7 o1-mini: 92.4%
#8 Qwen2.5 VL 32B Instruct: 91.5%
#9 GPT-4o: 90.2%
#10 Granite 3.3 8B Instruct: 89.7%
#11 Granite 3.3 8B Base: 89.7%

ScreenSpot

Rank #1 of 3
#1 Qwen2.5 VL 32B Instruct: 88.5%
#2 Qwen2.5 VL 72B Instruct: 87.1%
#3 Qwen2.5 VL 7B Instruct: 84.7%

MBPP

Rank #6 of 31
#3 Qwen2.5 32B Instruct: 84.0%
#4 Llama 3.1 Nemotron Nano 8B V1: 84.6%
#5 Qwen2.5 72B Instruct: 88.2%
#6 Qwen2.5 VL 32B Instruct: 84.0%
#7 Qwen2.5-Coder 7B Instruct: 83.5%
#8 Qwen2.5 14B Instruct: 82.0%
#9 Qwen3 235B A22B: 81.4%
All Benchmark Results for Qwen2.5 VL 32B Instruct
Complete list of benchmark scores with detailed information
| Benchmark | Category | Modality | Raw score | Normalized | Source |
| --- | --- | --- | --- | --- | --- |
| DocVQA | vision | multimodal | 0.95 | 94.8% | Self-reported |
| Android Control Low_EM | general | text | 0.93 | 93.3% | Self-reported |
| HumanEval | code | text | 0.92 | 91.5% | Self-reported |
| ScreenSpot | general | text | 0.89 | 88.5% | Self-reported |
| MBPP | code | text | 84.00 | 84.0% | Self-reported |
| InfoVQA | vision | multimodal | 0.83 | 83.4% | Self-reported |
| AITZ_EM | general | text | 0.83 | 83.1% | Self-reported |
| MATH | math | text | 0.82 | 82.2% | Self-reported |
| MMLU | general | text | 0.78 | 78.4% | Self-reported |
| VideoMME w sub. | vision | video | 0.78 | 77.9% | Self-reported |
Showing 1 to 10 of 28 benchmarks
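
The raw scores above mix two scales: most are fractions between 0 and 1 (for example 0.95 for DocVQA), while MBPP is reported as 84.00 on a 0-100 scale. One plausible way to bring both onto the normalized 0-100% scale is sketched below. This is an assumption about how the normalization could work, not the site's documented method, and `to_percent` is a hypothetical helper. The input 0.948 is inferred from the 94.8% shown for DocVQA; the page displays that raw value rounded to 0.95.

```python
def to_percent(raw: float) -> float:
    """Convert a raw score to a 0-100 percentage.

    Assumption: values of at most 1.0 are fractions and values above 1.0
    are already percentages. A raw value of exactly 1.0 is ambiguous
    (100% or 1%) and is treated as 100% here.
    """
    if raw < 0 or raw > 100:
        raise ValueError(f"score out of range: {raw}")
    pct = raw * 100 if raw <= 1.0 else raw
    # Round to one decimal place, matching the precision shown on the page.
    return round(pct, 1)


docvqa_pct = to_percent(0.948)  # fraction scale, inferred unrounded value
mbpp_pct = to_percent(84.00)    # already on a 0-100 scale
print(docvqa_pct, mbpp_pct)
```

A threshold rule like this misreads genuine sub-1% scores on a 0-100 scale, so a production pipeline would more likely store the scale explicitly for each benchmark.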