Qwen2.5 VL 32B Instruct

Name: Qwen2.5 VL 32B Instruct
Average score: 63.6% (28 benchmarks)
Author: Alibaba

Multimodal

Zero-eval

#1 ScreenSpot
#1 InfoVQA
#1 Android Control High_EM

About

Qwen2.5 VL 32B Instruct is a multimodal language model developed by Alibaba. It averages 63.6% across 28 benchmarks, with its highest scores on DocVQA (94.8%), Android Control Low_EM (93.3%), and HumanEval (91.5%). Code is its strongest category, with an average of 87.8%. As a multimodal model, it accepts text, image, and video inputs. It is released under the Apache 2.0 license, which permits commercial use. It was released on Feb 28, 2025.
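The 87.8% code average quoted above is consistent with the two code benchmarks visible in the results table on this page, HumanEval (91.5%) and MBPP (84.0%). The sketch below checks that arithmetic. It assumes the category average is a plain unweighted mean and that no other code benchmarks sit among the 18 results not shown here; the page confirms neither.

```python
from statistics import mean

# Code-category scores as listed on this page (percent).
# Assumption: these are the only code benchmarks among the 28 results.
code_scores = {"HumanEval": 91.5, "MBPP": 84.0}

# Assumption: the category average is an unweighted mean.
code_average = mean(list(code_scores.values()))

print(f"code average: {code_average:.2f}%")
```

The mean comes out at 87.75%, which agrees with the reported 87.8% at one decimal place.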

Timeline

Announced: Feb 28, 2025
Released: Feb 28, 2025

Specifications

Capabilities: Multimodal

License & Family

License: Apache 2.0

Benchmark Performance Overview

Performance metrics and category breakdown

Overall Performance

Benchmarks: 28
Average Score: 63.6%
Best Score: 94.8%

High Performers (80%+)

Top Categories

code: 87.8%
math: 65.1%
vision: 64.0%
general: 60.1%

Benchmark Performance

Top benchmark scores with normalized values (0-100%)

Ranking Across Benchmarks

Position relative to other models on each benchmark

DocVQA

Rank #6 of 26

#3 Mistral Small 3.2 24B Instruct: 94.9%
#4 Claude 3.5 Sonnet: 95.2%
#5 Qwen2.5-Omni-7B: 95.2%
#6 Qwen2.5 VL 32B Instruct: 94.8%
#7 Llama 4 Maverick: 94.4%
#8 Llama 4 Scout: 94.4%
#9 Grok-2: 93.6%

Android Control Low_EM

Rank #2 of 3

#1 Qwen2.5 VL 72B Instruct: 93.7%
#2 Qwen2.5 VL 32B Instruct: 93.3%
#3 Qwen2.5 VL 7B Instruct: 91.4%

HumanEval

Rank #8 of 62

#5 Mistral Large 2: 92.0%
#6 Claude 3.5 Sonnet: 92.0%
#7 o1-mini: 92.4%
#8 Qwen2.5 VL 32B Instruct: 91.5%
#9 GPT-4o: 90.2%
#10 Granite 3.3 8B Instruct: 89.7%
#11 Granite 3.3 8B Base: 89.7%

ScreenSpot

Rank #1 of 3

#1 Qwen2.5 VL 32B Instruct: 88.5%
#2 Qwen2.5 VL 72B Instruct: 87.1%
#3 Qwen2.5 VL 7B Instruct: 84.7%

MBPP

Rank #6 of 31

#3 Qwen2.5 32B Instruct: 84.0%
#4 Llama 3.1 Nemotron Nano 8B V1: 84.6%
#5 Qwen2.5 72B Instruct: 88.2%
#6 Qwen2.5 VL 32B Instruct: 84.0%
#7 Qwen2.5-Coder 7B Instruct: 83.5%
#8 Qwen2.5 14B Instruct: 82.0%
#9 Qwen3 235B A22B: 81.4%

All Benchmark Results for Qwen2.5 VL 32B Instruct

Complete list of benchmark scores with detailed information


Benchmark	Category	Modality	Score	Normalized	Source
DocVQA	vision	multimodal	0.95	94.8%	Self-reported
Android Control Low_EM	general	text	0.93	93.3%	Self-reported
HumanEval	code	text	0.92	91.5%	Self-reported
ScreenSpot	general	text	0.89	88.5%	Self-reported
MBPP	code	text	84.00	84.0%	Self-reported
InfoVQA	vision	multimodal	0.83	83.4%	Self-reported
AITZ_EM	general	text	0.83	83.1%	Self-reported
MATH	math	text	0.82	82.2%	Self-reported
MMLU	general	text	0.78	78.4%	Self-reported
VideoMME w sub.	vision	video	0.78	77.9%	Self-reported

Showing 1 to 10 of 28 benchmarks
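
The 80%+ cutoff named under "High Performers" can be applied to the ten rows listed above. This is a minimal sketch covering only those visible rows, not all 28 benchmarks; the variable names and the threshold constant are illustrative and not taken from the site.

```python
# Normalized scores (percent) for the ten benchmarks shown on this page.
visible_scores = {
    "DocVQA": 94.8,
    "Android Control Low_EM": 93.3,
    "HumanEval": 91.5,
    "ScreenSpot": 88.5,
    "MBPP": 84.0,
    "InfoVQA": 83.4,
    "AITZ_EM": 83.1,
    "MATH": 82.2,
    "MMLU": 78.4,
    "VideoMME w sub.": 77.9,
}

# The "80%+" cutoff used by the page's High Performers label.
HIGH_PERFORMER_THRESHOLD = 80.0

# Keep the benchmarks whose normalized score meets the cutoff.
high_performers = [
    name
    for name, score in visible_scores.items()
    if score >= HIGH_PERFORMER_THRESHOLD
]

print(f"{len(high_performers)} of {len(visible_scores)} visible benchmarks are at 80% or above")
```

Among the ten visible rows, eight meet the cutoff; MMLU and VideoMME w sub. fall just below it.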

Resources

Playground, Research Paper, Blog Post, Repository, Model Weights