Qwen2-VL-72B-Instruct

Average score: 75.8% (15 benchmarks)
Author: Alibaba

Multimodal

Zero-eval

#1 DocVQAtest

#1 VCR_en_easy

#1 MMBench_test

About

Qwen2-VL-72B-Instruct is a multimodal language model developed by Alibaba. It averages 75.8% across 15 benchmarks, with its highest scores on DocVQAtest (96.5%), VCR_en_easy (91.9%), and ChartQA (88.3%). Its strongest category is general tasks, where it averages 82.2%, ahead of math (70.5%) and vision (68.0%). As a multimodal model, it can process text, images, and other input formats. It was released in 2024.

Timeline

Announced: Aug 29, 2024

Released: Aug 29, 2024

Knowledge Cutoff: Jun 30, 2023

Specifications

Capabilities

Multimodal

License & Family

License

tongyi-qianwen

Benchmark Performance Overview

Performance metrics and category breakdown

Overall Performance

15 benchmarks

Average Score

75.8%

Best Score

96.5%

High Performers (80%+)

Top Categories

general

82.2%

math

70.5%

vision

68.0%

Benchmark Performance

Top benchmark scores with normalized values (0-100%)

Ranking Across Benchmarks

Position relative to other models on each benchmark

DocVQAtest

Rank #1 of 1

#1 Qwen2-VL-72B-Instruct: 96.5%

VCR_en_easy

Rank #1 of 1

#1 Qwen2-VL-72B-Instruct: 91.9%

ChartQA

Rank #6 of 24

#3 Llama 4 Scout: 88.8%
#4 Nova Pro: 89.2%
#5 Qwen2.5 VL 72B Instruct: 89.5%
#6 Qwen2-VL-72B-Instruct: 88.3%
#7 Pixtral Large: 88.1%
#8 Mistral Small 3.2 24B Instruct: 87.4%
#9 Qwen2.5 VL 7B Instruct: 87.3%

OCRBench

Rank #2 of 7

#1 Qwen2.5 VL 72B Instruct: 88.5%
#2 Qwen2-VL-72B-Instruct: 87.7%
#3 Qwen2.5 VL 7B Instruct: 86.4%
#4 Phi-4-multimodal-instruct: 84.4%
#5 DeepSeek VL2 Small: 83.4%

MMBench_test

Rank #1 of 1

#1 Qwen2-VL-72B-Instruct: 86.5%

All Benchmark Results for Qwen2-VL-72B-Instruct

Complete list of benchmark scores with detailed information


Benchmark	Category	Modality	Score	Normalized	Source
DocVQAtest	vision	multimodal	0.96	96.5%	Self-reported
VCR_en_easy	general	text	0.92	91.9%	Self-reported
ChartQA	general	multimodal	0.88	88.3%	Self-reported
OCRBench	general	text	0.88	87.7%	Self-reported
MMBench_test	general	text	0.86	86.5%	Self-reported
TextVQA	vision	multimodal	0.85	85.5%	Self-reported
InfoVQAtest	vision	multimodal	0.84	84.5%	Self-reported
EgoSchema	general	text	0.78	77.9%	Self-reported
RealWorldQA	general	text	0.78	77.8%	Self-reported
MMVetGPT4Turbo	general	text	0.74	74.0%	Self-reported

Showing 1 to 10 of 15 benchmarks
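
The summary figures on this page (average score, count of 80%+ results, per-category averages) are simple aggregates over the benchmark rows. The sketch below shows how such aggregates could be computed from the ten rows listed here. It is an illustration only: the `summarize` helper and the `VISIBLE_ROWS` name are hypothetical, and because five of the 15 benchmarks are not shown, the output will not match the page's reported 75.8% average or its category averages.

```python
from statistics import mean

# Only the 10 rows visible on this page (of 15 total):
# (benchmark, category, normalized score in %)
VISIBLE_ROWS = [
    ("DocVQAtest", "vision", 96.5),
    ("VCR_en_easy", "general", 91.9),
    ("ChartQA", "general", 88.3),
    ("OCRBench", "general", 87.7),
    ("MMBench_test", "general", 86.5),
    ("TextVQA", "vision", 85.5),
    ("InfoVQAtest", "vision", 84.5),
    ("EgoSchema", "general", 77.9),
    ("RealWorldQA", "general", 77.8),
    ("MMVetGPT4Turbo", "general", 74.0),
]


def summarize(rows, threshold=80.0):
    """Return the mean score, the count at or above `threshold`,
    and the mean score per category (hypothetical helper)."""
    scores = [score for _, _, score in rows]
    by_category = {}
    for _, category, score in rows:
        by_category.setdefault(category, []).append(score)
    return {
        "mean": round(mean(scores), 2),
        "high_performers": sum(score >= threshold for score in scores),
        "category_means": {
            category: round(mean(values), 2)
            for category, values in by_category.items()
        },
    }


if __name__ == "__main__":
    print(summarize(VISIBLE_ROWS))
```

On the ten visible rows this gives a mean of about 85.06% and seven results at 80% or higher. The gap between that mean and the reported 75.8% comes from the five unlisted, lower-scoring benchmarks.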
