Qwen2-VL-72B-Instruct

by Alibaba

Tags: Multimodal, Zero-eval
Top rankings: #1 DocVQAtest, #1 VCR_en_easy, #1 MMBench_test

About

Qwen2-VL-72B-Instruct is a multimodal language model developed by Alibaba and released on Aug 29, 2024. Across the 15 benchmarks tracked here it averages 75.8%, with its highest scores on DocVQAtest (96.5%), VCR_en_easy (91.9%), and ChartQA (88.3%). By category, it is strongest on general tasks (82.2% average), followed by math (70.5%) and vision (68.0%). As a multimodal model, it processes text and images, among other input formats.

Timeline
Announced: Aug 29, 2024
Released: Aug 29, 2024
Knowledge Cutoff: Jun 30, 2023
Specifications
Capabilities: Multimodal
License: tongyi-qianwen
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (15 benchmarks)

Average Score: 75.8%
Best Score: 96.5%
High Performers (80%+): 7

Top Categories

general: 82.2%
math: 70.5%
vision: 68.0%
Ranking Across Benchmarks
Position relative to other models on each benchmark

DocVQAtest (Rank #1 of 1)
#1 Qwen2-VL-72B-Instruct: 96.5%

VCR_en_easy (Rank #1 of 1)
#1 Qwen2-VL-72B-Instruct: 91.9%

ChartQA (Rank #6 of 24)
#3 Llama 4 Scout: 88.8%
#4 Nova Pro: 89.2%
#5 Qwen2.5 VL 72B Instruct: 89.5%
#6 Qwen2-VL-72B-Instruct: 88.3%
#7 Pixtral Large: 88.1%
#8 Mistral Small 3.2 24B Instruct: 87.4%
#9 Qwen2.5 VL 7B Instruct: 87.3%

OCRBench (Rank #2 of 7)
#1 Qwen2.5 VL 72B Instruct: 88.5%
#2 Qwen2-VL-72B-Instruct: 87.7%
#3 Qwen2.5 VL 7B Instruct: 86.4%
#4 Phi-4-multimodal-instruct: 84.4%
#5 DeepSeek VL2 Small: 83.4%

MMBench_test (Rank #1 of 1)
#1 Qwen2-VL-72B-Instruct: 86.5%
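
A position on a leaderboard of this kind is presumably obtained by sorting entries by score in descending order. The sketch below applies that assumption to the seven ChartQA entries shown above; `rank_by_score` is a hypothetical helper, and the positions it returns are relative to this seven-entry window, not to the full field of 24, so they need not match the rank labels displayed by the page.

```python
# The seven ChartQA entries shown on this page: (model, score in %).
chartqa = [
    ("Llama 4 Scout", 88.8),
    ("Nova Pro", 89.2),
    ("Qwen2.5 VL 72B Instruct", 89.5),
    ("Qwen2-VL-72B-Instruct", 88.3),
    ("Pixtral Large", 88.1),
    ("Mistral Small 3.2 24B Instruct", 87.4),
    ("Qwen2.5 VL 7B Instruct", 87.3),
]


def rank_by_score(entries):
    """Return (position, model, score) tuples, highest score first.

    Assumption: rank is determined by score alone, with no tie-breaking rule.
    """
    ordered = sorted(entries, key=lambda entry: entry[1], reverse=True)
    return [(i + 1, name, score) for i, (name, score) in enumerate(ordered)]


ranked = rank_by_score(chartqa)
for position, name, score in ranked:
    print(f"{position}. {name}: {score:.1f}%")

# Position of this model within the seven-entry window.
window_position = next(p for p, name, _ in ranked if name == "Qwen2-VL-72B-Instruct")
```

Within this window, Qwen2-VL-72B-Instruct lands fourth by score, behind Qwen2.5 VL 72B Instruct, Nova Pro, and Llama 4 Scout.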
All Benchmark Results for Qwen2-VL-72B-Instruct
Complete list of benchmark scores with detailed information
| Benchmark | Category | Modality | Normalized | Score | Source |
|---|---|---|---|---|---|
| DocVQAtest | vision | multimodal | 0.96 | 96.5% | Self-reported |
| VCR_en_easy | general | text | 0.92 | 91.9% | Self-reported |
| ChartQA | general | multimodal | 0.88 | 88.3% | Self-reported |
| OCRBench | general | text | 0.88 | 87.7% | Self-reported |
| MMBench_test | general | text | 0.86 | 86.5% | Self-reported |
| TextVQA | vision | multimodal | 0.85 | 85.5% | Self-reported |
| InfoVQAtest | vision | multimodal | 0.84 | 84.5% | Self-reported |
| EgoSchema | general | text | 0.78 | 77.9% | Self-reported |
| RealWorldQA | general | text | 0.78 | 77.8% | Self-reported |
| MMVetGPT4Turbo | general | text | 0.74 | 74.0% | Self-reported |
Showing 1 to 10 of 15 benchmarks
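
The headline figures (75.8% average over 15 benchmarks, 7 scores at 80% or above) can be cross-checked against the ten scores listed on this page. The sketch below assumes the reported average is a simple unweighted mean and that "High Performers (80%+)" means a score of at least 80.0; `summarize` is a hypothetical helper. Because the reported average is rounded to one decimal, the implied mean of the five unlisted benchmarks is only approximate.

```python
# Scores (%) for the ten benchmarks listed on this page.
listed = {
    "DocVQAtest": 96.5,
    "VCR_en_easy": 91.9,
    "ChartQA": 88.3,
    "OCRBench": 87.7,
    "MMBench_test": 86.5,
    "TextVQA": 85.5,
    "InfoVQAtest": 84.5,
    "EgoSchema": 77.9,
    "RealWorldQA": 77.8,
    "MMVetGPT4Turbo": 74.0,
}

REPORTED_MEAN = 75.8     # page's average over all benchmarks (rounded)
TOTAL_BENCHMARKS = 15    # page's benchmark count


def summarize(scores, reported_mean=REPORTED_MEAN, total=TOTAL_BENCHMARKS):
    """Summarize listed scores and infer the mean of the unlisted ones.

    Assumes the reported average is an unweighted mean over `total` benchmarks.
    """
    values = list(scores.values())
    listed_sum = sum(values)
    unlisted = total - len(values)
    return {
        "listed_mean": listed_sum / len(values),
        "high_performers": sum(1 for v in values if v >= 80.0),
        "unlisted_count": unlisted,
        # Total implied by the reported mean, minus what is already listed.
        "implied_unlisted_mean": (reported_mean * total - listed_sum) / unlisted,
    }


stats = summarize(listed)
print(f"Mean of listed benchmarks: {stats['listed_mean']:.1f}%")
print(f"Scores at 80% or above:    {stats['high_performers']}")
print(f"Implied mean of the {stats['unlisted_count']} unlisted: "
      f"{stats['implied_unlisted_mean']:.1f}%")
```

Under these assumptions the ten listed scores average about 85.1%, all seven high performers appear among them, and the five unlisted benchmarks would need to average roughly 57% to bring the overall figure down to 75.8%.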