
Qwen2-VL-72B-Instruct
Multimodal
Zero-eval
#1 DocVQAtest
#1 VCR_en_easy
#1 MMBench_test
by Alibaba
About
Qwen2-VL-72B-Instruct is a multimodal language model developed by Alibaba and released on Aug 29, 2024. It averages 75.8% across 15 benchmarks, with its highest scores on DocVQAtest (96.5%), VCR_en_easy (91.9%), and ChartQA (88.3%). By category, it scores highest on general tasks (82.2% average), followed by math (70.5%) and vision (68.0%). As a multimodal model, it can process text, images, and other input formats. All benchmark scores listed on this page are self-reported.
Timeline
Announced: Aug 29, 2024
Released: Aug 29, 2024
Knowledge Cutoff: Jun 30, 2023
Specifications
Capabilities
Multimodal
License & Family
License
tongyi-qianwen
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
15 benchmarks
Average Score
75.8%
Best Score
96.5%
High Performers (80%+)
7
Top Categories
general
82.2%
math
70.5%
vision
68.0%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
DocVQAtest
Rank #1 of 1
#1 Qwen2-VL-72B-Instruct: 96.5%
VCR_en_easy
Rank #1 of 1
#1 Qwen2-VL-72B-Instruct: 91.9%
ChartQA
Rank #6 of 24
#3 Llama 4 Scout: 88.8%
#4 Nova Pro: 89.2%
#5 Qwen2.5 VL 72B Instruct: 89.5%
#6 Qwen2-VL-72B-Instruct: 88.3%
#7 Pixtral Large: 88.1%
#8 Mistral Small 3.2 24B Instruct: 87.4%
#9 Qwen2.5 VL 7B Instruct: 87.3%
OCRBench
Rank #2 of 7
#1 Qwen2.5 VL 72B Instruct: 88.5%
#2 Qwen2-VL-72B-Instruct: 87.7%
#3 Qwen2.5 VL 7B Instruct: 86.4%
#4 Phi-4-multimodal-instruct: 84.4%
#5 DeepSeek VL2 Small: 83.4%
MMBench_test
Rank #1 of 1
#1 Qwen2-VL-72B-Instruct: 86.5%
All Benchmark Results for Qwen2-VL-72B-Instruct
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Score (0-1) | Normalized | Source
DocVQAtest | vision | multimodal | 0.96 | 96.5% | Self-reported
VCR_en_easy | general | text | 0.92 | 91.9% | Self-reported
ChartQA | general | multimodal | 0.88 | 88.3% | Self-reported
OCRBench | general | text | 0.88 | 87.7% | Self-reported
MMBench_test | general | text | 0.86 | 86.5% | Self-reported
TextVQA | vision | multimodal | 0.85 | 85.5% | Self-reported
InfoVQAtest | vision | multimodal | 0.84 | 84.5% | Self-reported
EgoSchema | general | text | 0.78 | 77.9% | Self-reported
RealWorldQA | general | text | 0.78 | 77.8% | Self-reported
MMVetGPT4Turbo | general | text | 0.74 | 74.0% | Self-reported
Showing 1 to 10 of 15 benchmarks
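The summary figures on this page (average score, best score, high performers, category averages) are simple aggregates over the per-benchmark scores. As a rough sketch, the snippet below recomputes them from the ten rows listed above. The `ROWS` list and the `summarize` helper are hypothetical names introduced for illustration, and the page's own aggregation method is assumed to be a plain arithmetic mean. Because only 10 of the 15 benchmarks are shown, the averages computed here will not match the reported 75.8% overall or the reported category averages.

```python
from collections import defaultdict
from statistics import mean

# (benchmark, category, normalized score in %), copied from the ten listed rows.
# The five benchmarks not shown on this page are deliberately left out.
ROWS = [
    ("DocVQAtest", "vision", 96.5),
    ("VCR_en_easy", "general", 91.9),
    ("ChartQA", "general", 88.3),
    ("OCRBench", "general", 87.7),
    ("MMBench_test", "general", 86.5),
    ("TextVQA", "vision", 85.5),
    ("InfoVQAtest", "vision", 84.5),
    ("EgoSchema", "general", 77.9),
    ("RealWorldQA", "general", 77.8),
    ("MMVetGPT4Turbo", "general", 74.0),
]


def summarize(rows, threshold=80.0):
    """Aggregate scores with a plain mean (an assumption about the page's method)."""
    scores = [score for _, _, score in rows]
    by_category = defaultdict(list)
    for _, category, score in rows:
        by_category[category].append(score)
    return {
        "count": len(scores),
        "average": round(mean(scores), 2),
        "best": max(scores),
        # Number of benchmarks at or above the threshold (80%+ on this page).
        "high_performers": sum(score >= threshold for score in scores),
        "by_category": {c: round(mean(v), 2) for c, v in by_category.items()},
    }


if __name__ == "__main__":
    print(summarize(ROWS))
```

On these ten rows the best score is 96.5% and seven benchmarks clear the 80% threshold, which agrees with the figures reported above. The mean over the visible rows comes out near 85.1%, higher than the reported 75.8%, presumably because the five unlisted benchmarks score lower.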