Alibaba

Qwen2.5-Omni-7B

Multimodal
Zero-eval
#1 VocalSound
#1 GiantSteps Tempo
#1 MMBench-V1.1
+25 more

About

Qwen2.5-Omni-7B is a multimodal language model developed by Alibaba. It posts competitive results across the 45 benchmarks tracked here; its highest scores include DocVQA (95.2%), VocalSound (93.9%), and GSM8k (88.7%), and its strongest category is code, with an average score of 76.0%. As an omni-modal model, it can process text, images, audio, and video inputs. It is released under the Apache 2.0 license, which permits commercial use and makes it suitable for enterprise applications. Released in March 2025, it represents Alibaba's latest advancement in multimodal AI.
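For orientation, here is a minimal text-only usage sketch. It assumes a recent transformers release that ships the Qwen2.5-Omni classes (Qwen2_5OmniForConditionalGeneration and Qwen2_5OmniProcessor); the checkpoint id is the official Hugging Face one, but treat the exact generate() arguments as an approximation of the current API rather than a definitive recipe.

```python
# Minimal text-only sketch; assumes a transformers build with Qwen2.5-Omni
# support and enough GPU memory for a 7B model.
from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor

model_id = "Qwen/Qwen2.5-Omni-7B"
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = Qwen2_5OmniProcessor.from_pretrained(model_id)

conversation = [
    {"role": "user",
     "content": [{"type": "text",
                  "text": "Summarize what a multimodal model is in one sentence."}]},
]

# Build the chat-formatted prompt string, then tokenize it.
text = processor.apply_chat_template(
    conversation, add_generation_prompt=True, tokenize=False
)
inputs = processor(text=text, return_tensors="pt").to(model.device)

# return_audio=False skips the speech ("Talker") head so generate()
# returns token ids only; drop it to also get synthesized audio.
output_ids = model.generate(**inputs, max_new_tokens=128, return_audio=False)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

The same processor accepts images, audio, and video alongside text, which is how the multimodal benchmarks below are run.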

Timeline
Announced: Mar 27, 2025
Released: Mar 27, 2025
Specifications
Capabilities: Multimodal
License & Family
License: Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

45 benchmarks
Average Score: 59.2%
Best Score: 95.2%
High Performers (80%+): 8

Top Categories

code: 76.0%
vision: 69.6%
math: 63.3%
general: 58.7%
roleplay: 17.8%
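The overview numbers above are straightforward aggregates of the normalized scores. A minimal sketch of that aggregation, using a hypothetical five-benchmark subset of the 45 results (the real computation runs over all of them):

```python
# Derive the overview stats (average, best, 80%+ count, category means)
# from normalized (name, category, score) triples. This subset is
# illustrative only; the page's numbers use all 45 benchmarks.
from collections import defaultdict

results = [
    ("DocVQA", "vision", 0.952),
    ("VocalSound", "general", 0.939),
    ("GSM8k", "math", 0.887),
    ("MMBench-V1.1", "general", 0.818),
    ("HumanEval", "code", 0.787),
]

scores = [s for _, _, s in results]
print(f"Average Score: {sum(scores) / len(scores):.1%}")
print(f"Best Score: {max(scores):.1%}")
print(f"High Performers (80%+): {sum(s >= 0.80 for s in scores)}")

# Per-category averages, as in the Top Categories breakdown.
by_cat = defaultdict(list)
for _, cat, s in results:
    by_cat[cat].append(s)
for cat, vals in sorted(by_cat.items(), key=lambda kv: -sum(kv[1]) / len(kv[1])):
    print(f"{cat}: {sum(vals) / len(vals):.1%}")
```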
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

DocVQA

Rank #3 of 26
#1 Qwen2.5 VL 72B Instruct: 96.4%
#2 Qwen2.5 VL 7B Instruct: 95.7%
#3 Qwen2.5-Omni-7B: 95.2%
#4 Claude 3.5 Sonnet: 95.2%
#5 Mistral Small 3.2 24B Instruct: 94.9%
#6 Qwen2.5 VL 32B Instruct: 94.8%

VocalSound

Rank #1 of 1
#1 Qwen2.5-Omni-7B: 93.9%

GSM8k

Rank #29 of 46
#26 Grok-1.5: 90.0%
#27 Gemma 3 4B: 89.2%
#28 Claude 3 Haiku: 88.9%
#29 Qwen2.5-Omni-7B: 88.7%
#30 Phi-3.5-MoE-instruct: 88.7%
#31 Phi 4 Mini: 88.6%
#32 Jamba 1.5 Large: 87.0%

GiantSteps Tempo

Rank #1 of 1
#1 Qwen2.5-Omni-7B: 88.0%

ChartQA

Rank #14 of 24
#11 DeepSeek VL2: 86.0%
#12 GPT-4o: 85.7%
#13 Llama 3.2 90B Instruct: 85.5%
#14 Qwen2.5-Omni-7B: 85.3%
#15 DeepSeek VL2 Small: 84.5%
#16 Llama 3.2 11B Instruct: 83.4%
#17 Pixtral-12B: 81.8%
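Rank positions like those above come from sorting every model evaluated on a benchmark by descending score and taking the 1-based index; ties keep their listing order. A minimal sketch using the DocVQA numbers:

```python
# Rank computation: sort all models on a benchmark by descending score,
# then report the 1-based position of the model of interest.
docvqa = {
    "Qwen2.5 VL 72B Instruct": 0.964,
    "Qwen2.5 VL 7B Instruct": 0.957,
    "Qwen2.5-Omni-7B": 0.952,
    "Claude 3.5 Sonnet": 0.952,  # tied score; stable sort keeps order
    "Mistral Small 3.2 24B Instruct": 0.949,
    "Qwen2.5 VL 32B Instruct": 0.948,
}

ranked = sorted(docvqa.items(), key=lambda kv: kv[1], reverse=True)
for rank, (name, score) in enumerate(ranked, start=1):
    print(f"#{rank} {name}: {score:.1%}")

rank = 1 + [name for name, _ in ranked].index("Qwen2.5-Omni-7B")
print(f"Qwen2.5-Omni-7B ranks #{rank} of {len(ranked)}")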
All Benchmark Results for Qwen2.5-Omni-7B
Complete list of benchmark scores with detailed information
Benchmark         Category  Modality    Score   Source
DocVQA            vision    multimodal  95.2%   Self-reported
VocalSound        general   audio       93.9%   Self-reported
GSM8k             math      text        88.7%   Self-reported
GiantSteps Tempo  general   audio       88.0%   Self-reported
ChartQA           general   multimodal  85.3%   Self-reported
TextVQA           vision    multimodal  84.4%   Self-reported
AI2D              general   multimodal  83.2%   Self-reported
MMBench-V1.1      general   multimodal  81.8%   Self-reported
HumanEval         code      text        78.7%   Self-reported
CRPErelation      general   multimodal  76.5%   Self-reported
Showing 1 to 10 of 45 benchmarks
...