
Qwen2.5 72B Instruct
#1 MT-Bench
#1 AlignBench
#3 MBPP
by Alibaba
About
Qwen2.5 72B Instruct is a language model developed by Alibaba. It achieves strong performance, with an average score of 77.4% across 14 benchmarks, and excels particularly in GSM8k (95.8%), MT-Bench (93.5%), and MBPP (88.2%). Its strongest category is math, with an average of 89.5%. It supports a 139K-token context window for handling large documents and is available through 4 API providers. The model is licensed for commercial use, making it suitable for enterprise applications. Released in 2024, it represents Alibaba's latest advancement in AI technology.
Pricing Range
Input (per 1M): $0.35 - $1.20
Output (per 1M): $0.40 - $1.20
Providers: 4
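As a rough illustration of how the per-1M-token pricing above translates into per-request cost, here is a minimal sketch using the low end of the listed range; the token counts are made up and actual provider rates vary.

```python
# Rough cost estimate for a single request under per-1M-token pricing.
# Prices are the low end of the range shown above; treat this as a
# sketch, not a quote from any specific provider.

INPUT_PRICE_PER_M = 0.35   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.40  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a 4,000-token prompt with a 1,000-token completion
print(f"${request_cost(4_000, 1_000):.5f}")  # roughly $0.00180 at low-end rates
```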
Timeline
Announced: Sep 19, 2024
Released: Sep 19, 2024
Specifications
Training Tokens: 18.0T
License & Family
License: Qwen
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance (14 benchmarks)
Average Score: 77.4%
Best Score: 95.8%
High Performers (80%+): 9
Performance Metrics
Max Context Window: 139.3K tokens
Avg Throughput: 54.0 tok/s
Avg Latency: 0ms
Top Categories
math: 89.5%
code: 78.6%
general: 74.1%
roleplay: 72.9%
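The category averages above can be reproduced, assuming each one is a simple mean of that category's benchmark scores, from the per-benchmark results listed further down. A minimal sketch (math is the only category whose benchmarks are all visible in the table below):

```python
# Sketch: reproducing a category average such as "math: 89.5%" as a plain
# mean of that category's benchmark scores. The grouping and the use of an
# unweighted mean are assumptions, not a documented formula from the site.

from collections import defaultdict

# Normalized scores taken from the benchmark table below (math only).
scores = {
    "GSM8k": ("math", 95.8),
    "MATH": ("math", 83.1),
}

by_category = defaultdict(list)
for category, score in scores.values():
    by_category[category].append(score)

for category, values in by_category.items():
    avg = sum(values) / len(values)
    print(f"{category}: {avg:.2f}%")  # math: 89.45%, shown as 89.5% above
```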
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
GSM8k
Rank #9 of 46
#6 Qwen2.5 32B Instruct: 95.9%
#7 Gemma 3 27B: 95.9%
#8 Claude 3.5 Sonnet: 96.4%
#9 Qwen2.5 72B Instruct: 95.8%
#10 DeepSeek-V2.5: 95.1%
#11 Claude 3 Opus: 95.0%
#12 Nova Pro: 94.8%
MT-Bench
Rank #1 of 11
#1 Qwen2.5 72B Instruct: 93.5%
#2 Llama-3.3 Nemotron Super 49B v1: 91.7%
#3 DeepSeek-V2.5: 90.2%
#4 Qwen2.5 7B Instruct: 87.5%
MBPP
Rank #3 of 31
#1 Qwen2.5-Coder 32B Instruct: 90.2%
#2 Llama-3.3 Nemotron Super 49B v1: 91.3%
#3 Qwen2.5 72B Instruct: 88.2%
#4 Llama 3.1 Nemotron Nano 8B V1: 84.6%
#5 Qwen2.5 32B Instruct: 84.0%
#6 Qwen2.5 VL 32B Instruct: 84.0%
MMLU-Redux
Rank #7 of 13
#4 Qwen3 235B A22B: 87.4%
#5 DeepSeek-V3: 89.1%
#6 Kimi K2 Instruct: 92.7%
#7 Qwen2.5 72B Instruct: 86.8%
#8 Qwen2.5 32B Instruct: 83.9%
#9 Qwen2.5 14B Instruct: 80.0%
#10 Qwen2.5-Coder 32B Instruct: 77.5%
HumanEval
Rank #27 of 62
#24 GPT-4 Turbo: 87.1%
#25 GPT-4o mini: 87.2%
#26 Gemma 3 27B: 87.8%
#27 Qwen2.5 72B Instruct: 86.6%
#28 Qwen2 72B Instruct: 86.0%
#29 Grok-2 mini: 85.7%
#30 Nova Lite: 85.4%
All Benchmark Results for Qwen2.5 72B Instruct
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
GSM8k | math | text | 0.96 | 95.8% | Self-reported
MT-Bench | roleplay | text | 93.50 | 93.5% | Self-reported
MBPP | code | text | 88.20 | 88.2% | Self-reported
MMLU-Redux | general | text | 0.87 | 86.8% | Self-reported
HumanEval | code | text | 0.87 | 86.6% | Self-reported
IFEval | code | text | 0.84 | 84.1% | Self-reported
MATH | math | text | 0.83 | 83.1% | Self-reported
AlignBench | general | text | 0.82 | 81.6% | Self-reported
Arena Hard | general | text | 0.81 | 81.2% | Self-reported
MultiPL-E | general | text | 75.10 | 75.1% | Self-reported
Showing 1 to 10 of 14 benchmarks
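The "Normalized" column maps every benchmark onto a common 0-100% scale. Below is a minimal sketch of one plausible rule, assuming raw values reported as fractions (0-1) are multiplied by 100 while values already on a 0-100 scale pass through unchanged; the raw numbers here are illustrative, not the exact unrounded values behind the table.

```python
# Sketch: normalizing mixed-scale benchmark scores to a 0-100% range.
# Assumption: scores <= 1.0 are fractions and are scaled by 100; scores
# above 1.0 are treated as already being percentages.

def normalize(raw: float) -> float:
    """Map a raw benchmark score to a 0-100 percentage."""
    return raw * 100 if raw <= 1.0 else raw

print(normalize(0.958))  # 95.8  (GSM8k, reported as a fraction)
print(normalize(93.5))   # 93.5  (MT-Bench, already on a 0-100 scale)
```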