Qwen2.5 72B Instruct
by Alibaba

Zero-eval rankings: #1 MT-Bench · #1 AlignBench · #3 MBPP

About

Qwen2.5 72B Instruct is a language model developed by Alibaba. It achieves strong performance, with an average score of 77.4% across 14 benchmarks, and does particularly well on GSM8k (95.8%), MT-Bench (93.5%), and MBPP (88.2%). It is strongest in math tasks, averaging 89.5% in that category. The model supports a 139K-token context window for handling large documents and is available through 4 API providers. It is licensed for commercial use, making it suitable for enterprise applications. Released in September 2024, it represents Alibaba's latest advancement in AI technology.

Pricing Range
Input (per 1M): $0.35 - $1.20
Output (per 1M): $0.40 - $1.20
Providers: 4
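Given these per-1M-token rates, per-request cost is simple arithmetic. A minimal sketch — the token counts are hypothetical examples, and the low/high rates are simply the ends of the listed pricing range:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Cost in USD, given prices in $ per 1M tokens."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Hypothetical request: 10k input tokens, 1k output tokens.
cheap = request_cost(10_000, 1_000, 0.35, 0.40)   # cheapest listed provider
costly = request_cost(10_000, 1_000, 1.20, 1.20)  # most expensive listed provider

print(f"${cheap:.4f} - ${costly:.4f}")
```

At the listed rates, such a request would land somewhere between about $0.004 and $0.013 depending on the provider.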
Timeline
Announced: Sep 19, 2024
Released: Sep 19, 2024

Specifications
Training Tokens: 18.0T

License & Family
License: Qwen
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (14 benchmarks)
Average Score: 77.4%
Best Score: 95.8%
High Performers (80%+): 9

Performance Metrics
Max Context Window: 139.3K
Avg Throughput: 54.0 tok/s
Avg Latency: 0 ms

Top Categories
math: 89.5%
code: 78.6%
general: 74.1%
roleplay: 72.9%
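The category figures appear consistent with a plain unweighted mean of member-benchmark scores: the two math benchmarks listed further down (GSM8k at 95.8%, MATH at 83.1%) average to roughly 89.5%. A sketch, assuming that simple-mean convention (the site does not document its aggregation rule):

```python
# Assumption: a category score is the unweighted mean of its benchmarks.
# The two math benchmarks shown on this page:
math_scores = {"GSM8k": 95.8, "MATH": 83.1}

math_avg = sum(math_scores.values()) / len(math_scores)
print(f"math: {math_avg:.2f}%")  # close to the listed 89.5% category figure
```

The other categories can't be checked the same way from this page alone, since only 10 of the 14 benchmarks are listed.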
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

GSM8k (Rank #9 of 46)
#6 Qwen2.5 32B Instruct: 95.9%
#7 Gemma 3 27B: 95.9%
#8 Claude 3.5 Sonnet: 96.4%
#9 Qwen2.5 72B Instruct: 95.8%
#10 DeepSeek-V2.5: 95.1%
#11 Claude 3 Opus: 95.0%
#12 Nova Pro: 94.8%

MT-Bench (Rank #1 of 11)
#1 Qwen2.5 72B Instruct: 93.5%
#2 Llama-3.3 Nemotron Super 49B v1: 91.7%
#3 DeepSeek-V2.5: 90.2%
#4 Qwen2.5 7B Instruct: 87.5%

MBPP (Rank #3 of 31)
#1 Qwen2.5-Coder 32B Instruct: 90.2%
#2 Llama-3.3 Nemotron Super 49B v1: 91.3%
#3 Qwen2.5 72B Instruct: 88.2%
#4 Llama 3.1 Nemotron Nano 8B V1: 84.6%
#5 Qwen2.5 32B Instruct: 84.0%
#6 Qwen2.5 VL 32B Instruct: 84.0%

MMLU-Redux (Rank #7 of 13)
#4 Qwen3 235B A22B: 87.4%
#5 DeepSeek-V3: 89.1%
#6 Kimi K2 Instruct: 92.7%
#7 Qwen2.5 72B Instruct: 86.8%
#8 Qwen2.5 32B Instruct: 83.9%
#9 Qwen2.5 14B Instruct: 80.0%
#10 Qwen2.5-Coder 32B Instruct: 77.5%

HumanEval (Rank #27 of 62)
#24 GPT-4 Turbo: 87.1%
#25 GPT-4o mini: 87.2%
#26 Gemma 3 27B: 87.8%
#27 Qwen2.5 72B Instruct: 86.6%
#28 Qwen2 72B Instruct: 86.0%
#29 Grok-2 mini: 85.7%
#30 Nova Lite: 85.4%
All Benchmark Results for Qwen2.5 72B Instruct
Complete list of benchmark scores with detailed information

Benchmark    Category  Modality  Raw Score  Normalized  Source
GSM8k        math      text      0.96       95.8%       Self-reported
MT-Bench     roleplay  text      93.50      93.5%       Self-reported
MBPP         code      text      88.20      88.2%       Self-reported
MMLU-Redux   general   text      0.87       86.8%       Self-reported
HumanEval    code      text      0.87       86.6%       Self-reported
IFEval       code      text      0.84       84.1%       Self-reported
MATH         math      text      0.83       83.1%       Self-reported
AlignBench   general   text      0.82       81.6%       Self-reported
Arena Hard   general   text      0.81       81.2%       Self-reported
MultiPL-E    general   text      75.10      75.1%       Self-reported

(Showing 10 of 14 benchmarks.)
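The raw scores above arrive on mixed scales: fractions like 0.96 alongside percentages like 93.50. A hedged sketch of the kind of scale heuristic that could produce the normalized column — the thresholds here are my assumption, not the site's documented rule:

```python
def normalize(raw: float) -> float:
    """Map a raw benchmark score onto a 0-100 scale.

    Heuristic (assumed, not the site's documented rule):
      raw <= 1   -> treat as a fraction, multiply by 100
      raw <= 10  -> treat as a 0-10 rating, multiply by 10
      otherwise  -> already a percentage
    """
    if raw <= 1:
        return raw * 100
    if raw <= 10:
        return raw * 10
    return raw

for raw in (0.96, 93.50, 9.35):
    print(raw, "->", f"{normalize(raw):.1f}%")
```

Note that the raw column itself is rounded for display (0.96 maps to 96.0%, not the listed 95.8%), so exact round-tripping from the table isn't expected.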