Qwen2.5 7B Instruct
by Alibaba

Ranked #3 on AlignBench (Zero-eval)
About

Qwen2.5 7B Instruct is a language model developed by Alibaba. It achieves strong performance with an average score of 65.6% across 14 benchmarks, with its best results on GSM8k (91.6%), MT-Bench (87.5%), and HumanEval (84.8%). Its strongest category is math, where it averages 83.5%. The model supports a 139K-token context window for handling large documents and is available through one API provider. It is licensed under Apache 2.0 for commercial use, making it suitable for enterprise applications. Released in September 2024, it represents one of Alibaba's most recent advances in language modeling.
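For reference, here is a minimal sketch of querying the model, assuming an OpenAI-compatible endpoint (which many hosted providers offer). The base URL, API key, and model identifier below are placeholders, not confirmed provider values:

from openai import OpenAI

# Placeholders: substitute your provider's endpoint, key, and model ID.
client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="qwen2.5-7b-instruct",  # provider-specific model ID (assumed)
    messages=[
        {"role": "user", "content": "Give a one-line summary of the Apache 2.0 license."}
    ],
)
print(response.choices[0].message.content)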

Pricing Range
Input (per 1M tokens): $0.30
Output (per 1M tokens): $0.30
Providers: 1
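At a flat $0.30 per 1M tokens for both input and output, per-request cost is simple to estimate. A minimal sketch using the rate listed above (actual provider billing may differ):

RATE_PER_M_TOKENS = 0.30  # USD per 1M tokens, same for input and output

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request at the listed flat rate."""
    return (input_tokens + output_tokens) / 1_000_000 * RATE_PER_M_TOKENS

# Example: a 10,000-token prompt with a 1,000-token completion
print(f"${request_cost_usd(10_000, 1_000):.4f}")  # -> $0.0033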
Timeline
Announced: Sep 19, 2024
Released: Sep 19, 2024
Specifications
Training Tokens: 18.0T
License & Family
License: Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (14 benchmarks)

Average Score: 65.6%
Best Score: 91.6%
High Performers (80%+): 3

Performance Metrics

Max Context Window: 139.3K tokens
Avg Throughput: 138.0 tok/s
Avg Latency: 1 ms
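Together, the latency and throughput figures give a rough end-to-end generation time: latency to the first token plus output length divided by decode speed. A minimal sketch, assuming the listed averages apply uniformly to a single request:

AVG_THROUGHPUT_TOK_S = 138.0  # from the metrics above
AVG_LATENCY_S = 0.001         # 1 ms, from the metrics above

def est_generation_time_s(output_tokens: int) -> float:
    """Rough wall-clock estimate: first-token latency + decode time."""
    return AVG_LATENCY_S + output_tokens / AVG_THROUGHPUT_TOK_S

# Example: a 500-token completion takes roughly 3.6 seconds
print(f"{est_generation_time_s(500):.1f} s")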

Top Categories

math: 83.5%
code: 66.0%
roleplay: 61.7%
general: 60.6%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

GSM8k

Rank #21 of 46
#18 Kimi K2 Base: 92.1%
#19 Nova Micro: 92.3%
#20 Claude 3 Sonnet: 92.3%
#21 Qwen2.5 7B Instruct: 91.6%
#22 Llama 3.1 Nemotron 70B Instruct: 91.4%
#23 Qwen2 72B Instruct: 91.1%
#24 Qwen2.5-Coder 32B Instruct: 91.1%

MT-Bench

Rank #4 of 11
#1 DeepSeek-V2.5: 90.2%
#2 Llama-3.3 Nemotron Super 49B v1: 91.7%
#3 Qwen2.5 72B Instruct: 93.5%
#4 Qwen2.5 7B Instruct: 87.5%
#5 Mistral Large 2: 86.3%
#6 Qwen2 7B Instruct: 84.1%
#7 Mistral Small 3 24B Instruct: 83.5%

HumanEval

Rank #33 of 62
#30 Claude 3 Opus: 84.9%
#31 Gemma 3 12B: 85.4%
#32 Nova Lite: 85.4%
#33 Qwen2.5 7B Instruct: 84.8%
#34 Mistral Small 3 24B Instruct: 84.8%
#35 Gemini 1.5 Pro: 84.1%
#36 Qwen2.5 14B Instruct: 83.5%

MBPP

Rank #12 of 31
#9 Qwen2 72B Instruct: 80.2%
#10 Phi-3.5-MoE-instruct: 80.8%
#11 Qwen3 235B A22B: 81.4%
#12 Qwen2.5 7B Instruct: 79.2%
#13 Codestral-22B: 78.2%
#14 Llama 4 Maverick: 77.6%
#15 Gemini Diffusion: 76.0%

MATH

Rank #22 of 63
#19 Gemma 3 4B: 75.6%
#20 Grok-2: 76.1%
#21 GPT-4o: 76.6%
#22 Qwen2.5 7B Instruct: 75.5%
#23 DeepSeek-V2.5: 74.7%
#24 Llama 3.1 405B Instruct: 73.8%
#25 Nova Lite: 73.3%
All Benchmark Results for Qwen2.5 7B Instruct
Complete list of benchmark scores with detailed information

Benchmark     Category   Type   Score    Source
GSM8k         math       text   91.6%    Self-reported
MT-Bench      roleplay   text   87.5%    Self-reported
HumanEval     code       text   84.8%    Self-reported
MBPP          code       text   79.2%    Self-reported
MATH          math       text   75.5%    Self-reported
MMLU-Redux    general    text   75.4%    Self-reported
AlignBench    general    text   73.3%    Self-reported
IFEval        code       text   71.2%    Self-reported
MultiPL-E     general    text   70.4%    Self-reported
MMLU-Pro      general    text   56.3%    Self-reported

Showing 10 of 14 benchmarks.
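The "Top Categories" breakdown shown earlier is the per-category mean of these scores. A minimal sketch of that calculation, using only the 10 benchmarks listed here (the four unlisted benchmarks pull the code, roleplay, and general averages lower, so only the math figure reproduces exactly):

from collections import defaultdict

# (category, normalized score %) for the benchmarks listed above
scores = [
    ("math", 91.6),      # GSM8k
    ("roleplay", 87.5),  # MT-Bench
    ("code", 84.8),      # HumanEval
    ("code", 79.2),      # MBPP
    ("math", 75.5),      # MATH
    ("general", 75.4),   # MMLU-Redux
    ("general", 73.3),   # AlignBench
    ("code", 71.2),      # IFEval
    ("general", 70.4),   # MultiPL-E
    ("general", 56.3),   # MMLU-Pro
]

by_category = defaultdict(list)
for category, score in scores:
    by_category[category].append(score)

for category, values in sorted(by_category.items()):
    print(f"{category}: {sum(values) / len(values):.1f}%")
# math comes out to ~83.5%, matching the "Top Categories" figure above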