
Qwen2.5 7B Instruct
Zero-eval
#3 AlignBench
by Alibaba
About
Qwen2.5 7B Instruct is a language model developed by Alibaba. It achieves an average score of 65.6% across 14 benchmarks, performing best on GSM8k (91.6%), MT-Bench (87.5%), and HumanEval (84.8%). The model is strongest in math tasks, with an average category score of 83.5%. It supports a 139K-token context window for handling large documents and is available through one API provider. Licensed under Apache 2.0, it is suitable for commercial and enterprise applications. Released in September 2024, it represents Alibaba's latest advancement in AI technology.
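For quick experimentation, here is a minimal sketch of querying the model locally with Hugging Face transformers. The checkpoint ID "Qwen/Qwen2.5-7B-Instruct" is the official Hugging Face release; the prompt and generation settings are illustrative assumptions, not provider defaults.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Apache 2.0 license in one sentence."},
]
# apply_chat_template wraps the messages in Qwen's chat markup
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))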
Pricing Range
Input (per 1M tokens): $0.30
Output (per 1M tokens): $0.30
Providers: 1
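With input and output both billed at $0.30 per million tokens, per-request cost is simple arithmetic. A sketch (the function name and example sizes are ours, not the provider's):

def request_cost(input_tokens, output_tokens, in_rate=0.30, out_rate=0.30):
    """USD cost of one request at per-1M-token rates."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# e.g. a 100K-token document plus a 2K-token answer:
print(request_cost(100_000, 2_000))  # 0.0306 -> about 3 cents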
Timeline
Announced: Sep 19, 2024
Released: Sep 19, 2024
Specifications
Training Tokens: 18.0T
License & Family
License
Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance (14 benchmarks)
Average Score: 65.6%
Best Score: 91.6%
High Performers (80%+): 3
Performance Metrics
Max Context Window: 139.3K
Avg Throughput: 138.0 tok/s
Avg Latency: 1ms
Top Categories
math: 83.5%
code: 66.0%
roleplay: 61.7%
general: 60.6%
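The category figures appear to be simple means of the per-benchmark scores in each category. The math average can be reproduced from the two math benchmarks in the results table below (GSM8k 91.6%, MATH 75.5%); the other categories also include benchmarks beyond the ten rows shown there.

math_scores = {"GSM8k": 91.6, "MATH": 75.5}
avg = sum(math_scores.values()) / len(math_scores)
print(round(avg, 2))  # 83.55, reported above as 83.5%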
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
GSM8k
Rank #21 of 46
#18 Nova Micro: 92.3%
#19 Claude 3 Sonnet: 92.3%
#20 Kimi K2 Base: 92.1%
#21 Qwen2.5 7B Instruct: 91.6%
#22 Llama 3.1 Nemotron 70B Instruct: 91.4%
#23 Qwen2 72B Instruct: 91.1%
#24 Qwen2.5-Coder 32B Instruct: 91.1%
MT-Bench
Rank #4 of 11
#1 Qwen2.5 72B Instruct: 93.5%
#2 Llama-3.3 Nemotron Super 49B v1: 91.7%
#3 DeepSeek-V2.5: 90.2%
#4 Qwen2.5 7B Instruct: 87.5%
#5 Mistral Large 2: 86.3%
#6 Qwen2 7B Instruct: 84.1%
#7 Mistral Small 3 24B Instruct: 83.5%
HumanEval
Rank #33 of 62
#30 Gemma 3 12B: 85.4%
#31 Nova Lite: 85.4%
#32 Claude 3 Opus: 84.9%
#33 Qwen2.5 7B Instruct: 84.8%
#34 Mistral Small 3 24B Instruct: 84.8%
#35 Gemini 1.5 Pro: 84.1%
#36 Qwen2.5 14B Instruct: 83.5%
MBPP
Rank #12 of 31
#9 Qwen3 235B A22B: 81.4%
#10 Phi-3.5-MoE-instruct: 80.8%
#11 Qwen2 72B Instruct: 80.2%
#12 Qwen2.5 7B Instruct: 79.2%
#13 Codestral-22B: 78.2%
#14 Llama 4 Maverick: 77.6%
#15 Gemini Diffusion: 76.0%
MATH
Rank #22 of 63
#19 GPT-4o: 76.6%
#20 Grok-2: 76.1%
#21 Gemma 3 4B: 75.6%
#22 Qwen2.5 7B Instruct: 75.5%
#23 DeepSeek-V2.5: 74.7%
#24 Llama 3.1 405B Instruct: 73.8%
#25 Nova Lite: 73.3%
All Benchmark Results for Qwen2.5 7B Instruct
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
GSM8k | math | text | 0.92 | 91.6% | Self-reported
MT-Bench | roleplay | text | 87.50 | 87.5% | Self-reported
HumanEval | code | text | 0.85 | 84.8% | Self-reported
MBPP | code | text | 79.20 | 79.2% | Self-reported
MATH | math | text | 0.76 | 75.5% | Self-reported
MMLU-Redux | general | text | 0.75 | 75.4% | Self-reported
AlignBench | general | text | 0.73 | 73.3% | Self-reported
IFEval | code | text | 0.71 | 71.2% | Self-reported
MultiPL-E | general | text | 70.40 | 70.4% | Self-reported
MMLU-Pro | general | text | 0.56 | 56.3% | Self-reported
Showing 10 of 14 benchmarks
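Note that the raw-score column mixes scales: some benchmarks report 0-1 fractions (0.92), others 0-100 values (87.50). One plausible way to map them onto the normalized 0-100% column is sketched below; the threshold heuristic is our assumption, not Zero-eval's documented rule, and outputs differ slightly from the listed percentages because the raw column is rounded.

def normalize(raw):
    """Map a raw score onto a 0-100 scale, assuming values <= 1.0 are fractions."""
    return raw * 100 if raw <= 1.0 else raw

for name, raw in [("GSM8k", 0.92), ("MT-Bench", 87.50), ("MMLU-Pro", 0.56)]:
    print(name, normalize(raw))  # 92.0, 87.5, 56.0 (vs. listed 91.6, 87.5, 56.3)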