
Qwen2.5 32B Instruct
Zero-eval
#1 MMLU-STEM
#1 MBPP+
#2 TheoremQA
by Alibaba
About
Qwen2.5 32B Instruct is a language model developed by Alibaba. It averages 74.3% across 18 benchmarks, with its highest scores on GSM8k (95.9%), HumanEval (88.4%), and HellaSwag (85.2%). Math is its strongest category, with an average of 89.5%. Released in 2024, it is licensed under Apache 2.0, which permits commercial use.
Timeline
Announced: Sep 19, 2024
Released: Sep 19, 2024
Specifications
Training Tokens: 18.0T
License & Family
License: Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
18 benchmarks
Average Score: 74.3%
Best Score: 95.9%
High Performers (80%+): 10
Top Categories
math: 89.5%
reasoning: 79.2%
code: 73.0%
general: 71.3%
factuality: 57.8%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
GSM8k
Rank #8 of 46
#5 Gemma 3 27B: 95.9%
#6 Claude 3.5 Sonnet: 96.4%
#7 Claude 3.5 Sonnet: 96.4%
#8 Qwen2.5 32B Instruct: 95.9%
#9 Qwen2.5 72B Instruct: 95.8%
#10 DeepSeek-V2.5: 95.1%
#11 Claude 3 Opus: 95.0%
HumanEval
Rank #20 of 62
#17 Qwen2.5-Coder 7B Instruct: 88.4%
#18 Grok-2: 88.4%
#19 Llama 3.3 70B Instruct: 88.4%
#20 Qwen2.5 32B Instruct: 88.4%
#21 o1: 88.1%
#22 Claude 3.5 Haiku: 88.1%
#23 GPT-4.5: 88.0%
HellaSwag
Rank #11 of 24
#8 Llama 3.1 Nemotron 70B Instruct: 85.6%
#9 Claude 3 Haiku: 85.9%
#10 Gemma 2 27B: 86.4%
#11 Qwen2.5 32B Instruct: 85.2%
#12 Phi-3.5-MoE-instruct: 83.8%
#13 Mistral NeMo Instruct: 83.5%
#14 Qwen2.5-Coder 32B Instruct: 83.0%
BBH
Rank #3 of 8
#1 Nova Pro: 86.9%
#2 Qwen3 235B A22B: 88.9%
#3 Qwen2.5 32B Instruct: 84.5%
#4 DeepSeek-V2.5: 84.3%
#5 Nova Lite: 82.4%
#6 Qwen2 72B Instruct: 82.4%
MBPP
Rank #5 of 31
#2 Llama 3.1 Nemotron Nano 8B V1: 84.6%
#3 Qwen2.5 72B Instruct: 88.2%
#4 Qwen2.5-Coder 32B Instruct: 90.2%
#5 Qwen2.5 32B Instruct: 84.0%
#6 Qwen2.5 VL 32B Instruct: 84.0%
#7 Qwen2.5-Coder 7B Instruct: 83.5%
#8 Qwen2.5 14B Instruct: 82.0%
All Benchmark Results for Qwen2.5 32B Instruct
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Score | Normalized | Source
GSM8k | math | text | 0.96 | 95.9% | Self-reported
HumanEval | code | text | 0.88 | 88.4% | Self-reported
HellaSwag | reasoning | text | 0.85 | 85.2% | Self-reported
BBH | general | text | 0.84 | 84.5% | Self-reported
MBPP | code | text | 84.00 | 84.0% | Self-reported
MMLU-Redux | general | text | 0.84 | 83.9% | Self-reported
MMLU | general | text | 0.83 | 83.3% | Self-reported
MATH | math | text | 0.83 | 83.1% | Self-reported
Winogrande | reasoning | text | 0.82 | 82.0% | Self-reported
MMLU-STEM | general | text | 0.81 | 80.9% | Self-reported
Showing 1 to 10 of 18 benchmarks
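The category averages and the 80%+ count in the overview can be partly cross-checked against the ten rows listed here. The sketch below is illustrative only: the row data is copied from this page, while the function names and the rounding to one decimal place are assumptions, not the site's actual method. Only the math category can be fully reproduced, since GSM8k and MATH are both visible and average to the reported 89.5%. The other categories include benchmarks among the eight rows not shown, so their averages over the visible rows will not match the reported figures.

```python
from collections import defaultdict

# The ten visible rows of the results table:
# (benchmark, category, normalized score in percent).
VISIBLE_RESULTS = [
    ("GSM8k", "math", 95.9),
    ("HumanEval", "code", 88.4),
    ("HellaSwag", "reasoning", 85.2),
    ("BBH", "general", 84.5),
    ("MBPP", "code", 84.0),
    ("MMLU-Redux", "general", 83.9),
    ("MMLU", "general", 83.3),
    ("MATH", "math", 83.1),
    ("Winogrande", "reasoning", 82.0),
    ("MMLU-STEM", "general", 80.9),
]


def category_averages(rows):
    """Mean normalized score per category, rounded to one decimal (assumed)."""
    grouped = defaultdict(list)
    for _name, category, score in rows:
        grouped[category].append(score)
    return {cat: round(sum(scores) / len(scores), 1)
            for cat, scores in grouped.items()}


def count_high_performers(rows, threshold=80.0):
    """Number of benchmarks scoring at or above the threshold."""
    return sum(1 for _name, _category, score in rows if score >= threshold)


if __name__ == "__main__":
    for category, average in sorted(category_averages(VISIBLE_RESULTS).items()):
        print(f"{category}: {average}% (visible rows only)")
    print("high performers (80%+):", count_high_performers(VISIBLE_RESULTS))
```

All ten visible rows score at least 80%, which is consistent with the reported count of 10 high performers and suggests the eight benchmarks not shown all fall below that threshold.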