
Qwen2.5 14B Instruct
Zero-eval
#2 MMLU-STEM
#2 MBPP+
by Alibaba
About
Qwen2.5 14B Instruct is a language model developed by Alibaba. It averages 70.0% across 16 benchmarks, with its highest scores on GSM8k (94.8%), HumanEval (83.5%), and MBPP (82.0%). Math is its strongest category, with an average of 87.4%. It is released under the Apache 2.0 license, which permits commercial use. The model was announced and released on September 19, 2024.
Timeline
Announced: Sep 19, 2024
Released: Sep 19, 2024
Specifications
Training Tokens: 18.0T
License & Family
License: Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
16 benchmarks
Average Score: 70.0%
Best Score: 94.8%
High Performers (80%+): 5
Top Categories
math: 87.4%
code: 70.0%
general: 67.4%
reasoning: 67.3%
factuality: 58.4%
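The category figures can be partly cross-checked against the individual benchmark scores listed on this page. The sketch below assumes each category figure is an unweighted mean of its benchmarks (an assumption, since the page does not state its method) and uses only the ten scores visible here. Under that assumption the math and reasoning averages and the count of scores at or above 80% are reproduced. The code and general figures cannot be reproduced this way, because six of the 16 benchmarks are not listed on this page. The names `VISIBLE_SCORES`, `category_means`, and `count_high_performers` are illustrative.

```python
from collections import defaultdict

# Scores copied from the ten benchmark rows visible on this page.
# The remaining six of the 16 benchmarks are not shown here.
VISIBLE_SCORES = [
    ("GSM8k", "math", 94.8),
    ("HumanEval", "code", 83.5),
    ("MBPP", "code", 82.0),
    ("MMLU-Redux", "general", 80.0),
    ("MATH", "math", 80.0),
    ("MMLU", "general", 79.7),
    ("BBH", "general", 78.2),
    ("MMLU-STEM", "general", 76.4),
    ("MultiPL-E", "general", 72.8),
    ("ARC-C", "reasoning", 67.3),
]


def category_means(rows):
    """Unweighted mean score per category, rounded to one decimal place."""
    buckets = defaultdict(list)
    for _name, category, score in rows:
        buckets[category].append(score)
    return {cat: round(sum(vals) / len(vals), 1) for cat, vals in buckets.items()}


def count_high_performers(rows, threshold=80.0):
    """Number of benchmarks with a score at or above the threshold."""
    return sum(1 for _name, _category, score in rows if score >= threshold)


means = category_means(VISIBLE_SCORES)
high = count_high_performers(VISIBLE_SCORES)
print(means["math"], means["reasoning"], high)
```

Any mismatch between a computed mean and the figure reported on the page most likely reflects the benchmarks that are not listed, not an error in either number.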
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
GSM8k
Rank #13 of 46
#10 Nova Pro: 94.8%
#11 Claude 3 Opus: 95.0%
#12 DeepSeek-V2.5: 95.1%
#13 Qwen2.5 14B Instruct: 94.8%
#14 Nova Lite: 94.5%
#15 Gemma 3 12B: 94.4%
#16 Qwen3 235B A22B: 94.4%
HumanEval
Rank #36 of 62
#33 Gemini 1.5 Pro: 84.1%
#34 Mistral Small 3 24B Instruct: 84.8%
#35 Qwen2.5 7B Instruct: 84.8%
#36 Qwen2.5 14B Instruct: 83.5%
#37 Phi 4: 82.6%
#38 IBM Granite 4.0 Tiny Preview: 82.4%
#39 Codestral-22B: 81.1%
MBPP
Rank #8 of 31
#5 Qwen2.5-Coder 7B Instruct: 83.5%
#6 Qwen2.5 VL 32B Instruct: 84.0%
#7 Qwen2.5 32B Instruct: 84.0%
#8 Qwen2.5 14B Instruct: 82.0%
#9 Qwen3 235B A22B: 81.4%
#10 Phi-3.5-MoE-instruct: 80.8%
#11 Qwen2 72B Instruct: 80.2%
MMLU-Redux
Rank #9 of 13
#6 Qwen2.5 32B Instruct: 83.9%
#7 Qwen2.5 72B Instruct: 86.8%
#8 Qwen3 235B A22B: 87.4%
#9 Qwen2.5 14B Instruct: 80.0%
#10 Qwen2.5-Coder 32B Instruct: 77.5%
#11 Qwen2.5 7B Instruct: 75.4%
#12 Qwen2.5-Omni-7B: 71.0%
MATH
Rank #14 of 63
#11 Phi 4: 80.4%
#12 Qwen2.5 VL 32B Instruct: 82.2%
#13 Qwen2.5 32B Instruct: 83.1%
#14 Qwen2.5 14B Instruct: 80.0%
#15 Claude 3.5 Sonnet: 78.3%
#16 Gemini 1.5 Flash: 77.9%
#17 Llama 3.3 70B Instruct: 77.0%
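Because the number of ranked models differs per benchmark (13 for MMLU-Redux, 63 for MATH), the raw rank positions are not directly comparable. One simple way to compare them, sketched here as an illustration and not as the page's own method, is to express each rank as a share of its ranked field. The `RANKS` mapping and the `top_share` helper are hypothetical names. The rank and field-size values are the ones listed above.

```python
# (rank, number of ranked models) for Qwen2.5 14B Instruct, as listed on this page.
RANKS = {
    "GSM8k": (13, 46),
    "HumanEval": (36, 62),
    "MBPP": (8, 31),
    "MMLU-Redux": (9, 13),
    "MATH": (14, 63),
}


def top_share(rank, total):
    """Percentage of the ranked field that sits at or above this rank."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return round(100 * rank / total, 1)


standing = {name: top_share(rank, total) for name, (rank, total) in RANKS.items()}

# Print from strongest relative standing (smallest share) to weakest.
for name, pct in sorted(standing.items(), key=lambda item: item[1]):
    print(f"{name}: top {pct}% of ranked models")
```

On this measure the model's relative standing is strongest on MATH and weakest on MMLU-Redux, even though rank #9 looks better than rank #14 in absolute terms.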
All Benchmark Results for Qwen2.5 14B Instruct
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw score | Normalized | Source
GSM8k | math | text | 0.95 | 94.8% | Self-reported
HumanEval | code | text | 0.83 | 83.5% | Self-reported
MBPP | code | text | 82.00 | 82.0% | Self-reported
MMLU-Redux | general | text | 0.80 | 80.0% | Self-reported
MATH | math | text | 0.80 | 80.0% | Self-reported
MMLU | general | text | 0.80 | 79.7% | Self-reported
BBH | general | text | 0.78 | 78.2% | Self-reported
MMLU-STEM | general | text | 0.76 | 76.4% | Self-reported
MultiPL-E | general | text | 72.80 | 72.8% | Self-reported
ARC-C | reasoning | text | 0.67 | 67.3% | Self-reported
Showing 1 to 10 of 16 benchmarks
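The raw scores in the list above are on mixed scales: most are fractions between 0 and 1 (for example 0.95), while MBPP and MultiPL-E are reported on a 0 to 100 scale (82.00 and 72.80). A minimal sketch of one way to bring such values onto a common percentage scale follows. The threshold rule in `to_percent` is a heuristic assumed for illustration, not the page's documented normalization, and it is ambiguous for a raw value of exactly 1.

```python
def to_percent(raw):
    """Heuristic: treat values in [0, 1] as fractions and larger values as percentages."""
    if raw < 0 or raw > 100:
        raise ValueError(f"score out of range: {raw}")
    return round(raw * 100, 1) if raw <= 1 else round(raw, 1)


# A few raw values copied from the benchmark list on this page.
RAW = {"GSM8k": 0.95, "MBPP": 82.00, "MultiPL-E": 72.80, "ARC-C": 0.67}

percent = {name: to_percent(value) for name, value in RAW.items()}
for name, value in percent.items():
    print(f"{name}: {value}%")
```

The converted values will not always match the normalized column exactly (GSM8k converts to 95.0 here, against 94.8% on the page), because the raw fractions are shown rounded to two decimal places.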