Alibaba

Qwen2.5 14B Instruct

Zero-eval
#2 MMLU-STEM
#2 MBPP+

About

Qwen2.5 14B Instruct is a language model developed by Alibaba and released on September 19, 2024. It averages 70.0% across 16 benchmarks, with its highest scores on GSM8k (94.8%), HumanEval (83.5%), and MBPP (82.0%). Math is its strongest category, averaging 87.4%. It is distributed under the Apache 2.0 license, which permits commercial use.

Timeline
Announced: Sep 19, 2024
Released: Sep 19, 2024
Specifications
Training Tokens: 18.0T
License & Family
License: Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

16 benchmarks
Average Score: 70.0%
Best Score: 94.8%
High Performers (80%+): 5
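
The best score and the 80%+ count can be reproduced from the ten normalized scores listed in the benchmark results further down this page. A minimal sketch, assuming the six benchmarks not shown all score below 80% (the listing appears to be sorted in descending order) and that "80%+" is inclusive; the variable names are illustrative only:

```python
# Normalized scores (%) for the ten benchmarks listed on this page.
# The remaining six of the 16 are not shown, so the overall 70.0% average
# cannot be recomputed from this subset.
listed_scores = {
    "GSM8k": 94.8,
    "HumanEval": 83.5,
    "MBPP": 82.0,
    "MMLU-Redux": 80.0,
    "MATH": 80.0,
    "MMLU": 79.7,
    "BBH": 78.2,
    "MMLU-STEM": 76.4,
    "MultiPL-E": 72.8,
    "ARC-C": 67.3,
}

# Assumption: "80%+" means a normalized score >= 80.0.
HIGH_PERFORMER_THRESHOLD = 80.0

# Benchmark with the highest normalized score.
best_name, best_score = max(listed_scores.items(), key=lambda kv: kv[1])

# Benchmarks at or above the threshold.
high_performers = [
    name for name, score in listed_scores.items()
    if score >= HIGH_PERFORMER_THRESHOLD
]

print(f"Best score: {best_name} at {best_score}%")
print(f"High performers (80%+): {len(high_performers)}")
```

Under these assumptions the result agrees with the figures above: GSM8k is the best score, and five benchmarks reach 80% or more (MMLU, at 79.7%, falls just short).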

Top Categories

math: 87.4%
code: 70.0%
general: 67.4%
reasoning: 67.3%
factuality: 58.4%
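
The category figures look like unweighted means of the normalized scores within each category. That is an inference from the numbers, not something the page states. A minimal sketch that checks it for the two categories whose benchmarks all appear to be listed on this page (math and reasoning); the code, general, and factuality figures cannot be reproduced because some of their benchmarks are not shown:

```python
from collections import defaultdict

# (benchmark, category, normalized score in %) as listed on this page.
# Assumption: these are the only math and reasoning benchmarks among the 16.
rows = [
    ("GSM8k", "math", 94.8),
    ("MATH", "math", 80.0),
    ("ARC-C", "reasoning", 67.3),
]

# Group scores by category.
by_category = defaultdict(list)
for _name, category, score in rows:
    by_category[category].append(score)

# Unweighted mean per category, rounded to one decimal like the page.
category_means = {
    category: round(sum(scores) / len(scores), 1)
    for category, scores in by_category.items()
}

print(category_means)
```

The math mean, (94.8 + 80.0) / 2 = 87.4, matches the figure above, and reasoning equals the single ARC-C score of 67.3.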
Benchmark Performance
[Chart: top benchmark scores, normalized to 0-100%]
Ranking Across Benchmarks
Position relative to other models on each benchmark

GSM8k

Rank #13 of 46
#10 Nova Pro: 94.8%
#11 Claude 3 Opus: 95.0%
#12 DeepSeek-V2.5: 95.1%
#13 Qwen2.5 14B Instruct: 94.8%
#14 Nova Lite: 94.5%
#15 Gemma 3 12B: 94.4%
#16 Qwen3 235B A22B: 94.4%

HumanEval

Rank #36 of 62
#33 Gemini 1.5 Pro: 84.1%
#34 Mistral Small 3 24B Instruct: 84.8%
#35 Qwen2.5 7B Instruct: 84.8%
#36 Qwen2.5 14B Instruct: 83.5%
#37 Phi 4: 82.6%
#38 IBM Granite 4.0 Tiny Preview: 82.4%
#39 Codestral-22B: 81.1%

MBPP

Rank #8 of 31
#5 Qwen2.5-Coder 7B Instruct: 83.5%
#6 Qwen2.5 VL 32B Instruct: 84.0%
#7 Qwen2.5 32B Instruct: 84.0%
#8 Qwen2.5 14B Instruct: 82.0%
#9 Qwen3 235B A22B: 81.4%
#10 Phi-3.5-MoE-instruct: 80.8%
#11 Qwen2 72B Instruct: 80.2%

MMLU-Redux

Rank #9 of 13
#6 Qwen2.5 32B Instruct: 83.9%
#7 Qwen2.5 72B Instruct: 86.8%
#8 Qwen3 235B A22B: 87.4%
#9 Qwen2.5 14B Instruct: 80.0%
#10 Qwen2.5-Coder 32B Instruct: 77.5%
#11 Qwen2.5 7B Instruct: 75.4%
#12 Qwen2.5-Omni-7B: 71.0%

MATH

Rank #14 of 63
#11 Phi 4: 80.4%
#12 Qwen2.5 VL 32B Instruct: 82.2%
#13 Qwen2.5 32B Instruct: 83.1%
#14 Qwen2.5 14B Instruct: 80.0%
#15 Claude 3.5 Sonnet: 78.3%
#16 Gemini 1.5 Flash: 77.9%
#17 Llama 3.3 70B Instruct: 77.0%
All Benchmark Results for Qwen2.5 14B Instruct
Complete list of benchmark scores with detailed information
Benchmark    Category    Modality   Raw score   Normalized   Source
GSM8k        math        text       0.95        94.8%        Self-reported
HumanEval    code        text       0.83        83.5%        Self-reported
MBPP         code        text       82.00       82.0%        Self-reported
MMLU-Redux   general     text       0.80        80.0%        Self-reported
MATH         math        text       0.80        80.0%        Self-reported
MMLU         general     text       0.80        79.7%        Self-reported
BBH          general     text       0.78        78.2%        Self-reported
MMLU-STEM    general     text       0.76        76.4%        Self-reported
MultiPL-E    general     text       72.80       72.8%        Self-reported
ARC-C        reasoning   text       0.67        67.3%        Self-reported
Showing 1 to 10 of 16 benchmarks
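
The raw values in the listing above mix two scales: most are fractions (0.95, 0.83), while MBPP and MultiPL-E are already on a 0-100 scale (82.00, 72.80). How the site normalizes them is not stated. The following is a minimal sketch of one plausible heuristic; `to_percent` is a hypothetical helper, not part of any published tool:

```python
def to_percent(raw: float) -> float:
    """Convert a raw benchmark score to a 0-100 percentage.

    Assumption: values in [0, 1] are fractions and anything larger is
    already a percentage. The heuristic is ambiguous for a true score of
    1% or less reported on a 0-100 scale.
    """
    if raw < 0 or raw > 100:
        raise ValueError(f"score out of range: {raw}")
    return raw * 100.0 if raw <= 1.0 else raw


# Raw values as displayed on this page (fractions are rounded to 2 decimals).
raw_scores = {"GSM8k": 0.95, "MBPP": 82.00, "MultiPL-E": 72.80, "ARC-C": 0.67}

percentages = {name: round(to_percent(raw), 1) for name, raw in raw_scores.items()}
print(percentages)
```

Because the displayed fractions are rounded to two decimals, this yields 95.0 for GSM8k rather than the listed 94.8%, so the normalized column is presumably computed from unrounded values.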