Qwen2.5 14B Instruct

Name: Qwen2.5 14B Instruct
Average score: 70.0% (16 benchmarks)
Author: Alibaba

Zero-eval

#2 MMLU-STEM

#2 MBPP+

About

Qwen2.5 14B Instruct is a language model developed by Alibaba. It averages 70.0% across 16 benchmarks, with its strongest results on GSM8k (94.8%), HumanEval (83.5%), and MBPP (82.0%). Math is its strongest category, with an average of 87.4%. The model is released under the Apache 2.0 license, which permits commercial use. It was announced and released on September 19, 2024.

Timeline

Announced: Sep 19, 2024

Released: Sep 19, 2024

Specifications

Training Tokens: 18.0T

License & Family

License: Apache 2.0

Benchmark Performance Overview

Performance metrics and category breakdown

Overall Performance

16 benchmarks

Average Score: 70.0%

Best Score: 94.8%

High Performers (80%+)

Top Categories

math	87.4%
code	70.0%
general	67.4%
reasoning	67.3%
factuality	58.4%

Benchmark Performance

Top benchmark scores with normalized values (0-100%)

Ranking Across Benchmarks

Position relative to other models on each benchmark

GSM8k

Rank #13 of 46

#10	Nova Pro	94.8%
#11	Claude 3 Opus	95.0%
#12	DeepSeek-V2.5	95.1%
#13	Qwen2.5 14B Instruct	94.8%
#14	Nova Lite	94.5%
#15	Gemma 3 12B	94.4%
#16	Qwen3 235B A22B	94.4%

HumanEval

Rank #36 of 62

#33	Gemini 1.5 Pro	84.1%
#34	Mistral Small 3 24B Instruct	84.8%
#35	Qwen2.5 7B Instruct	84.8%
#36	Qwen2.5 14B Instruct	83.5%
#37	Phi 4	82.6%
#38	IBM Granite 4.0 Tiny Preview	82.4%
#39	Codestral-22B	81.1%

MBPP

Rank #8 of 31

#5	Qwen2.5-Coder 7B Instruct	83.5%
#6	Qwen2.5 VL 32B Instruct	84.0%
#7	Qwen2.5 32B Instruct	84.0%
#8	Qwen2.5 14B Instruct	82.0%
#9	Qwen3 235B A22B	81.4%
#10	Phi-3.5-MoE-instruct	80.8%
#11	Qwen2 72B Instruct	80.2%

MMLU-Redux

Rank #9 of 13

#6	Qwen2.5 32B Instruct	83.9%
#7	Qwen2.5 72B Instruct	86.8%
#8	Qwen3 235B A22B	87.4%
#9	Qwen2.5 14B Instruct	80.0%
#10	Qwen2.5-Coder 32B Instruct	77.5%
#11	Qwen2.5 7B Instruct	75.4%
#12	Qwen2.5-Omni-7B	71.0%

MATH

Rank #14 of 63

#11	Phi 4	80.4%
#12	Qwen2.5 VL 32B Instruct	82.2%
#13	Qwen2.5 32B Instruct	83.1%
#14	Qwen2.5 14B Instruct	80.0%
#15	Claude 3.5 Sonnet	78.3%
#16	Gemini 1.5 Flash	77.9%
#17	Llama 3.3 70B Instruct	77.0%

All Benchmark Results for Qwen2.5 14B Instruct

Complete list of benchmark scores with detailed information


Benchmark	Category	Modality	Raw score	Normalized	Source
GSM8k	math	text	0.95	94.8%	Self-reported
HumanEval	code	text	0.83	83.5%	Self-reported
MBPP	code	text	82.00	82.0%	Self-reported
MMLU-Redux	general	text	0.80	80.0%	Self-reported
MATH	math	text	0.80	80.0%	Self-reported
MMLU	general	text	0.80	79.7%	Self-reported
BBH	general	text	0.78	78.2%	Self-reported
MMLU-STEM	general	text	0.76	76.4%	Self-reported
MultiPL-E	general	text	72.80	72.8%	Self-reported
ARC-C	reasoning	text	0.67	67.3%	Self-reported

Showing 1 to 10 of 16 benchmarks
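
The category averages reported under Top Categories can be partly checked against the ten rows listed above. The sketch below is only an illustration and assumes each category figure is an unweighted mean of the normalized scores in that category; the page does not state how the averages are computed. The names ROWS, REPORTED, and category_means are hypothetical. Under that assumption the math and reasoning figures match the visible rows, while the code and general means of the visible rows come out above the reported 70.0% and 67.4%, which would be consistent with some of the six benchmarks not shown here falling in those categories.

```python
from collections import defaultdict

# Visible rows copied from the table above: (benchmark, category, normalized %).
# Only 10 of the 16 benchmarks are listed, so some categories are incomplete.
ROWS = [
    ("GSM8k", "math", 94.8),
    ("HumanEval", "code", 83.5),
    ("MBPP", "code", 82.0),
    ("MMLU-Redux", "general", 80.0),
    ("MATH", "math", 80.0),
    ("MMLU", "general", 79.7),
    ("BBH", "general", 78.2),
    ("MMLU-STEM", "general", 76.4),
    ("MultiPL-E", "general", 72.8),
    ("ARC-C", "reasoning", 67.3),
]

# Category averages as reported on the page.
REPORTED = {
    "math": 87.4,
    "code": 70.0,
    "general": 67.4,
    "reasoning": 67.3,
    "factuality": 58.4,
}


def category_means(rows):
    """Return {category: unweighted mean of normalized scores, 1 decimal}.

    Assumption: a plain arithmetic mean; the source does not document this.
    """
    buckets = defaultdict(list)
    for _name, category, score in rows:
        buckets[category].append(score)
    return {cat: round(sum(vals) / len(vals), 1) for cat, vals in buckets.items()}


if __name__ == "__main__":
    means = category_means(ROWS)
    for cat, reported in REPORTED.items():
        # Categories with no visible rows (e.g. factuality) print "n/a".
        visible = means.get(cat, "n/a")
        print(f"{cat}: visible-row mean = {visible}, reported = {reported}")
```

Because six benchmarks are missing from the listing, the overall 70.0% average cannot be reproduced from these rows alone.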
