Qwen2.5 32B Instruct

Average score: 74.3% (18 benchmarks)
Author: Alibaba

Zero-eval

#1 MMLU-STEM

#1 MBPP+

#2 TheoremQA


About

Qwen2.5 32B Instruct is a language model developed by Alibaba. It averages 74.3% across 18 benchmarks, with its highest scores on GSM8k (95.9%), HumanEval (88.4%), and HellaSwag (85.2%). It is strongest on math tasks, where it averages 89.5%. It is released under the Apache 2.0 license, which permits commercial use. It was announced and released on September 19, 2024.

Timeline

Announced: Sep 19, 2024

Released: Sep 19, 2024

Specifications

Training Tokens: 18.0T

License & Family

License

Apache 2.0

Benchmark Performance Overview

Performance metrics and category breakdown

Overall Performance

18 benchmarks

Average Score

74.3%

Best Score

95.9%

High Performers (80%+)

Top Categories

math: 89.5%
reasoning: 79.2%
code: 73.0%
general: 71.3%
factuality: 57.8%

Benchmark Performance

Top benchmark scores with normalized values (0-100%)

Ranking Across Benchmarks

Position relative to other models on each benchmark

GSM8k

Rank #8 of 46

#5 Gemma 3 27B: 95.9%
#6 Claude 3.5 Sonnet: 96.4%
#7 Claude 3.5 Sonnet: 96.4%
#8 Qwen2.5 32B Instruct: 95.9%
#9 Qwen2.5 72B Instruct: 95.8%
#10 DeepSeek-V2.5: 95.1%
#11 Claude 3 Opus: 95.0%

HumanEval

Rank #20 of 62

#17 Qwen2.5-Coder 7B Instruct: 88.4%
#18 Grok-2: 88.4%
#19 Llama 3.3 70B Instruct: 88.4%
#20 Qwen2.5 32B Instruct: 88.4%
#21 o1: 88.1%
#22 Claude 3.5 Haiku: 88.1%
#23 GPT-4.5: 88.0%
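The HumanEval window above lists four models at 88.4%, so the #20 position for Qwen2.5 32B Instruct reflects a tie rather than a lower score. The sketch below, in Python, recomputes a shared rank for tied scores using only the seven rows shown. The function name `shared_rank` is hypothetical, and the assumption that the site assigns ordinal ranks with an unspecified tie-break is an inference from the listing, not something the page states. Models above #17 are not visible here, so the shared rank is only valid within this window.

```python
# Rows copied from the HumanEval ranking window above:
# (displayed rank, model name, score in percent).
window = [
    (17, "Qwen2.5-Coder 7B Instruct", 88.4),
    (18, "Grok-2", 88.4),
    (19, "Llama 3.3 70B Instruct", 88.4),
    (20, "Qwen2.5 32B Instruct", 88.4),
    (21, "o1", 88.1),
    (22, "Claude 3.5 Haiku", 88.1),
    (23, "GPT-4.5", 88.0),
]


def shared_rank(rows):
    """Give every model the best displayed rank among models with the same score.

    Hypothetical helper: it only sees the rows passed in, so ties with
    models outside the window are not detected.
    """
    best_by_score = {}
    for rank, _model, score in rows:
        best_by_score[score] = min(rank, best_by_score.get(score, rank))
    return {model: best_by_score[score] for _rank, model, score in rows}


ranks = shared_rank(window)
for displayed, model, score in window:
    print(f"{model}: displayed #{displayed}, shared #{ranks[model]} at {score}%")
```

Under this tie-aware reading, Qwen2.5 32B Instruct shares the #17 position within the visible window with three other models.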

HellaSwag

Rank #11 of 24

#8 Llama 3.1 Nemotron 70B Instruct: 85.6%
#9 Claude 3 Haiku: 85.9%
#10 Gemma 2 27B: 86.4%
#11 Qwen2.5 32B Instruct: 85.2%
#12 Phi-3.5-MoE-instruct: 83.8%
#13 Mistral NeMo Instruct: 83.5%
#14 Qwen2.5-Coder 32B Instruct: 83.0%

BBH

Rank #3 of 8

#1 Nova Pro: 86.9%
#2 Qwen3 235B A22B: 88.9%
#3 Qwen2.5 32B Instruct: 84.5%
#4 DeepSeek-V2.5: 84.3%
#5 Nova Lite: 82.4%
#6 Qwen2 72B Instruct: 82.4%

MBPP

Rank #5 of 31

#2 Llama 3.1 Nemotron Nano 8B V1: 84.6%
#3 Qwen2.5 72B Instruct: 88.2%
#4 Qwen2.5-Coder 32B Instruct: 90.2%
#5 Qwen2.5 32B Instruct: 84.0%
#6 Qwen2.5 VL 32B Instruct: 84.0%
#7 Qwen2.5-Coder 7B Instruct: 83.5%
#8 Qwen2.5 14B Instruct: 82.0%

All Benchmark Results for Qwen2.5 32B Instruct

Complete list of benchmark scores with detailed information


Benchmark	Category	Modality	Score	Normalized	Source
GSM8k	math	text	0.96	95.9%	Self-reported
HumanEval	code	text	0.88	88.4%	Self-reported
HellaSwag	reasoning	text	0.85	85.2%	Self-reported
BBH	general	text	0.84	84.5%	Self-reported
MBPP	code	text	0.84	84.0%	Self-reported
MMLU-Redux	general	text	0.84	83.9%	Self-reported
MMLU	general	text	0.83	83.3%	Self-reported
MATH	math	text	0.83	83.1%	Self-reported
Winogrande	reasoning	text	0.82	82.0%	Self-reported
MMLU-STEM	general	text	0.81	80.9%	Self-reported

Showing 1 to 10 of 18 benchmarks
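As a rough check on the category figures, the Python sketch below averages the normalized scores of the ten benchmarks listed in the table above, grouped by category. It assumes a category score is the plain arithmetic mean of its benchmarks, which the page does not state. The names `rows`, `reported`, and `category_means` are illustrative. Only 10 of the 18 benchmarks are visible, so agreement is expected only where every benchmark of a category appears in the table.

```python
from collections import defaultdict

# (benchmark, category, normalized score in percent), copied from the table above.
rows = [
    ("GSM8k", "math", 95.9),
    ("HumanEval", "code", 88.4),
    ("HellaSwag", "reasoning", 85.2),
    ("BBH", "general", 84.5),
    ("MBPP", "code", 84.0),
    ("MMLU-Redux", "general", 83.9),
    ("MMLU", "general", 83.3),
    ("MATH", "math", 83.1),
    ("Winogrande", "reasoning", 82.0),
    ("MMLU-STEM", "general", 80.9),
]

# Category averages as reported in the "Top Categories" section.
reported = {"math": 89.5, "reasoning": 79.2, "code": 73.0, "general": 71.3}


def category_means(rows):
    """Arithmetic mean of the normalized scores per category (assumed method)."""
    grouped = defaultdict(list)
    for _name, category, score in rows:
        grouped[category].append(score)
    return {cat: sum(scores) / len(scores) for cat, scores in grouped.items()}


means = category_means(rows)
for category, mean in means.items():
    print(f"{category}: visible mean {mean:.1f}%, reported {reported[category]:.1f}%")
```

With these rows, the math mean of GSM8k and MATH is (95.9 + 83.1) / 2 = 89.5%, which matches the reported math figure. The visible means for code, reasoning, and general come out higher than the reported values, presumably because the eight benchmarks not shown pull those averages down.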
