
Qwen2.5-Coder 32B Instruct

Zero-eval
#1 BigCodeBench-Full
#1 BigCodeBench-Hard
#2 MBPP
+1 more

by Alibaba

About

Qwen2.5-Coder 32B Instruct is a language model developed by Alibaba. It achieves strong overall performance, with an average score of 64.9% across 15 benchmarks, and excels particularly on HumanEval (92.7%), GSM8k (91.1%), and MBPP (90.2%). The model shows particular strength in reasoning tasks, with an average performance of 78.1% in that category. It supports a 256K-token context window for handling large documents and is available through 4 API providers. Its Apache 2.0 license permits commercial use, making it suitable for enterprise applications. Released in 2024, it represents Alibaba's latest advancement in AI technology.

Pricing Range
Input (per 1M): $0.09 - $0.89
Output (per 1M): $0.09 - $0.89
Providers: 4
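The per-1M-token prices above translate to per-request cost as a simple linear sum. A minimal sketch, using the cheapest and most expensive listed provider rates and hypothetical token counts:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Cost in dollars: per-1M-token prices applied to actual token counts."""
    return (input_tokens / 1_000_000) * in_price + \
           (output_tokens / 1_000_000) * out_price

# Hypothetical request: 8,000 input tokens, 1,000 output tokens.
cheapest = request_cost(8_000, 1_000, 0.09, 0.09)  # cheapest listed provider
priciest = request_cost(8_000, 1_000, 0.89, 0.89)  # priciest listed provider
print(f"${cheapest:.5f} - ${priciest:.5f}")
```

At these prices a request of this size costs well under a cent, so provider choice matters mostly at high volume.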
Timeline
Announced: Sep 19, 2024
Released: Sep 19, 2024
Specifications
Training Tokens: 5.5T
License & Family
License
Apache 2.0
Base Model: Qwen2.5 32B Instruct
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

15 benchmarks
Average Score
64.9%
Best Score
92.7%
High Performers (80%+)
5

Performance Metrics

Max Context Window
256.0K
Avg Throughput
74.0 tok/s
Avg Latency
0ms
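The average throughput figure gives a rough lower bound on generation time for a response of a given length. A small sketch, assuming the listed 74.0 tok/s and ignoring network and queueing overhead (the listed 0 ms average latency most likely means latency was not measured):

```python
def generation_seconds(output_tokens: int,
                       tokens_per_second: float = 74.0) -> float:
    """Approximate wall-clock time to stream `output_tokens` at a steady rate."""
    return output_tokens / tokens_per_second

# A 1,000-token response at 74 tok/s takes roughly 13.5 seconds.
print(f"{generation_seconds(1000):.1f} s")
```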

Top Categories

reasoning
78.1%
math
74.2%
general
61.5%
code
58.2%
factuality
54.2%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

HumanEval

Rank #4 of 62
#1 Claude 3.5 Sonnet
93.7%
#2 GPT-5
93.4%
#3 Kimi K2 Instruct
93.3%
#4 Qwen2.5-Coder 32B Instruct
92.7%
#5 o1-mini
92.4%
#6 Claude 3.5 Sonnet
92.0%
#7 Mistral Large 2
92.0%

GSM8k

Rank #24 of 46
#21 Qwen2.5 7B Instruct
91.6%
#22 Llama 3.1 Nemotron 70B Instruct
91.4%
#23 Qwen2 72B Instruct
91.1%
#24 Qwen2.5-Coder 32B Instruct
91.1%
#25 Gemini 1.5 Pro
90.8%
#26 Grok-1.5
90.0%
#27 Gemma 3 4B
89.2%

MBPP

Rank #2 of 31
#1 Llama-3.3 Nemotron Super 49B v1
91.3%
#2 Qwen2.5-Coder 32B Instruct
90.2%
#3 Qwen2.5 72B Instruct
88.2%
#4 Llama 3.1 Nemotron Nano 8B V1
84.6%
#5 Qwen2.5 32B Instruct
84.0%

HellaSwag

Rank #14 of 24
#11 Qwen2.5 32B Instruct
85.2%
#12 Phi-3.5-MoE-instruct
83.8%
#13 Mistral NeMo Instruct
83.5%
#14 Qwen2.5-Coder 32B Instruct
83.0%
#15 Gemma 2 9B
81.9%
#16 Granite 3.3 8B Base
80.1%
#17 Gemma 3n E4B Instructed LiteRT Preview
78.6%

Winogrande

Rank #8 of 19
#5 Gemma 2 27B
83.7%
#6 Qwen2.5 32B Instruct
82.0%
#7 Phi-3.5-MoE-instruct
81.3%
#8 Qwen2.5-Coder 32B Instruct
80.8%
#9 Gemma 2 9B
80.6%
#10 Mistral NeMo Instruct
76.8%
#11 Ministral 8B Instruct
75.3%
All Benchmark Results for Qwen2.5-Coder 32B Instruct
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
HumanEval | code | text | 0.93 | 92.7% | Self-reported
GSM8k | math | text | 0.91 | 91.1% | Self-reported
MBPP | code | text | 0.90 | 90.2% | Self-reported
HellaSwag | reasoning | text | 0.83 | 83.0% | Self-reported
Winogrande | reasoning | text | 0.81 | 80.8% | Self-reported
MMLU-Redux | general | text | 0.78 | 77.5% | Self-reported
MMLU | general | text | 0.75 | 75.1% | Self-reported
ARC-C | reasoning | text | 0.70 | 70.5% | Self-reported
MATH | math | text | 0.57 | 57.2% | Self-reported
TruthfulQA | factuality | text | 0.54 | 54.2% | Self-reported

Showing 1 to 10 of 15 benchmarks
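The category averages in "Top Categories" are plain means of the per-benchmark normalized scores. A sketch that recomputes them from the 10 rows listed above: the reasoning (78.1%), math (74.2%), and factuality (54.2%) figures are reproduced exactly from these rows alone, while the code and general figures differ from the page's values because those categories include some of the 5 benchmarks not shown here.

```python
from collections import defaultdict

# (category, normalized score in %) for the 10 benchmarks listed on this page.
scores = {
    "HumanEval": ("code", 92.7), "GSM8k": ("math", 91.1),
    "MBPP": ("code", 90.2), "HellaSwag": ("reasoning", 83.0),
    "Winogrande": ("reasoning", 80.8), "MMLU-Redux": ("general", 77.5),
    "MMLU": ("general", 75.1), "ARC-C": ("reasoning", 70.5),
    "MATH": ("math", 57.2), "TruthfulQA": ("factuality", 54.2),
}

buckets = defaultdict(list)
for category, pct in scores.values():
    buckets[category].append(pct)

for category, vals in sorted(buckets.items()):
    print(f"{category}: {sum(vals) / len(vals):.1f}%")
```

That the code average computed here (~91.5%) is far above the page's 58.2% suggests the unlisted code benchmarks (e.g. the harder BigCodeBench variants) score much lower than HumanEval and MBPP.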