Qwen2.5-Coder 7B Instruct

Name: Qwen2.5-Coder 7B Instruct
Average Score: 58.0% (19 benchmarks)
Author: Alibaba

Zero-eval highlights: #1 on MMLU-Base, #1 on CRUXEval-Input-CoT, #1 on CRUXEval-Output-CoT, and 3 more (not listed)

About

Qwen2.5-Coder 7B Instruct is an instruction-tuned, code-specialized language model developed by Alibaba. It has reported results on 19 benchmarks, with its strongest scores on HumanEval (88.4%), GSM8k (83.9%), and MBPP (83.5%). It is released under the Apache 2.0 license, which permits commercial use. The model was released in September 2024 as part of Alibaba's Qwen2.5-Coder series.

Timeline

Announced: Sep 19, 2024
Released: Sep 19, 2024

Specifications

Training Tokens: 5.5T

License & Family

License: Apache 2.0
Base Model: Qwen2.5 7B Instruct

Benchmark Performance Overview

Performance metrics and category breakdown

Overall Performance

Benchmarks: 19
Average Score: 58.0%
Best Score: 88.4%

High Performers (80%+): HumanEval, GSM8k, MBPP

Top Categories
reasoning: 70.2%
math: 65.3%
code: 57.3%
general: 52.3%
factuality: 50.6%
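The page does not say how category scores are derived. A minimal sketch, assuming each category score is the unweighted mean of that category's benchmark scores: the three reasoning benchmarks in the results table (HellaSwag, Winogrande, ARC-C) average to exactly the 70.2% reasoning figure. The other categories include benchmarks that the table does not list, so they cannot be checked the same way.

```python
from statistics import mean

# Reasoning-category scores (percent) copied from the results table on this page.
# Assumption: a category score is the unweighted mean of its benchmark scores.
reasoning_scores = {
    "HellaSwag": 76.8,
    "Winogrande": 72.9,
    "ARC-C": 60.9,
}

# Average the three scores and round to one decimal, as the page displays them.
reasoning_avg = round(mean(reasoning_scores.values()), 1)
print(f"reasoning: {reasoning_avg}%")  # reasoning: 70.2%
```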

Ranking Across Benchmarks

Position relative to other models on each benchmark

HumanEval: rank #19 of 62
#16 Grok-2: 88.4%
#17 Llama 3.3 70B Instruct: 88.4%
#18 Mistral Small 3.1 24B Instruct: 88.4%
#19 Qwen2.5-Coder 7B Instruct: 88.4%
#20 Qwen2.5 32B Instruct: 88.4%
#21 o1: 88.1%
#22 Claude 3.5 Haiku: 88.1%

GSM8k: rank #35 of 46
#32 Gemini 1.5 Flash: 86.2%
#33 Phi-3.5-mini-instruct: 86.2%
#34 Jamba 1.5 Large: 87.0%
#35 Qwen2.5-Coder 7B Instruct: 83.9%
#36 Qwen2 7B Instruct: 82.3%
#37 Granite 3.3 8B Instruct: 80.9%
#38 Mistral Small 3 24B Base: 80.7%

MBPP: rank #7 of 31
#4 Qwen2.5 VL 32B Instruct: 84.0%
#5 Qwen2.5 32B Instruct: 84.0%
#6 Llama 3.1 Nemotron Nano 8B V1: 84.6%
#7 Qwen2.5-Coder 7B Instruct: 83.5%
#8 Qwen2.5 14B Instruct: 82.0%
#9 Qwen3 235B A22B: 81.4%
#10 Phi-3.5-MoE-instruct: 80.8%

HellaSwag: rank #19 of 24
#16 Gemma 3n E4B: 78.6%
#17 Gemma 3n E4B Instructed LiteRT Preview: 78.6%
#18 Granite 3.3 8B Base: 80.1%
#19 Qwen2.5-Coder 7B Instruct: 76.8%
#20 Gemma 3n E2B Instructed LiteRT (Preview): 72.2%
#21 Gemma 3n E2B: 72.2%
#22 Llama 3.2 3B Instruct: 69.8%

Winogrande: rank #13 of 19
#10 Granite 3.3 8B Base: 74.4%
#11 Ministral 8B Instruct: 75.3%
#12 Mistral NeMo Instruct: 76.8%
#13 Qwen2.5-Coder 7B Instruct: 72.9%
#14 Gemma 3n E4B: 71.7%
#15 Gemma 3n E4B Instructed LiteRT Preview: 71.7%
#16 Phi-3.5-mini-instruct: 68.5%
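A raw rank depends on how many models reported each benchmark, so the five positions above are easier to compare once divided by the field size. The sketch below does that with a hypothetical helper, `top_fraction`, which simply computes rank / total; this normalization is an illustrative assumption, not a metric the page defines.

```python
# Rank positions copied from the rankings above: (rank, number of ranked models).
rankings = {
    "HumanEval": (19, 62),
    "GSM8k": (35, 46),
    "MBPP": (7, 31),
    "HellaSwag": (19, 24),
    "Winogrande": (13, 19),
}


def top_fraction(rank: int, total: int) -> float:
    """Hypothetical helper: share of the field at or above this rank (lower is better)."""
    return rank / total


for name, (rank, total) in rankings.items():
    # Format the fraction as a whole-number percentage, e.g. 0.306 -> "31%".
    print(f"{name}: #{rank} of {total} -> top {top_fraction(rank, total):.0%}")
```

By this measure the model places best relative to the field on MBPP (top 23%) and HumanEval (top 31%), and worst on HellaSwag (top 79%) and GSM8k (top 76%). Several neighboring models share the same displayed score (for example, #16 to #20 on HumanEval are all 88.4%), so the exact position within a tie should not be over-read.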

All Benchmark Results for Qwen2.5-Coder 7B Instruct

Benchmark scores with category, input modality, normalized value, and source


Benchmark	Category	Modality	Normalized	Score	Source
HumanEval	code	text	0.88	88.4%	Self-reported
GSM8k	math	text	0.84	83.9%	Self-reported
MBPP	code	text	0.835	83.5%	Self-reported
HellaSwag	reasoning	text	0.77	76.8%	Self-reported
Winogrande	reasoning	text	0.73	72.9%	Self-reported
MMLU-Base	general	text	0.68	68.0%	Self-reported
MMLU	general	text	0.68	67.6%	Self-reported
MMLU-Redux	general	text	0.67	66.6%	Self-reported
ARC-C	reasoning	text	0.61	60.9%	Self-reported
CRUXEval-Input-CoT	code	text	0.56	56.5%	Self-reported

The table shows 10 of the 19 reported benchmarks.
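Because only 10 of the 19 scores are listed, the 58.0% overall average cannot be recomputed directly. A rough sketch, assuming the average is an unweighted mean over all 19 benchmarks: the listed scores average about 72.5%, which implies that the nine unlisted benchmarks average roughly 41.9%. Since 58.0 is itself rounded to one decimal, the implied figure is only accurate to within about 0.1 point.

```python
# The 10 scores listed in the results table (percent).
listed = [88.4, 83.9, 83.5, 76.8, 72.9, 68.0, 67.6, 66.6, 60.9, 56.5]

TOTAL_BENCHMARKS = 19
REPORTED_AVERAGE = 58.0  # "Average Score" from the overview, rounded to 0.1

listed_sum = sum(listed)
listed_avg = listed_sum / len(listed)

# Assumption: the overview average is an unweighted mean over all 19 benchmarks,
# so the unlisted scores must account for the remainder of the total.
unlisted_count = TOTAL_BENCHMARKS - len(listed)
unlisted_avg = (REPORTED_AVERAGE * TOTAL_BENCHMARKS - listed_sum) / unlisted_count

print(f"mean of listed 10: {listed_avg:.1f}%")  # mean of listed 10: 72.5%
print(f"implied mean of unlisted {unlisted_count}: {unlisted_avg:.1f}%")  # ... 9: 41.9%
```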
