
Qwen2.5-Coder 7B Instruct
Zero-eval
#1 MMLU-Base
#1 CRUXEval-Input-CoT
#1 CRUXEval-Output-CoT
+3 more
by Alibaba
About
Qwen2.5-Coder 7B Instruct is a language model developed by Alibaba. It reports results on 19 benchmarks, with its highest scores on HumanEval (88.4%), GSM8k (83.9%), and MBPP (83.5%). It is released under the Apache 2.0 license, which permits commercial use. It was announced and released on Sep 19, 2024.
Timeline
Announced: Sep 19, 2024
Released: Sep 19, 2024
Specifications
Training Tokens: 5.5T
License & Family
License: Apache 2.0
Base Model: Qwen2.5 7B Instruct
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
Benchmarks: 19
Average Score: 58.0%
Best Score: 88.4%
High Performers (80%+): 3
Top Categories
reasoning: 70.2%
math: 65.3%
code: 57.3%
general: 52.3%
factuality: 50.6%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
HumanEval
Rank #19 of 62
#16 Grok-2: 88.4%
#17 Llama 3.3 70B Instruct: 88.4%
#18 Mistral Small 3.1 24B Instruct: 88.4%
#19 Qwen2.5-Coder 7B Instruct: 88.4%
#20 Qwen2.5 32B Instruct: 88.4%
#21 o1: 88.1%
#22 Claude 3.5 Haiku: 88.1%
GSM8k
Rank #35 of 46
#32 Gemini 1.5 Flash: 86.2%
#33 Phi-3.5-mini-instruct: 86.2%
#34 Jamba 1.5 Large: 87.0%
#35 Qwen2.5-Coder 7B Instruct: 83.9%
#36 Qwen2 7B Instruct: 82.3%
#37 Granite 3.3 8B Instruct: 80.9%
#38 Mistral Small 3 24B Base: 80.7%
MBPP
Rank #7 of 31
#4 Qwen2.5 VL 32B Instruct: 84.0%
#5 Qwen2.5 32B Instruct: 84.0%
#6 Llama 3.1 Nemotron Nano 8B V1: 84.6%
#7 Qwen2.5-Coder 7B Instruct: 83.5%
#8 Qwen2.5 14B Instruct: 82.0%
#9 Qwen3 235B A22B: 81.4%
#10 Phi-3.5-MoE-instruct: 80.8%
HellaSwag
Rank #19 of 24
#16 Gemma 3n E4B: 78.6%
#17 Gemma 3n E4B Instructed LiteRT Preview: 78.6%
#18 Granite 3.3 8B Base: 80.1%
#19 Qwen2.5-Coder 7B Instruct: 76.8%
#20 Gemma 3n E2B Instructed LiteRT (Preview): 72.2%
#21 Gemma 3n E2B: 72.2%
#22 Llama 3.2 3B Instruct: 69.8%
Winogrande
Rank #13 of 19
#10 Granite 3.3 8B Base: 74.4%
#11 Ministral 8B Instruct: 75.3%
#12 Mistral NeMo Instruct: 76.8%
#13 Qwen2.5-Coder 7B Instruct: 72.9%
#14 Gemma 3n E4B: 71.7%
#15 Gemma 3n E4B Instructed LiteRT Preview: 71.7%
#16 Phi-3.5-mini-instruct: 68.5%
All Benchmark Results for Qwen2.5-Coder 7B Instruct
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source |
HumanEval | code | text | 0.88 | 88.4% | Self-reported |
GSM8k | math | text | 0.84 | 83.9% | Self-reported |
MBPP | code | text | 83.50 | 83.5% | Self-reported |
HellaSwag | reasoning | text | 0.77 | 76.8% | Self-reported |
Winogrande | reasoning | text | 0.73 | 72.9% | Self-reported |
MMLU-Base | general | text | 0.68 | 68.0% | Self-reported |
MMLU | general | text | 0.68 | 67.6% | Self-reported |
MMLU-Redux | general | text | 0.67 | 66.6% | Self-reported |
ARC-C | reasoning | text | 0.61 | 60.9% | Self-reported |
CRUXEval-Input-CoT | code | text | 0.56 | 56.5% | Self-reported |
Showing 1 to 10 of 19 benchmarks
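The category figures in the overview look like plain means of the normalized scores, though the page does not state how they are computed. As a rough check, the sketch below averages the ten results listed above by category. The names `ROWS` and `category_means` are hypothetical, introduced only for this illustration. Under the assumption that HellaSwag, Winogrande, and ARC-C are the only reasoning benchmarks, the mean reproduces the 70.2% reasoning figure. The other categories and the overall mean will not match the page's numbers, since 9 of the 19 benchmarks are not shown here.

```python
from collections import defaultdict

# Normalized scores copied from the ten results listed on this page.
# Each entry is (benchmark, category, normalized percent).
ROWS = [
    ("HumanEval", "code", 88.4),
    ("GSM8k", "math", 83.9),
    ("MBPP", "code", 83.5),
    ("HellaSwag", "reasoning", 76.8),
    ("Winogrande", "reasoning", 72.9),
    ("MMLU-Base", "general", 68.0),
    ("MMLU", "general", 67.6),
    ("MMLU-Redux", "general", 66.6),
    ("ARC-C", "reasoning", 60.9),
    ("CRUXEval-Input-CoT", "code", 56.5),
]


def category_means(rows):
    """Return {category: mean normalized score}, rounded to one decimal.

    Assumption: a category score is the unweighted mean of its benchmarks.
    """
    buckets = defaultdict(list)
    for _name, category, score in rows:
        buckets[category].append(score)
    return {cat: round(sum(vals) / len(vals), 1) for cat, vals in buckets.items()}


means = category_means(ROWS)
# Mean over the ten visible rows only, not all 19 benchmarks.
overall_visible = round(sum(score for _, _, score in ROWS) / len(ROWS), 1)

for category, value in sorted(means.items()):
    print(f"{category}: {value}%")
print(f"mean of the ten listed scores: {overall_visible}%")
```

The visible-row mean is higher than the page's 58.0% average because the listing is sorted by score, so the nine benchmarks not shown all score at or below 56.5%.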