
Qwen2 7B Instruct
Zero-eval
by Alibaba
About
Qwen2 7B Instruct is a language model developed by Alibaba and released in 2024. Across the 14 benchmarks tracked here it averages 59.5%, with its strongest results on MT-Bench (84.1%), GSM8k (82.3%), and HumanEval (79.9%). Released under the Apache 2.0 license, it permits commercial use, making it suitable for enterprise applications.
Timeline
Announced: Jul 23, 2024
Released: Jul 23, 2024
Specifications
License & Family
License
Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
14 benchmarks
Average Score
59.5%
Best Score
84.1%
High Performers (80%+): 2
Top Categories
roleplay: 84.1%
math: 66.0%
code: 64.2%
general: 49.4%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
MT-Bench
Rank #6 of 11
#3 Mistral Large 2: 86.3%
#4 Qwen2.5 7B Instruct: 87.5%
#5 DeepSeek-V2.5: 90.2%
#6 Qwen2 7B Instruct: 84.1%
#7 Mistral Small 3 24B Instruct: 83.5%
#8 Ministral 8B Instruct: 83.0%
#9 Llama 3.1 Nemotron Nano 8B V1: 81.0%
GSM8k
Rank #36 of 46
#33 Qwen2.5-Coder 7B Instruct: 83.9%
#34 Gemini 1.5 Flash: 86.2%
#35 Phi-3.5-mini-instruct: 86.2%
#36 Qwen2 7B Instruct: 82.3%
#37 Granite 3.3 8B Instruct: 80.9%
#38 Mistral Small 3 24B Base: 80.7%
#39 Llama 3.2 3B Instruct: 77.7%
HumanEval
Rank #42 of 62
#39 Llama 3.1 70B Instruct: 80.5%
#40 Nova Micro: 81.1%
#41 Codestral-22B: 81.1%
#42 Qwen2 7B Instruct: 79.9%
#43 Qwen2.5-Omni-7B: 78.7%
#44 Claude 3 Haiku: 75.9%
#45 Gemma 3n E4B Instructed: 75.0%
C-Eval
Rank #6 of 6
#3 Qwen2 72B Instruct: 83.8%
#4 DeepSeek-V3: 86.5%
#5 Kimi-k1.5: 88.3%
#6 Qwen2 7B Instruct: 77.2%
AlignBench
Rank #4 of 4
#1 Qwen2.5 7B Instruct: 73.3%
#2 DeepSeek-V2.5: 80.4%
#3 Qwen2.5 72B Instruct: 81.6%
#4 Qwen2 7B Instruct: 72.1%
All Benchmark Results for Qwen2 7B Instruct
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
MT-Bench | roleplay | text | 84.10 | 84.1% | Self-reported
GSM8k | math | text | 0.82 | 82.3% | Self-reported
HumanEval | code | text | 0.80 | 79.9% | Self-reported
C-Eval | code | text | 0.77 | 77.2% | Self-reported
AlignBench | general | text | 0.72 | 72.1% | Self-reported
MMLU | general | text | 0.70 | 70.5% | Self-reported
EvalPlus | code | text | 70.30 | 70.3% | Self-reported
MBPP | code | text | 67.20 | 67.2% | Self-reported
MultiPL-E | general | text | 59.10 | 59.1% | Self-reported
MATH | math | text | 0.50 | 49.6% | Self-reported
Showing 1 to 10 of 14 benchmarks
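The raw scores above are reported on mixed scales (fractions such as 0.82 for GSM8k, percentages such as 70.30 for EvalPlus), while the "Normalized" column is always 0-100%. The sketch below shows one way such mixed-scale scores could be mapped onto a common 0-100% scale; this is an assumption for illustration, not Zero-eval's documented normalization method, and the scale-detection heuristic is hypothetical.

```python
# Hypothetical normalization of mixed-scale benchmark scores to 0-100%.
# Assumed conventions (not confirmed by the source):
#   - fractions in [0, 1]       (e.g. GSM8k 0.823)   -> multiply by 100
#   - 0-10 scales like MT-Bench (e.g. 8.41)          -> multiply by 10
#   - already-percent values    (e.g. EvalPlus 70.3) -> unchanged

def normalize(raw: float) -> float:
    """Map a raw benchmark score onto a 0-100 percentage scale."""
    if 0.0 <= raw <= 1.0:
        return raw * 100.0   # fraction of correct answers
    if raw <= 10.0:
        return raw * 10.0    # 0-10 judge score (MT-Bench style)
    return raw               # already a percentage

# Example with scores in the three assumed formats:
scores = {"GSM8k": 0.823, "MT-Bench": 8.41, "EvalPlus": 70.3}
normalized = {name: normalize(raw) for name, raw in scores.items()}
```

With these conventions, all three example scores land on the same 0-100% scale, which is what makes the cross-benchmark averages and rankings shown above comparable.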