Qwen2 7B Instruct
by Alibaba

About

Qwen2 7B Instruct is a language model developed by Alibaba. Evaluated on 14 benchmarks, it posts competitive results, with its strongest scores on MT-Bench (84.1%), GSM8k (82.3%), and HumanEval (79.9%). It is released under the Apache 2.0 license, which permits commercial use and makes it suitable for enterprise applications. Released in July 2024, it is part of Alibaba's Qwen2 model family.
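For context, a minimal inference sketch using the Hugging Face transformers library. The repo id "Qwen/Qwen2-7B-Instruct" and the generation settings are assumptions based on common practice; verify them against the official model card before use.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B-Instruct"  # assumed repo id; confirm on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"  # device_map="auto" requires accelerate
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Apache 2.0 license in one sentence."},
]
# Render the conversation with the model's chat template, then generate.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))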

Timeline
Announced: Jul 23, 2024
Released: Jul 23, 2024

Specifications

License & Family
License: Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (14 benchmarks)

Average score: 59.5%
Best score: 84.1%
High performers (80%+): 2

Top Categories

roleplay: 84.1%
math: 66.0%
code: 64.2%
general: 49.4%
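The summary figures above can be reproduced from the per-benchmark scores listed at the bottom of this page. A minimal sketch, assuming normalized scores on a 0-100 scale (only three of the 14 benchmarks are filled in here for brevity):

from collections import defaultdict

# (category, normalized score %) per benchmark; abbreviated to three
# of the page's 14 entries for illustration.
scores = {
    "MT-Bench": ("roleplay", 84.1),
    "GSM8k": ("math", 82.3),
    "HumanEval": ("code", 79.9),
}

average = sum(s for _, s in scores.values()) / len(scores)   # "Average score"
best = max(s for _, s in scores.values())                    # "Best score"
high = sum(1 for _, s in scores.values() if s >= 80)         # "High performers (80%+)"

# Per-category averages, as in "Top Categories".
by_category = defaultdict(list)
for category, score in scores.values():
    by_category[category].append(score)
category_averages = {c: sum(v) / len(v) for c, v in by_category.items()}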
Benchmark Performance
[Chart: top benchmark scores with normalized values (0-100%)]

Ranking Across Benchmarks
Position relative to other models on each benchmark

MT-Bench (Rank #6 of 11)

#3 Mistral Large 2: 86.3%
#4 Qwen2.5 7B Instruct: 87.5%
#5 DeepSeek-V2.5: 90.2%
#6 Qwen2 7B Instruct: 84.1%
#7 Mistral Small 3 24B Instruct: 83.5%
#8 Ministral 8B Instruct: 83.0%
#9 Llama 3.1 Nemotron Nano 8B V1: 81.0%

GSM8k (Rank #36 of 46)

#33 Qwen2.5-Coder 7B Instruct: 83.9%
#34 Gemini 1.5 Flash: 86.2%
#35 Phi-3.5-mini-instruct: 86.2%
#36 Qwen2 7B Instruct: 82.3%
#37 Granite 3.3 8B Instruct: 80.9%
#38 Mistral Small 3 24B Base: 80.7%
#39 Llama 3.2 3B Instruct: 77.7%

HumanEval (Rank #42 of 62)

#39 Llama 3.1 70B Instruct: 80.5%
#40 Nova Micro: 81.1%
#41 Codestral-22B: 81.1%
#42 Qwen2 7B Instruct: 79.9%
#43 Qwen2.5-Omni-7B: 78.7%
#44 Claude 3 Haiku: 75.9%
#45 Gemma 3n E4B Instructed: 75.0%

C-Eval (Rank #6 of 6)

#3 Qwen2 72B Instruct: 83.8%
#4 DeepSeek-V3: 86.5%
#5 Kimi-k1.5: 88.3%
#6 Qwen2 7B Instruct: 77.2%

AlignBench (Rank #4 of 4)

#1 Qwen2.5 7B Instruct: 73.3%
#2 DeepSeek-V2.5: 80.4%
#3 Qwen2.5 72B Instruct: 81.6%
#4 Qwen2 7B Instruct: 72.1%
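A benchmark rank is simply the model's 1-based position when all evaluated models are sorted by score. A sketch of that logic follows; note that some published positions above do not track raw score order, so treat the sorted output as illustrative.

def rank_models(scores: dict[str, float]) -> list[tuple[int, str, float]]:
    # Sort descending by score and attach 1-based positions.
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(i + 1, name, score) for i, (name, score) in enumerate(ordered)]

# Subset of the MT-Bench scores shown above.
mt_bench = {"DeepSeek-V2.5": 90.2, "Qwen2.5 7B Instruct": 87.5, "Qwen2 7B Instruct": 84.1}
for position, name, score in rank_models(mt_bench):
    print(f"#{position} {name}: {score}%")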
All Benchmark Results for Qwen2 7B Instruct
Complete list of benchmark scores with detailed information

Benchmark    Category  Modality  Raw Score  Normalized  Source
MT-Bench     roleplay  text      84.10      84.1%       Self-reported
GSM8k        math      text      0.82       82.3%       Self-reported
HumanEval    code      text      0.80       79.9%       Self-reported
C-Eval       code      text      0.77       77.2%       Self-reported
AlignBench   general   text      0.72       72.1%       Self-reported
MMLU         general   text      0.70       70.5%       Self-reported
EvalPlus     code      text      70.30      70.3%       Self-reported
MBPP         code      text      67.20      67.2%       Self-reported
MultiPL-E    general   text      59.10      59.1%       Self-reported
MATH         math      text      0.50       49.6%       Self-reported

Showing 10 of 14 benchmarks.
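The raw scores in the table arrive in mixed scales: fractions like 0.82 alongside percentages like 84.10. A plausible normalization rule consistent with the table is sketched below, treating any value at or below 1.0 as a fraction; the small mismatches (e.g. 0.82 vs 82.3%) suggest the fractional raw values shown are rounded.

def normalize(raw: float) -> float:
    # Treat values <= 1.0 as fractions of 1; larger values are already percents.
    return raw * 100 if raw <= 1.0 else raw

print(normalize(0.82))   # 82.0 -- the page shows 82.3%, so 0.82 is a rounded raw value
print(normalize(84.10))  # 84.1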