Qwen2.5 7B Instruct
by Alibaba

Ranked #3 on AlignBench (Zero-eval)
About

Qwen2.5 7B Instruct is a language model developed by Alibaba. It achieves strong performance with an average score of 65.6% across 14 benchmarks, with its best results on GSM8k (91.6%), MT-Bench (87.5%), and HumanEval (84.8%). Its strongest category is math, where it averages 83.5%. The model supports a 139K-token context window for handling large documents and is available through one API provider. It is licensed under Apache 2.0 for commercial use, making it suitable for enterprise applications. Released in September 2024, it represents one of Alibaba's most recent advances in language modeling.
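For reference, here is a minimal sketch of querying the model, assuming an OpenAI-compatible endpoint (which many hosted providers offer). The base URL, API key, and model identifier below are placeholders, not confirmed provider values:

from openai import OpenAI

# Placeholders: substitute your provider's endpoint, key, and model ID.
client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="qwen2.5-7b-instruct",  # provider-specific model ID (assumed)
    messages=[
        {"role": "user", "content": "Give a one-line summary of the Apache 2.0 license."}
    ],
)
print(response.choices[0].message.content)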

Pricing Range
Input (per 1M tokens): $0.30
Output (per 1M tokens): $0.30
Providers: 1
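At a flat $0.30 per 1M tokens for both input and output, per-request cost is simple to estimate. A minimal sketch using the rate listed above (actual provider billing may differ):

RATE_PER_M_TOKENS = 0.30  # USD per 1M tokens, same for input and output

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request at the listed flat rate."""
    return (input_tokens + output_tokens) / 1_000_000 * RATE_PER_M_TOKENS

# Example: a 10,000-token prompt with a 1,000-token completion
print(f"${request_cost_usd(10_000, 1_000):.4f}")  # -> $0.0033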
Timeline
Announced: Sep 19, 2024
Released: Sep 19, 2024
Specifications
Training Tokens: 18.0T
License & Family
License: Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (14 benchmarks)

Average Score: 65.6%
Best Score: 91.6%
High Performers (80%+): 3

Performance Metrics

Max Context Window: 139.3K tokens
Avg Throughput: 138.0 tok/s
Avg Latency: 1 ms
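Together, the latency and throughput figures give a rough end-to-end generation time: latency to the first token plus output length divided by decode speed. A minimal sketch, assuming the listed averages apply uniformly to a single request:

AVG_THROUGHPUT_TOK_S = 138.0  # from the metrics above
AVG_LATENCY_S = 0.001         # 1 ms, from the metrics above

def est_generation_time_s(output_tokens: int) -> float:
    """Rough wall-clock estimate: first-token latency + decode time."""
    return AVG_LATENCY_S + output_tokens / AVG_THROUGHPUT_TOK_S

# Example: a 500-token completion takes roughly 3.6 seconds
print(f"{est_generation_time_s(500):.1f} s")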

Top Categories

math: 83.5%
code: 66.0%
roleplay: 61.7%
general: 60.6%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

GSM8k

Rank #21 of 46
#18 Kimi K2 Base: 92.1%
#19 Nova Micro: 92.3%
#20 Claude 3 Sonnet: 92.3%
#21 Qwen2.5 7B Instruct: 91.6%
#22 Llama 3.1 Nemotron 70B Instruct: 91.4%
#23 Qwen2 72B Instruct: 91.1%
#24 Qwen2.5-Coder 32B Instruct: 91.1%

MT-Bench

Rank #4 of 11
#1 DeepSeek-V2.5: 90.2%
#2 Llama-3.3 Nemotron Super 49B v1: 91.7%
#3 Qwen2.5 72B Instruct: 93.5%
#4 Qwen2.5 7B Instruct: 87.5%
#5 Mistral Large 2: 86.3%
#6 Qwen2 7B Instruct: 84.1%
#7 Mistral Small 3 24B Instruct: 83.5%

HumanEval

Rank #33 of 62
#30 Claude 3 Opus: 84.9%
#31 Gemma 3 12B: 85.4%
#32 Nova Lite: 85.4%
#33 Qwen2.5 7B Instruct: 84.8%
#34 Mistral Small 3 24B Instruct: 84.8%
#35 Gemini 1.5 Pro: 84.1%
#36 Qwen2.5 14B Instruct: 83.5%

MBPP

Rank #12 of 31
#9 Qwen2 72B Instruct: 80.2%
#10 Phi-3.5-MoE-instruct: 80.8%
#11 Qwen3 235B A22B: 81.4%
#12 Qwen2.5 7B Instruct: 79.2%
#13 Codestral-22B: 78.2%
#14 Llama 4 Maverick: 77.6%
#15 Gemini Diffusion: 76.0%

MATH

Rank #22 of 63
#19 Gemma 3 4B: 75.6%
#20 Grok-2: 76.1%
#21 GPT-4o: 76.6%
#22 Qwen2.5 7B Instruct: 75.5%
#23 DeepSeek-V2.5: 74.7%
#24 Llama 3.1 405B Instruct: 73.8%
#25 Nova Lite: 73.3%
All Benchmark Results for Qwen2.5 7B Instruct
Complete list of benchmark scores with detailed information

Benchmark     Category   Type   Score    Source
GSM8k         math       text   91.6%    Self-reported
MT-Bench      roleplay   text   87.5%    Self-reported
HumanEval     code       text   84.8%    Self-reported
MBPP          code       text   79.2%    Self-reported
MATH          math       text   75.5%    Self-reported
MMLU-Redux    general    text   75.4%    Self-reported
AlignBench    general    text   73.3%    Self-reported
IFEval        code       text   71.2%    Self-reported
MultiPL-E     general    text   70.4%    Self-reported
MMLU-Pro      general    text   56.3%    Self-reported

Showing 10 of 14 benchmarks.
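The "Top Categories" breakdown shown earlier is the per-category mean of these scores. A minimal sketch of that calculation, using only the 10 benchmarks listed here (the four unlisted benchmarks pull the code, roleplay, and general averages lower, so only the math figure reproduces exactly):

from collections import defaultdict

# (category, normalized score %) for the benchmarks listed above
scores = [
    ("math", 91.6),      # GSM8k
    ("roleplay", 87.5),  # MT-Bench
    ("code", 84.8),      # HumanEval
    ("code", 79.2),      # MBPP
    ("math", 75.5),      # MATH
    ("general", 75.4),   # MMLU-Redux
    ("general", 73.3),   # AlignBench
    ("code", 71.2),      # IFEval
    ("general", 70.4),   # MultiPL-E
    ("general", 56.3),   # MMLU-Pro
]

by_category = defaultdict(list)
for category, score in scores:
    by_category[category].append(score)

for category, values in sorted(by_category.items()):
    print(f"{category}: {sum(values) / len(values):.1f}%")
# math comes out to ~83.5%, matching the "Top Categories" figure above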