QwQ-32B
by Alibaba (Zero-eval)
About

QwQ-32B is a language model developed by Alibaba. It achieves strong performance, with an average score of 74.6% across 7 benchmarks, and excels particularly in MATH-500 (90.6%), IFEval (83.9%), and AIME 2024 (79.5%). It is licensed under Apache 2.0, permitting commercial use and making it suitable for enterprise applications. Released in 2025, it represents Alibaba's latest advancement in AI technology.

Timeline
  Announced: Mar 5, 2025
  Released: Mar 5, 2025
  Knowledge Cutoff: Nov 28, 2024

Specifications

License & Family
  License: Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (7 benchmarks)
  Average Score: 74.6%
  Best Score: 90.6%
  High Performers (80%+): 2
Top Categories
  math: 90.6%
  code: 73.6%
  roleplay: 73.1%
  general: 70.4%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

MATH-500 (Rank #15 of 22)
  #12  DeepSeek R1 Distill Qwen 7B    92.8%
  #13  DeepSeek R1 Distill Qwen 14B   93.9%
  #14  DeepSeek-V3 0324               94.0%
  #15  QwQ-32B                        90.6%
  #16  QwQ-32B-Preview                90.6%
  #17  DeepSeek-V3                    90.2%
  #18  o1-mini                        90.0%

IFEval (Rank #22 of 37)
  #19  GPT-4.1 mini                   84.1%
  #20  Qwen2.5 72B Instruct           84.1%
  #21  Phi 4 Reasoning Plus           84.9%
  #22  QwQ-32B                        83.9%
  #23  Phi 4 Reasoning                83.4%
  #24  DeepSeek-R1                    83.3%
  #25  Mistral Small 3 24B Instruct   82.9%

AIME 2024 (Rank #24 of 41)
  #21  DeepSeek-R1                    79.8%
  #22  Claude 3.7 Sonnet              80.0%
  #23  DeepSeek R1 Distill Llama 8B   80.0%
  #24  QwQ-32B                        79.5%
  #25  Kimi-k1.5                      77.5%
  #26  Phi 4 Reasoning                75.3%
  #27  o1                             74.3%

LiveBench (Rank #6 of 12)
  #3   Qwen3 30B A3B                  74.3%
  #4   Qwen3 32B                      74.9%
  #5   Kimi K2 Instruct               76.4%
  #6   QwQ-32B                        73.1%
  #7   o1                             67.0%
  #8   o1-preview                     52.3%
  #9   Qwen2.5 72B Instruct           52.3%

BFCL (Rank #9 of 10)
  #6   Nova Lite                      66.6%
  #7   Nova Pro                       68.4%
  #8   Qwen3 30B A3B                  69.1%
  #9   QwQ-32B                        66.4%
  #10  Nova Micro                     56.2%
All Benchmark Results for QwQ-32B
Complete list of benchmark scores with detailed information

  Benchmark       Category   Modality   Raw Score   Normalized   Source
  MATH-500        math       text       0.91        90.6%        Self-reported
  IFEval          code       text       0.84        83.9%        Self-reported
  AIME 2024       general    text       0.80        79.5%        Self-reported
  LiveBench       roleplay   text       0.73        73.1%        Self-reported
  BFCL            general    text       0.66        66.4%        Self-reported
  GPQA            general    text       0.65        65.2%        Self-reported
  LiveCodeBench   code       text       0.63        63.4%        Self-reported
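The headline figures in the Overall Performance section (average 74.6%, best 90.6%, two benchmarks at 80%+) can be reproduced from the seven self-reported scores listed above. A minimal sketch, with the scores transcribed from the table:

```python
# Normalized per-benchmark scores (%) for QwQ-32B, transcribed from the table above.
scores = {
    "MATH-500": 90.6,
    "IFEval": 83.9,
    "AIME 2024": 79.5,
    "LiveBench": 73.1,
    "BFCL": 66.4,
    "GPQA": 65.2,
    "LiveCodeBench": 63.4,
}

average = round(sum(scores.values()) / len(scores), 1)          # 74.6
best = max(scores.values())                                     # 90.6
high_performers = sum(1 for s in scores.values() if s >= 80.0)  # 2 (MATH-500, IFEval)

print(f"avg={average}% best={best}% high_performers={high_performers}")
```

The two 80%+ benchmarks are MATH-500 and IFEval, matching the "High Performers" count of 2.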