
QwQ-32B
Zero-eval
by Alibaba
About
QwQ-32B is a language model developed by Alibaba. It averages 74.6% across the 7 benchmarks listed below (all scores self-reported). Its strongest results are on MATH-500 (90.6%), IFEval (83.9%), and AIME 2024 (79.5%). It is released under the Apache 2.0 license, which permits commercial use. It was announced and released on March 5, 2025.
Timeline
Announced: Mar 5, 2025
Released: Mar 5, 2025
Knowledge cutoff: Nov 28, 2024
Specifications
License & Family
License: Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance (7 benchmarks)
Average score: 74.6%
Best score: 90.6%
High performers (80%+): 2
Top Categories
math: 90.6%
code: 73.6%
roleplay: 73.1%
general: 70.4%
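The overview figures can be reproduced from the seven self-reported scores in the full results table. Below is a minimal Python sketch. It assumes the page's figures are plain arithmetic means, both overall and per category (using the page's own category labels), and that "80%+" means a score of at least 80.0. The `summarize` helper is hypothetical and not part of any published tooling.

```python
from statistics import mean

# Self-reported scores (%) with the category label this page assigns to each.
SCORES = {
    "MATH-500": ("math", 90.6),
    "IFEval": ("code", 83.9),
    "AIME 2024": ("general", 79.5),
    "LiveBench": ("roleplay", 73.1),
    "BFCL": ("general", 66.4),
    "GPQA": ("general", 65.2),
    "LiveCodeBench": ("code", 63.4),
}


def summarize(scores):
    """Recompute the overview figures: average, best, 80%+ count, category means."""
    values = [score for _, score in scores.values()]
    by_category = {}
    for category, score in scores.values():
        by_category.setdefault(category, []).append(score)
    return {
        "average": round(mean(values), 1),
        "best": max(values),
        "high_performers": sum(1 for v in values if v >= 80.0),
        # Two decimals so values such as 73.65 stay visible before display rounding.
        "categories": {c: round(mean(v), 2) for c, v in by_category.items()},
    }


print(summarize(SCORES))
```

The overall mean is 522.1 / 7 ≈ 74.59, shown as 74.6%. The code category mean (IFEval and LiveCodeBench) works out to 73.65, which the page displays as 73.6%.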
Ranking Across Benchmarks
Position relative to other models on each benchmark
MATH-500: rank #15 of 22
Rank | Model | Score
#12 | DeepSeek R1 Distill Qwen 7B | 92.8%
#13 | DeepSeek R1 Distill Qwen 14B | 93.9%
#14 | DeepSeek-V3 0324 | 94.0%
#15 | QwQ-32B | 90.6%
#16 | QwQ-32B-Preview | 90.6%
#17 | DeepSeek-V3 | 90.2%
#18 | o1-mini | 90.0%
IFEval: rank #22 of 37
Rank | Model | Score
#19 | GPT-4.1 mini | 84.1%
#20 | Qwen2.5 72B Instruct | 84.1%
#21 | Phi 4 Reasoning Plus | 84.9%
#22 | QwQ-32B | 83.9%
#23 | Phi 4 Reasoning | 83.4%
#24 | DeepSeek-R1 | 83.3%
#25 | Mistral Small 3 24B Instruct | 82.9%
AIME 2024: rank #24 of 41
Rank | Model | Score
#21 | DeepSeek-R1 | 79.8%
#22 | Claude 3.7 Sonnet | 80.0%
#23 | DeepSeek R1 Distill Llama 8B | 80.0%
#24 | QwQ-32B | 79.5%
#25 | Kimi-k1.5 | 77.5%
#26 | Phi 4 Reasoning | 75.3%
#27 | o1 | 74.3%
LiveBench: rank #6 of 12
Rank | Model | Score
#3 | Qwen3 30B A3B | 74.3%
#4 | Qwen3 32B | 74.9%
#5 | Kimi K2 Instruct | 76.4%
#6 | QwQ-32B | 73.1%
#7 | o1 | 67.0%
#8 | o1-preview | 52.3%
#9 | Qwen2.5 72B Instruct | 52.3%
BFCL: rank #9 of 10
Rank | Model | Score
#6 | Nova Lite | 66.6%
#7 | Nova Pro | 68.4%
#8 | Qwen3 30B A3B | 69.1%
#9 | QwQ-32B | 66.4%
#10 | Nova Micro | 56.2%
All Benchmark Results for QwQ-32B
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Normalized (0-1) | Score | Source
MATH-500 | math | text | 0.91 | 90.6% | Self-reported
IFEval | code | text | 0.84 | 83.9% | Self-reported
AIME 2024 | general | text | 0.80 | 79.5% | Self-reported
LiveBench | roleplay | text | 0.73 | 73.1% | Self-reported
BFCL | general | text | 0.66 | 66.4% | Self-reported
GPQA | general | text | 0.65 | 65.2% | Self-reported
LiveCodeBench | code | text | 0.63 | 63.4% | Self-reported
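The 0-1 column in the results appears to be the percentage score divided by 100 and rounded to two decimals; for example, 90.6% becomes 0.91. That conversion is an assumption inferred from the numbers, though it reproduces every row. A short Python sketch follows. It uses `decimal` so that a value like 0.795 rounds half-up to 0.80 without binary floating-point surprises. The `normalize` helper is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Percentages as printed in the table; strings keep the decimals exact.
REPORTED = {
    "MATH-500": "90.6",
    "IFEval": "83.9",
    "AIME 2024": "79.5",
    "LiveBench": "73.1",
    "BFCL": "66.4",
    "GPQA": "65.2",
    "LiveCodeBench": "63.4",
}


def normalize(percent: str) -> Decimal:
    """Map a percentage string to a 0-1 value with two decimals (assumed convention)."""
    return (Decimal(percent) / 100).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for name, pct in REPORTED.items():
    print(f"{name}: {pct}% -> {normalize(pct)}")
```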