QwQ-32B
by Alibaba (Zero-eval)
About

QwQ-32B is a language model developed by Alibaba. It achieves strong performance, with an average score of 74.6% across 7 benchmarks, and excels particularly in MATH-500 (90.6%), IFEval (83.9%), and AIME 2024 (79.5%). It is licensed under Apache 2.0, permitting commercial use and making it suitable for enterprise applications. Released in 2025, it represents Alibaba's latest advancement in AI technology.

Timeline
  Announced: Mar 5, 2025
  Released: Mar 5, 2025
  Knowledge Cutoff: Nov 28, 2024

Specifications

License & Family
  License: Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (7 benchmarks)
  Average Score: 74.6%
  Best Score: 90.6%
  High Performers (80%+): 2
Top Categories
  math: 90.6%
  code: 73.6%
  roleplay: 73.1%
  general: 70.4%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

MATH-500 (Rank #15 of 22)
  #12  DeepSeek R1 Distill Qwen 7B    92.8%
  #13  DeepSeek R1 Distill Qwen 14B   93.9%
  #14  DeepSeek-V3 0324               94.0%
  #15  QwQ-32B                        90.6%
  #16  QwQ-32B-Preview                90.6%
  #17  DeepSeek-V3                    90.2%
  #18  o1-mini                        90.0%

IFEval (Rank #22 of 37)
  #19  GPT-4.1 mini                   84.1%
  #20  Qwen2.5 72B Instruct           84.1%
  #21  Phi 4 Reasoning Plus           84.9%
  #22  QwQ-32B                        83.9%
  #23  Phi 4 Reasoning                83.4%
  #24  DeepSeek-R1                    83.3%
  #25  Mistral Small 3 24B Instruct   82.9%

AIME 2024 (Rank #24 of 41)
  #21  DeepSeek-R1                    79.8%
  #22  Claude 3.7 Sonnet              80.0%
  #23  DeepSeek R1 Distill Llama 8B   80.0%
  #24  QwQ-32B                        79.5%
  #25  Kimi-k1.5                      77.5%
  #26  Phi 4 Reasoning                75.3%
  #27  o1                             74.3%

LiveBench (Rank #6 of 12)
  #3   Qwen3 30B A3B                  74.3%
  #4   Qwen3 32B                      74.9%
  #5   Kimi K2 Instruct               76.4%
  #6   QwQ-32B                        73.1%
  #7   o1                             67.0%
  #8   o1-preview                     52.3%
  #9   Qwen2.5 72B Instruct           52.3%

BFCL (Rank #9 of 10)
  #6   Nova Lite                      66.6%
  #7   Nova Pro                       68.4%
  #8   Qwen3 30B A3B                  69.1%
  #9   QwQ-32B                        66.4%
  #10  Nova Micro                     56.2%
All Benchmark Results for QwQ-32B
Complete list of benchmark scores with detailed information

  Benchmark       Category   Modality   Raw Score   Normalized   Source
  MATH-500        math       text       0.91        90.6%        Self-reported
  IFEval          code       text       0.84        83.9%        Self-reported
  AIME 2024       general    text       0.80        79.5%        Self-reported
  LiveBench       roleplay   text       0.73        73.1%        Self-reported
  BFCL            general    text       0.66        66.4%        Self-reported
  GPQA            general    text       0.65        65.2%        Self-reported
  LiveCodeBench   code       text       0.63        63.4%        Self-reported
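The headline figures in the Overall Performance section (average 74.6%, best 90.6%, two benchmarks at 80%+) can be reproduced from the seven self-reported scores listed above. A minimal sketch, with the scores transcribed from the table:

```python
# Normalized per-benchmark scores (%) for QwQ-32B, transcribed from the table above.
scores = {
    "MATH-500": 90.6,
    "IFEval": 83.9,
    "AIME 2024": 79.5,
    "LiveBench": 73.1,
    "BFCL": 66.4,
    "GPQA": 65.2,
    "LiveCodeBench": 63.4,
}

average = round(sum(scores.values()) / len(scores), 1)          # 74.6
best = max(scores.values())                                     # 90.6
high_performers = sum(1 for s in scores.values() if s >= 80.0)  # 2 (MATH-500, IFEval)

print(f"avg={average}% best={best}% high_performers={high_performers}")
```

The two 80%+ benchmarks are MATH-500 and IFEval, matching the "High Performers" count of 2.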