Qwen3 235B A22B

Name: Qwen3 235B A22B
Price: 0.2 USD
Rating: 76.2 (average score across 23 benchmarks)
Author: Alibaba

Zero-eval

#1 Arena Hard

#1 BBH

#1 CRUX-O

About

Qwen3 235B A22B is a language model developed by Alibaba. It averages 76.2% across 23 benchmarks, with its highest scores on Arena Hard (95.6%), GSM8k (94.4%), and BBH (88.9%). Math is its strongest category, with an average of 83.3%. It supports a 256K-token context window for handling large documents and is available through 4 API providers. It was released on April 29, 2025 under the Apache 2.0 license, which permits commercial use.

Pricing Range

Input (per 1M): $0.10 - $0.20

Output (per 1M): $0.10 - $0.80
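
The price range above can be turned into a rough per-request cost estimate. The sketch below is illustrative only: it assumes "per 1M" means per one million tokens, that the low and high ends of the input and output ranges can be combined freely (in practice each provider has its own input/output pair), and the function name `estimate_cost_range` is hypothetical.

```python
# Price bounds in USD, copied from the Pricing Range section.
# Assumption: "per 1M" means per one million tokens.
INPUT_PRICE_RANGE = (0.10, 0.20)
OUTPUT_PRICE_RANGE = (0.10, 0.80)

TOKENS_PER_PRICE_UNIT = 1_000_000


def estimate_cost_range(input_tokens: int, output_tokens: int) -> tuple:
    """Return (lowest, highest) estimated cost in USD for one request.

    Hypothetical helper: it pairs the cheapest input price with the
    cheapest output price (and likewise for the most expensive), which
    may not correspond to any single real provider.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    low = (input_tokens * INPUT_PRICE_RANGE[0]
           + output_tokens * OUTPUT_PRICE_RANGE[0]) / TOKENS_PER_PRICE_UNIT
    high = (input_tokens * INPUT_PRICE_RANGE[1]
            + output_tokens * OUTPUT_PRICE_RANGE[1]) / TOKENS_PER_PRICE_UNIT
    return low, high


if __name__ == "__main__":
    low, high = estimate_cost_range(input_tokens=200_000, output_tokens=4_000)
    print(f"Estimated cost: ${low:.4f} to ${high:.4f}")
```

For a 200,000-token prompt with a 4,000-token reply, this gives roughly $0.02 to $0.04 per request under the stated assumptions.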

Providers: 4

Timeline

Announced: Apr 29, 2025

Released: Apr 29, 2025

Specifications

Training Tokens: 36.0T

License & Family

License: Apache 2.0

Benchmark Performance Overview

Performance metrics and category breakdown

Overall Performance

23 benchmarks

Average Score: 76.2%

Best Score: 95.6%

High Performers (80%+)

Performance Metrics

Max Context Window: 256.0K

Avg Throughput: 38.0 tok/s

Avg Latency: 1ms
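
As a rough illustration of what these figures imply, the sketch below estimates how long a response of a given length would take. It assumes the throughput figure refers to output tokens per second and that the latency figure is a fixed delay before the first token; the listing does not define either metric, and the function name `estimate_response_seconds` is hypothetical.

```python
# Figures copied from the Performance Metrics section.
AVG_THROUGHPUT_TOK_PER_S = 38.0  # "Avg Throughput"
AVG_LATENCY_S = 0.001            # "Avg Latency" of 1ms, converted to seconds


def estimate_response_seconds(output_tokens: int) -> float:
    """Estimate wall-clock seconds to generate `output_tokens` tokens.

    Assumption: total time = fixed latency + tokens / throughput.
    Real providers vary with load, prompt length, and batching.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return AVG_LATENCY_S + output_tokens / AVG_THROUGHPUT_TOK_PER_S


if __name__ == "__main__":
    for n in (100, 1_000, 4_000):
        print(f"{n} tokens: about {estimate_response_seconds(n):.1f} s")
```

At 38 tok/s, generation time dominates: a 1,000-token reply takes about 26 seconds under this simple model, so the reported latency is negligible by comparison.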

Top Categories

math: 83.3%

roleplay: 77.1%

code: 76.6%

general: 74.8%

Benchmark Performance

Top benchmark scores with normalized values (0-100%)

Ranking Across Benchmarks

Position relative to other models on each benchmark

Arena Hard

Rank #1 of 22

#1 Qwen3 235B A22B: 95.6%
#2 Qwen3 32B: 93.8%
#3 DeepSeek-R1: 92.3%
#4 Qwen3 30B A3B: 91.0%

GSM8k

Rank #16 of 46

#13 Gemma 3 12B: 94.4%
#14 Nova Lite: 94.5%
#15 Qwen2.5 14B Instruct: 94.8%
#16 Qwen3 235B A22B: 94.4%
#17 Mistral Large 2: 93.0%
#18 Claude 3 Sonnet: 92.3%
#19 Nova Micro: 92.3%

BBH

Rank #1 of 8

#1 Qwen3 235B A22B: 88.9%
#2 Nova Pro: 86.9%
#3 Qwen2.5 32B Instruct: 84.5%
#4 DeepSeek-V2.5: 84.3%

MMLU

Rank #12 of 78

#9 DeepSeek-V3: 88.5%
#10 GPT-4o: 88.7%
#11 Kimi K2 Instruct: 89.5%
#12 Qwen3 235B A22B: 87.8%
#13 Kimi K2 Base: 87.8%
#14 GPT-4.1 mini: 87.5%
#15 Grok-2: 87.5%

MMLU-Redux

Rank #6 of 13

#3 DeepSeek-V3: 89.1%
#4 Kimi K2 Instruct: 92.7%
#5 DeepSeek-R1: 92.9%
#6 Qwen3 235B A22B: 87.4%
#7 Qwen2.5 72B Instruct: 86.8%
#8 Qwen2.5 32B Instruct: 83.9%
#9 Qwen2.5 14B Instruct: 80.0%

All Benchmark Results for Qwen3 235B A22B

Complete list of benchmark scores with detailed information


Benchmark	Category	Modality	Score	Normalized	Source
Arena Hard	general	text	0.96	95.6%	Self-reported
GSM8k	math	text	0.94	94.4%	Self-reported
BBH	general	text	0.89	88.9%	Self-reported
MMLU	general	text	0.88	87.8%	Self-reported
MMLU-Redux	general	text	0.87	87.4%	Self-reported
MMMLU	general	text	0.87	86.7%	Self-reported
AIME 2024	general	text	0.86	85.7%	Self-reported
MGSM	math	text	0.84	83.5%	Self-reported
AIME 2025	general	text	0.81	81.5%	Self-reported
MBPP	code	text	81.40	81.4%	Self-reported

Showing 1 to 10 of 23 benchmarks
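
To show how per-category averages like those under Top Categories could be derived from rows like these, here is a minimal sketch. It uses only the ten rows listed above, with the benchmark name, category, and normalized percentage transcribed by hand; the function name `average_by_category` is hypothetical. Because only 10 of the 23 benchmarks are shown, the results will not match the page's reported averages (76.2% overall, 83.3% for math).

```python
from collections import defaultdict
from statistics import mean

# (benchmark, category, normalized score in percent), transcribed from
# the ten rows listed in the table. The remaining 13 benchmarks are not
# shown on the page and are therefore not included here.
ROWS = [
    ("Arena Hard", "general", 95.6),
    ("GSM8k", "math", 94.4),
    ("BBH", "general", 88.9),
    ("MMLU", "general", 87.8),
    ("MMLU-Redux", "general", 87.4),
    ("MMMLU", "general", 86.7),
    ("AIME 2024", "general", 85.7),
    ("MGSM", "math", 83.5),
    ("AIME 2025", "general", 81.5),
    ("MBPP", "code", 81.4),
]


def average_by_category(rows):
    """Group normalized scores by category and return the mean of each."""
    grouped = defaultdict(list)
    for _name, category, score in rows:
        grouped[category].append(score)
    return {category: mean(scores) for category, scores in grouped.items()}


if __name__ == "__main__":
    overall = mean(score for _name, _category, score in ROWS)
    print(f"Mean of listed benchmarks: {overall:.2f}%")
    for category, avg in sorted(average_by_category(ROWS).items()):
        print(f"{category}: {avg:.2f}%")
```

The gap between these partial averages and the page's figures is a reminder that the listed rows are the top-scoring subset, so averages computed from them skew high.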
