Claude Sonnet 4

Multimodal
#1 TAU-bench Airline
#2 SWE-Bench Verified
#2 Terminal-bench

by Anthropic

About

Claude Sonnet 4 is a multimodal language model developed by Anthropic. It achieves strong overall performance, with an average score of 69.4% across 8 benchmarks, and does particularly well on MMMLU (86.5%), TAU-bench Retail (80.5%), and GPQA (75.4%). It supports a 328K-token context window for handling large documents and is available through 3 API providers. As a multimodal model, it can process text, images, and other input formats. Released in May 2025, it is among Anthropic's most recent models.
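For context, here is a minimal sketch of calling the model through Anthropic's own Messages API (one of the 3 providers) with mixed text and image input. The model ID string and image URL are assumptions for illustration; check the provider's documentation for the exact identifier.

```python
# Minimal sketch: text + image request via Anthropic's Messages API.
# The model ID below is an assumed identifier for Claude Sonnet 4;
# verify it against the provider's docs. The image URL is a placeholder.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumption; confirm the exact ID
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "url", "url": "https://example.com/chart.png"}},
            {"type": "text", "text": "Summarize what this chart shows."},
        ],
    }],
)
print(response.content[0].text)
```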

Pricing Range
Input (per 1M): $3.00 - $3.00
Output (per 1M): $15.00 - $15.00
Providers: 3
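Since the listed range is flat across all 3 providers ($3.00 in, $15.00 out per 1M tokens), per-request cost is simple arithmetic; a small sketch:

```python
# Cost estimate from the flat pricing above:
# $3.00 per 1M input tokens, $15.00 per 1M output tokens.
INPUT_USD_PER_M = 3.00
OUTPUT_USD_PER_M = 15.00

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request."""
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

# e.g. a 10K-token prompt producing a 1K-token reply:
print(f"${request_cost_usd(10_000, 1_000):.4f}")  # $0.0450
```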
Timeline
Announced: May 22, 2025
Released: May 22, 2025
Specifications
Capabilities
Multimodal
License & Family
License
Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

8 benchmarks
Average Score
69.4%
Best Score
86.5%
High Performers (80%+)
2

Performance Metrics

Max Context Window
328.0K
Avg Throughput
61.7 tok/s
Avg Latency
0 ms
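As a rough rule of thumb, the average throughput figure above translates into wall-clock generation time as follows (ignoring the latency figure, which is listed as 0 ms, and any network or queueing overhead):

```python
# Rough wall-clock estimate for decoding, using the listed average
# throughput of 61.7 tokens/second (network and queueing ignored).
AVG_THROUGHPUT_TOK_PER_S = 61.7

def decode_seconds(output_tokens: int) -> float:
    """Approximate seconds to generate the given number of output tokens."""
    return output_tokens / AVG_THROUGHPUT_TOK_PER_S

print(f"{decode_seconds(1_000):.1f} s")  # ~16.2 s for 1K output tokens
```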

Top Categories

vision
74.4%
agents
70.3%
general
68.1%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

MMMLU

Rank #6 of 13
#3 o1
87.7%
#4 GPT-4.1
87.3%
#5 Qwen3 235B A22B
86.7%
#6 Claude Sonnet 4
86.5%
#7 Claude 3.7 Sonnet
86.1%
#8 GPT-4.5
85.1%
#9 GPT-4o
81.4%

TAU-bench Retail

Rank #3 of 15
#1 Claude Opus 4
81.4%
#2 Claude 3.7 Sonnet
81.2%
#3 Claude Sonnet 4
80.5%
#4 o4-mini
71.8%
#5 o1
70.8%
#6 Claude 3.5 Sonnet
69.2%

GPQA

Rank #20 of 115
#17 Qwen3-235B-A22B-Instruct-2507
77.5%
#18 o3-mini
77.2%
#19 Llama 3.1 Nemotron Ultra 253B v1
76.0%
#20 Claude Sonnet 4
75.4%
#21 Kimi K2 Instruct
75.1%
#22 Gemini 2.0 Flash Thinking
74.2%
#23 DeepSeek R1 Zero
73.3%

MMMU

Rank #13 of 52
#10 GPT-4.5
75.2%
#11 Claude 3.7 Sonnet
75.0%
#12 GPT-4.1
74.8%
#13 Claude Sonnet 4
74.4%
#14 Llama 4 Maverick
73.4%
#15 Gemini 2.5 Flash-Lite
72.9%
#16 GPT-4.1 mini
72.7%

SWE-Bench Verified

Rank #2 of 28
#1 GPT-5
74.9%
#2 Claude Sonnet 4
72.7%
#3 Claude Opus 4
72.5%
#4 Claude 3.7 Sonnet
70.3%
#5 o3
69.1%
All Benchmark Results for Claude Sonnet 4
Complete list of benchmark scores with detailed information
Benchmark           Category  Modality    Score  Normalized  Source
MMMLU               general   text        0.86   86.5%       Self-reported
TAU-bench Retail    agents    text        0.81   80.5%       Self-reported
GPQA                general   text        0.75   75.4%       Self-reported
MMMU                vision    multimodal  0.74   74.4%       Self-reported
SWE-Bench Verified  general   text        0.73   72.7%       Self-reported
AIME 2025           general   text        0.70   70.5%       Self-reported
TAU-bench Airline   agents    text        0.60   60.0%       Self-reported
Terminal-bench      general   text        0.35   35.5%       Self-reported