
Claude Sonnet 4
Multimodal
#1 TAU-bench Airline
#2 SWE-Bench Verified
#2 Terminal-bench
by Anthropic
About
Claude Sonnet 4 is a multimodal language model developed by Anthropic. It achieves strong overall performance, with an average score of 69.4% across 8 benchmarks, and does particularly well on MMMLU (86.5%), TAU-bench Retail (80.5%), and GPQA (75.4%). It supports a 328K-token context window for handling large documents and is available through 3 API providers. As a multimodal model, it can process text, images, and other input formats. Released in May 2025, it is one of Anthropic's latest models.
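As an illustration of programmatic access through one of those providers, here is a minimal sketch using Anthropic's own Messages API via the official Python SDK. The model ID claude-sonnet-4-20250514 is an assumption inferred from the May 22, 2025 release date; confirm the exact ID against your provider's model list.

```python
# Minimal sketch: querying Claude Sonnet 4 through the Anthropic Messages API.
# Assumes `pip install anthropic` and an ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY automatically

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed snapshot ID; verify before use
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Summarize this page's benchmark results in one paragraph."}
    ],
)
print(message.content[0].text)
```

Because the model is multimodal, the messages list can also carry image content blocks alongside plain text.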
Pricing
Input (per 1M tokens): $3.00 (uniform across providers)
Output (per 1M tokens): $15.00 (uniform across providers)
Providers: 3
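To make these rates concrete, a short sketch of per-request cost arithmetic at the listed prices; the token counts in the example are hypothetical.

```python
# Cost arithmetic at the listed rates: $3.00 per 1M input tokens,
# $15.00 per 1M output tokens (identical across the 3 providers).
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at the listed per-million-token rates."""
    return (input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000

# Hypothetical example: a 100K-token document in, a 2K-token summary out.
print(f"${request_cost(100_000, 2_000):.4f}")  # $0.30 in + $0.03 out = $0.3300
```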
Timeline
Announced: May 22, 2025
Released: May 22, 2025
Specifications
Capabilities
Multimodal
License & Family
License
Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
8 benchmarks
Average Score
69.4%
Best Score
86.5%
High Performers (80%+)
2
Performance Metrics
Max Context Window: 328.0K
Avg Throughput: 61.7 tok/s
Avg Latency: 0 ms
Top Categories
vision
74.4%
agents
70.3%
general
68.1%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
MMMLU
Rank #6 of 13
#3 o1
87.7%
#4 GPT-4.1
87.3%
#5 Qwen3 235B A22B
86.7%
#6 Claude Sonnet 4
86.5%
#7 Claude 3.7 Sonnet
86.1%
#8 GPT-4.5
85.1%
#9 GPT-4o
81.4%
TAU-bench Retail
Rank #3 of 15
#1 Claude Opus 4
81.4%
#2 Claude 3.7 Sonnet
81.2%
#3 Claude Sonnet 4
80.5%
#4 o4-mini
71.8%
#5 o1
70.8%
#6 Claude 3.5 Sonnet
69.2%
GPQA
Rank #20 of 115
#17 Qwen3-235B-A22B-Instruct-2507
77.5%
#18 o3-mini
77.2%
#19 Llama 3.1 Nemotron Ultra 253B v1
76.0%
#20 Claude Sonnet 4
75.4%
#21 Kimi K2 Instruct
75.1%
#22 Gemini 2.0 Flash Thinking
74.2%
#23 DeepSeek R1 Zero
73.3%
MMMU
Rank #13 of 52
#10 GPT-4.5
75.2%
#11 Claude 3.7 Sonnet
75.0%
#12 GPT-4.1
74.8%
#13 Claude Sonnet 4
74.4%
#14 Llama 4 Maverick
73.4%
#15 Gemini 2.5 Flash-Lite
72.9%
#16 GPT-4.1 mini
72.7%
SWE-Bench Verified
Rank #2 of 28
#1 GPT-5
74.9%
#2 Claude Sonnet 4
72.7%
#3 Claude Opus 4
72.5%
#4 Claude 3.7 Sonnet
70.3%
#5 o3
69.1%
All Benchmark Results for Claude Sonnet 4
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Score (raw) | Normalized | Source
MMMLU | general | text | 0.86 | 86.5% | Self-reported
TAU-bench Retail | agents | text | 0.81 | 80.5% | Self-reported
GPQA | general | text | 0.75 | 75.4% | Self-reported
MMMU | vision | multimodal | 0.74 | 74.4% | Self-reported
SWE-Bench Verified | general | text | 0.73 | 72.7% | Self-reported
AIME 2025 | general | text | 0.70 | 70.5% | Self-reported
TAU-bench Airline | agents | text | 0.60 | 60.0% | Self-reported
Terminal-bench | general | text | 0.35 | 35.5% | Self-reported
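As a sanity check, the summary figures above (the 69.4% overall average and the Top Categories breakdown) can be recomputed directly from this table. The sketch below assumes the page uses a simple arithmetic mean over normalized scores.

```python
# Recompute the overall and per-category averages from the benchmark table.
from collections import defaultdict

scores = {
    "MMMLU": ("general", 86.5),
    "TAU-bench Retail": ("agents", 80.5),
    "GPQA": ("general", 75.4),
    "MMMU": ("vision", 74.4),
    "SWE-Bench Verified": ("general", 72.7),
    "AIME 2025": ("general", 70.5),
    "TAU-bench Airline": ("agents", 60.0),
    "Terminal-bench": ("general", 35.5),
}

by_category = defaultdict(list)
for category, score in scores.values():
    by_category[category].append(score)

overall = sum(score for _, score in scores.values()) / len(scores)
print(f"overall: {overall:.2f}%")  # 69.44% -> shown above as 69.4%
for category, vals in sorted(by_category.items()):
    print(f"{category}: {sum(vals) / len(vals):.2f}%")
# agents: 70.25%, general: 68.12%, vision: 74.40% -- matching the
# Top Categories figures (70.3%, 68.1%, 74.4%) after rounding
```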