Claude Sonnet 4

Multimodal
#1 TAU-bench Airline
#2 SWE-Bench Verified
#2 Terminal-bench

by Anthropic

About

Claude Sonnet 4 is a multimodal language model developed by Anthropic. It achieves strong overall performance, with an average score of 69.4% across 8 benchmarks, and does particularly well on MMMLU (86.5%), TAU-bench Retail (80.5%), and GPQA (75.4%). It supports a 328K-token context window for handling large documents and is available through 3 API providers. As a multimodal model, it can process text, images, and other input formats. Released in May 2025, it is among Anthropic's most recent models.
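For context, here is a minimal sketch of calling the model through Anthropic's own Messages API (one of the 3 providers) with mixed text and image input. The model ID string and image URL are assumptions for illustration; check the provider's documentation for the exact identifier.

```python
# Minimal sketch: text + image request via Anthropic's Messages API.
# The model ID below is an assumed identifier for Claude Sonnet 4;
# verify it against the provider's docs. The image URL is a placeholder.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumption; confirm the exact ID
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "url", "url": "https://example.com/chart.png"}},
            {"type": "text", "text": "Summarize what this chart shows."},
        ],
    }],
)
print(response.content[0].text)
```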

Pricing Range
Input (per 1M): $3.00 - $3.00
Output (per 1M): $15.00 - $15.00
Providers: 3
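Since the listed range is flat across all 3 providers ($3.00 in, $15.00 out per 1M tokens), per-request cost is simple arithmetic; a small sketch:

```python
# Cost estimate from the flat pricing above:
# $3.00 per 1M input tokens, $15.00 per 1M output tokens.
INPUT_USD_PER_M = 3.00
OUTPUT_USD_PER_M = 15.00

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request."""
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

# e.g. a 10K-token prompt producing a 1K-token reply:
print(f"${request_cost_usd(10_000, 1_000):.4f}")  # $0.0450
```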
Timeline
Announced: May 22, 2025
Released: May 22, 2025
Specifications
Capabilities
Multimodal
License & Family
License
Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

8 benchmarks
Average Score
69.4%
Best Score
86.5%
High Performers (80%+)
2

Performance Metrics

Max Context Window
328.0K
Avg Throughput
61.7 tok/s
Avg Latency
0 ms
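As a rough rule of thumb, the average throughput figure above translates into wall-clock generation time as follows (ignoring the latency figure, which is listed as 0 ms, and any network or queueing overhead):

```python
# Rough wall-clock estimate for decoding, using the listed average
# throughput of 61.7 tokens/second (network and queueing ignored).
AVG_THROUGHPUT_TOK_PER_S = 61.7

def decode_seconds(output_tokens: int) -> float:
    """Approximate seconds to generate the given number of output tokens."""
    return output_tokens / AVG_THROUGHPUT_TOK_PER_S

print(f"{decode_seconds(1_000):.1f} s")  # ~16.2 s for 1K output tokens
```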

Top Categories

vision
74.4%
agents
70.3%
general
68.1%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

MMMLU

Rank #6 of 13
#3 o1
87.7%
#4 GPT-4.1
87.3%
#5 Qwen3 235B A22B
86.7%
#6 Claude Sonnet 4
86.5%
#7 Claude 3.7 Sonnet
86.1%
#8 GPT-4.5
85.1%
#9 GPT-4o
81.4%

TAU-bench Retail

Rank #3 of 15
#1 Claude Opus 4
81.4%
#2 Claude 3.7 Sonnet
81.2%
#3 Claude Sonnet 4
80.5%
#4 o4-mini
71.8%
#5 o1
70.8%
#6 Claude 3.5 Sonnet
69.2%

GPQA

Rank #20 of 115
#17 Qwen3-235B-A22B-Instruct-2507
77.5%
#18 o3-mini
77.2%
#19 Llama 3.1 Nemotron Ultra 253B v1
76.0%
#20 Claude Sonnet 4
75.4%
#21 Kimi K2 Instruct
75.1%
#22 Gemini 2.0 Flash Thinking
74.2%
#23 DeepSeek R1 Zero
73.3%

MMMU

Rank #13 of 52
#10 GPT-4.5
75.2%
#11 Claude 3.7 Sonnet
75.0%
#12 GPT-4.1
74.8%
#13 Claude Sonnet 4
74.4%
#14 Llama 4 Maverick
73.4%
#15 Gemini 2.5 Flash-Lite
72.9%
#16 GPT-4.1 mini
72.7%

SWE-Bench Verified

Rank #2 of 28
#1 GPT-5
74.9%
#2 Claude Sonnet 4
72.7%
#3 Claude Opus 4
72.5%
#4 Claude 3.7 Sonnet
70.3%
#5 o3
69.1%
All Benchmark Results for Claude Sonnet 4
Complete list of benchmark scores with detailed information
Benchmark           Category  Modality    Score  Normalized  Source
MMMLU               general   text        0.86   86.5%       Self-reported
TAU-bench Retail    agents    text        0.81   80.5%       Self-reported
GPQA                general   text        0.75   75.4%       Self-reported
MMMU                vision    multimodal  0.74   74.4%       Self-reported
SWE-Bench Verified  general   text        0.73   72.7%       Self-reported
AIME 2025           general   text        0.70   70.5%       Self-reported
TAU-bench Airline   agents    text        0.60   60.0%       Self-reported
Terminal-bench      general   text        0.35   35.5%       Self-reported