Claude 3.5 Sonnet

Multimodal

Zero-eval

#1 BIG-Bench Hard

#3 MGSM

#3 DROP

by Anthropic

About

Claude 3.5 Sonnet is a multimodal language model developed by Anthropic. It averages 84.1% across 9 benchmarks, with its strongest results on GSM8k (96.4%), BIG-Bench Hard (93.1%), and HumanEval (92.0%). It supports a 400K token context window for handling large documents and is available through 2 API providers. As a multimodal model, it accepts both text and image input. It was released on June 21, 2024.

Pricing Range

Input (per 1M tokens): $3.00 - $3.00

Output (per 1M tokens): $15.00 - $15.00

Providers: 2
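
As a rough illustration of how the listed per-million-token prices translate into a per-request cost, the sketch below multiplies token counts by the input and output rates. The function name `estimate_cost_usd` and the example token counts are hypothetical, and the calculation assumes simple linear billing with no caching or batch discounts.

```python
# Listed prices in USD per 1M tokens, taken from the pricing range on this page.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming linear per-token billing."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Hypothetical request: 10,000 prompt tokens and 1,000 completion tokens.
cost = estimate_cost_usd(10_000, 1_000)
print(f"${cost:.3f}")  # prints $0.045
```

Because output tokens cost five times as much as input tokens at these rates, long completions dominate the bill even when prompts are large.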

Timeline

Announced: Jun 21, 2024

Released: Jun 21, 2024

Specifications

Capabilities: Multimodal

License & Family

License: Proprietary

Benchmark Performance Overview

Performance metrics and category breakdown

Overall Performance

Benchmarks: 9
Average Score: 84.1%
Best Score: 96.4%
High Performers (80%+): 6 of 9

Performance Metrics

Max Context Window: 400.0K tokens
Avg Throughput: 71.5 tok/s
Avg Latency: 0 ms

Top Categories

code: 92.0%
math: 86.4%
general: 81.2%

Benchmark Performance

Top benchmark scores with normalized values (0-100%)

Ranking Across Benchmarks

Position relative to other models on each benchmark

GSM8k

Rank #6 of 46

#3 Claude 3.5 Sonnet: 96.4%
#4 Llama 3.1 405B Instruct: 96.8%
#5 GPT-4.5: 97.0%
#6 Claude 3.5 Sonnet: 96.4%
#7 Gemma 3 27B: 95.9%
#8 Qwen2.5 32B Instruct: 95.9%
#9 Qwen2.5 72B Instruct: 95.8%

BIG-Bench Hard

Rank #1 of 21

#1 Claude 3.5 Sonnet: 93.1%
#2 Claude 3.5 Sonnet: 93.1%
#3 Gemini 1.5 Pro: 89.2%
#4 Gemma 3 27B: 87.6%

HumanEval

Rank #6 of 62

#3 o1-mini: 92.4%
#4 Qwen2.5-Coder 32B Instruct: 92.7%
#5 Kimi K2 Instruct: 93.3%
#6 Claude 3.5 Sonnet: 92.0%
#7 Mistral Large 2: 92.0%
#8 Qwen2.5 VL 32B Instruct: 91.5%
#9 GPT-4o: 90.2%

MGSM

Rank #3 of 31

#1 o3-mini: 92.0%
#2 Llama 4 Maverick: 92.3%
#3 Claude 3.5 Sonnet: 91.6%
#4 Claude 3.5 Sonnet: 91.6%
#5 Llama 3.3 70B Instruct: 91.1%
#6 o1-preview: 90.8%

MMLU

Rank #6 of 78

#3 DeepSeek-R1: 90.8%
#4 o1-preview: 90.8%
#5 GPT-4.5: 90.8%
#6 Claude 3.5 Sonnet: 90.4%
#7 Claude 3.5 Sonnet: 90.4%
#8 GPT-4.1: 90.2%
#9 Kimi K2 Instruct: 89.5%

All Benchmark Results for Claude 3.5 Sonnet

Complete list of benchmark scores with detailed information


Benchmark	Category	Modality	Score	Normalized	Source
GSM8k	math	text	0.96	96.4%	Self-reported
BIG-Bench Hard	general	text	0.93	93.1%	Self-reported
HumanEval	code	text	0.92	92.0%	Self-reported
MGSM	math	text	0.92	91.6%	Self-reported
MMLU	general	text	0.90	90.4%	Self-reported
DROP	general	text	0.87	87.1%	Self-reported
MMLU-Pro	general	text	0.76	76.1%	Self-reported
MATH	math	text	0.71	71.1%	Self-reported
GPQA	general	text	0.59	59.4%	Self-reported
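
The summary figures reported earlier on this page (average score, best score, and the per-category numbers) appear to be simple unweighted means of the normalized scores in this table. The sketch below recomputes them under that assumption; the variable names are illustrative, and the 80% threshold is the one used by the "High Performers" label.

```python
from collections import defaultdict

# (benchmark, category, normalized score in %) copied from the results table.
RESULTS = [
    ("GSM8k", "math", 96.4),
    ("BIG-Bench Hard", "general", 93.1),
    ("HumanEval", "code", 92.0),
    ("MGSM", "math", 91.6),
    ("MMLU", "general", 90.4),
    ("DROP", "general", 87.1),
    ("MMLU-Pro", "general", 76.1),
    ("MATH", "math", 71.1),
    ("GPQA", "general", 59.4),
]

scores = [score for _, _, score in RESULTS]
average = sum(scores) / len(scores)  # unweighted mean (assumption)
best = max(scores)
high_performers = sum(1 for score in scores if score >= 80.0)

# Group scores by category, then average each group.
by_category = defaultdict(list)
for _, category, score in RESULTS:
    by_category[category].append(score)
category_avg = {cat: sum(vals) / len(vals) for cat, vals in by_category.items()}

print(f"average: {average:.1f}%")            # average: 84.1%
print(f"best: {best:.1f}%")                  # best: 96.4%
print(f"high performers: {high_performers}")  # high performers: 6
for cat, avg in category_avg.items():
    print(f"{cat}: {avg:.1f}%")               # math 86.4, general 81.2, code 92.0
```

The recomputed values match the reported 84.1% average and the 92.0% (code), 86.4% (math), and 81.2% (general) category scores, which is consistent with equal weighting across benchmarks.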
