Claude 3.5 Sonnet

Name: Claude 3.5 Sonnet
Price: 3 USD
Rating: 73.3 (19 reviews)
Author: Anthropic

Multimodal

Zero-eval

#1AI2D

#1HumanEval

#1ChartQA

+3 more

by Anthropic

About

Claude 3.5 Sonnet is a multimodal language model developed by Anthropic. It achieves strong performance with an average score of 73.3% across 19 benchmarks. It excels particularly in GSM8k (96.4%), DocVQA (95.2%), AI2D (94.7%). It supports a 400K token context window for handling large documents. The model is available through 3 API providers. As a multimodal model, it can process and understand text, images, and other input formats seamlessly. Released in 2024, it represents Anthropic's latest advancement in AI technology.

Pricing Range

Input (per 1M)$3.00 -$3.00

Output (per 1M)$15.00 -$15.00

Providers3

Timeline

AnnouncedOct 22, 2024

ReleasedOct 22, 2024

Specifications

Capabilities

Multimodal

License & Family

License

Proprietary

Benchmark Performance Overview

Performance metrics and category breakdown

Overall Performance

19 benchmarks

Average Score

73.3%

Best Score

96.4%

High Performers (80%+)

Performance Metrics

Max Context Window

400.0K

Avg Throughput

81.0 tok/s

Avg Latency

0ms

Top Categories

code

93.7%

math

83.5%

vision

81.8%

general

68.7%

agents

57.6%

Benchmark Performance

Top benchmark scores with normalized values (0-100%)

Ranking Across Benchmarks

Position relative to other models on each benchmark

GSM8k

Rank #5 of 46

#2Llama 3.1 405B Instruct

96.8%

#3GPT-4.5

97.0%

#4o1

97.1%

#5Claude 3.5 Sonnet

96.4%

#6Claude 3.5 Sonnet

96.4%

#7Gemma 3 27B

95.9%

#8Qwen2.5 32B Instruct

95.9%

DocVQA

Rank #4 of 26

#1Qwen2.5-Omni-7B

95.2%

#2Qwen2.5 VL 7B Instruct

95.7%

#3Qwen2.5 VL 72B Instruct

96.4%

#4Claude 3.5 Sonnet

95.2%

#5Mistral Small 3.2 24B Instruct

94.9%

#6Qwen2.5 VL 32B Instruct

94.8%

#7Llama 4 Maverick

94.4%

AI2D

Rank #1 of 17

#1Claude 3.5 Sonnet

94.7%

#2GPT-4o

94.2%

#3Pixtral Large

93.8%

#4Mistral Small 3.2 24B Instruct

92.9%

HumanEval

Rank #1 of 62

#1Claude 3.5 Sonnet

93.7%

#2GPT-5

93.4%

#3Kimi K2 Instruct

93.3%

#4Qwen2.5-Coder 32B Instruct

92.7%

BIG-Bench Hard

Rank #2 of 21

#1Claude 3.5 Sonnet

93.1%

#2Claude 3.5 Sonnet

93.1%

#3Gemini 1.5 Pro

89.2%

#4Gemma 3 27B

87.6%

#5Claude 3 Opus

86.8%

All Benchmark Results for Claude 3.5 Sonnet

Complete list of benchmark scores with detailed information


GSM8k GSM8k benchmark	math	text	0.96	96.4%	Self-reported
DocVQA DocVQA benchmark	vision	multimodal	0.95	95.2%	Self-reported
AI2D AI2D benchmark	general	text	0.95	94.7%	Self-reported
HumanEval HumanEval benchmark	code	text	0.94	93.7%	Self-reported
BIG-Bench Hard BIG-Bench Hard benchmark	general	text	0.93	93.1%	Self-reported
MGSM MGSM benchmark	math	text	0.92	91.6%	Self-reported
ChartQA ChartQA benchmark	general	multimodal	0.91	90.8%	Self-reported
MMLU MMLU benchmark	general	text	0.90	90.4%	Self-reported
DROP DROP benchmark	general	text	0.87	87.1%	Self-reported
MATH MATH benchmark	math	text	0.78	78.3%	Self-reported

Showing 1 to 10 of 19 benchmarks

Resources

API Reference Playground Research Paper Blog Post