Claude 3.7 Sonnet

Name: Claude 3.7 Sonnet
Price: 3 USD per 1M input tokens
Average score: 74.1% (11 benchmarks)
Author: Anthropic

Tags: Multimodal, Zero-eval

Rankings: #2 IFEval, #2 TAU-bench Retail, #3 TAU-bench Airline

About

Claude 3.7 Sonnet is a multimodal language model developed by Anthropic. It averages 74.1% across 11 benchmarks, with its strongest self-reported results on MATH-500 (96.2%), IFEval (93.2%), and MMMLU (86.1%). The listing reports a 328K-token context window, and the model is available through 4 API providers. As a multimodal model, it accepts image input in addition to text. It was announced and released on February 24, 2025.

Pricing Range

Input (per 1M tokens): $3.00
Output (per 1M tokens): $15.00
Providers: 4
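
As a rough illustration of what the listed rates mean per request, the sketch below multiplies token counts by the per-1M prices. The function name and the example workload are hypothetical; actual billing may differ by provider.

```python
# Prices per 1M tokens, taken from the listing above.
INPUT_USD_PER_MILLION = 3.00
OUTPUT_USD_PER_MILLION = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost of one request in USD at the listed rates."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: a 20,000-token prompt with a 1,000-token reply.
    print(f"${estimate_cost_usd(20_000, 1_000):.4f}")
```

Because output tokens cost five times as much as input tokens at these rates, long replies dominate the bill faster than long prompts do.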

Timeline

Announced: Feb 24, 2025
Released: Feb 24, 2025

Specifications

Capabilities: Multimodal
License: Proprietary

Benchmark Performance Overview

Performance metrics and category breakdown

Overall Performance

Benchmarks: 11
Average Score: 74.1%
Best Score: 96.2%


Performance Metrics

Max Context Window: 328.0K tokens
Avg Throughput: 56.8 tok/s
Avg Latency: 0 ms (likely a placeholder for an unmeasured value)
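
To put the throughput figure in practical terms, the following sketch converts an output length into an approximate generation time. It assumes the listed average holds for the whole response and ignores time to first token; the function name is hypothetical.

```python
# Average throughput reported in the listing, in output tokens per second.
AVG_THROUGHPUT_TOK_PER_S = 56.8


def estimate_generation_seconds(
    output_tokens: int, throughput: float = AVG_THROUGHPUT_TOK_PER_S
) -> float:
    """Return the approximate seconds needed to stream `output_tokens`."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return output_tokens / throughput


if __name__ == "__main__":
    # Hypothetical reply lengths, in tokens.
    for tokens in (250, 1_000, 4_000):
        print(tokens, "tokens:", round(estimate_generation_seconds(tokens), 1), "s")
```

Real throughput varies by provider and load, so treat the result as an order-of-magnitude estimate.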

Top Categories

math: 96.2%
code: 93.2%
vision: 75.0%
agents: 69.8%
general: 68.5%

Benchmark Performance

Top benchmark scores with normalized values (0-100%)

Ranking Across Benchmarks

Position relative to other models on each benchmark

MATH-500

Rank #5 of 22

#2 DeepSeek-R1: 97.3%
#3 Llama 3.1 Nemotron Ultra 253B v1: 97.0%
#4 Llama-3.3 Nemotron Super 49B v1: 96.6%
#5 Claude 3.7 Sonnet: 96.2%
#6 Kimi-k1.5: 96.2%
#7 DeepSeek R1 Zero: 95.9%
#8 Llama 3.1 Nemotron Nano 8B V1: 95.4%

IFEval

Rank #2 of 37

#1 o3-mini: 93.9%
#2 Claude 3.7 Sonnet: 93.2%
#3 Nova Pro: 92.1%
#4 Llama 3.3 70B Instruct: 92.1%
#5 Gemma 3 27B: 90.4%

MMMLU

Rank #7 of 13

#4 GPT-4.1: 87.3%
#5 Qwen3 235B A22B: 86.7%
#6 Claude Sonnet 4: 86.5%
#7 Claude 3.7 Sonnet: 86.1%
#8 GPT-4.5: 85.1%
#9 GPT-4o: 81.4%
#10 GPT-4.1 mini: 78.5%

GPQA

Rank #5 of 115

#2 Grok-4: 87.5%
#3 Gemini 2.5 Pro Preview 06-05: 86.4%
#4 GPT-5: 85.7%
#5 Claude 3.7 Sonnet: 84.8%
#6 Grok-3: 84.6%
#7 Grok-3 Mini: 84.0%
#8 o3: 83.3%

TAU-bench Retail

Rank #2 of 15

#1 Claude Opus 4: 81.4%
#2 Claude 3.7 Sonnet: 81.2%
#3 Claude Sonnet 4: 80.5%
#4 o4-mini: 71.8%
#5 o1: 70.8%

All Benchmark Results for Claude 3.7 Sonnet

Complete list of benchmark scores with detailed information


Benchmark	Category	Modality	Score (0-1)	Score (%)	Source
MATH-500	math	text	0.96	96.2%	Self-reported
IFEval	code	text	0.93	93.2%	Self-reported
MMMLU	general	text	0.86	86.1%	Self-reported
GPQA	general	text	0.85	84.8%	Self-reported
TAU-bench Retail	agents	text	0.81	81.2%	Self-reported
AIME 2024	general	text	0.80	80.0%	Self-reported
MMMU	vision	multimodal	0.75	75.0%	Self-reported
SWE-Bench Verified	general	text	0.70	70.3%	Self-reported
TAU-bench Airline	agents	text	0.58	58.4%	Self-reported
AIME 2025	general	text	0.55	54.8%	Self-reported

The table above lists 10 of the 11 benchmarks.
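
The category figures under Top Categories can be cross-checked against the rows listed here. The sketch below assumes each category figure is an unweighted mean of its benchmarks, which the listing does not state explicitly. The helper names are hypothetical.

```python
from collections import defaultdict

# (benchmark, category, score in percent) for the 10 rows listed above.
LISTED_SCORES = [
    ("MATH-500", "math", 96.2),
    ("IFEval", "code", 93.2),
    ("MMMLU", "general", 86.1),
    ("GPQA", "general", 84.8),
    ("TAU-bench Retail", "agents", 81.2),
    ("AIME 2024", "general", 80.0),
    ("MMMU", "vision", 75.0),
    ("SWE-Bench Verified", "general", 70.3),
    ("TAU-bench Airline", "agents", 58.4),
    ("AIME 2025", "general", 54.8),
]


def mean(values):
    """Return the arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def category_means(rows):
    """Group rows by category and return each category's mean, to 1 decimal."""
    by_category = defaultdict(list)
    for _name, category, score in rows:
        by_category[category].append(score)
    return {cat: round(mean(scores), 1) for cat, scores in by_category.items()}


if __name__ == "__main__":
    overall = round(mean(score for _, _, score in LISTED_SCORES), 1)
    print("mean of listed rows:", overall)
    for category, value in sorted(category_means(LISTED_SCORES).items()):
        print(f"{category}: {value}")
```

Under that assumption, the agents mean of the listed rows is 69.8, matching the listing. The general mean (75.2) and the overall mean (78.0) come out higher than the listed 68.5% and 74.1%, which is consistent with a lower score on the one benchmark the table does not show.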
