
Claude Opus 4
Multimodal
#1 TAU-bench Retail
#1 MMMU (validation)
#1 Terminal-bench
+4 more
by Anthropic
About
Claude Opus 4 is a multimodal language model developed by Anthropic. It achieves an average score of 64.6% across 9 benchmarks, performing particularly well on MMMLU (88.8%), TAU-bench Retail (81.4%), and GPQA (79.6%). It supports a 328K-token context window for handling large documents and is available through 3 API providers. As a multimodal model, it can process both text and image inputs. Announced and released on May 22, 2025, it represents Anthropic's latest flagship generation.
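As a rough illustration of that multimodal interface, here is a minimal sketch using Anthropic's Python SDK; the model ID string and the image filename are assumptions, so verify the exact identifier against the provider's documentation.

```python
import base64
from pathlib import Path

import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# "chart.png" is a hypothetical local image; any PNG/JPEG works.
image_b64 = base64.standard_b64encode(Path("chart.png").read_bytes()).decode()

message = client.messages.create(
    model="claude-opus-4-20250514",  # assumed model ID; check provider docs
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/png", "data": image_b64},
            },
            {"type": "text", "text": "Summarize what this chart shows."},
        ],
    }],
)
print(message.content[0].text)
```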
Pricing Range
Input (per 1M tokens): $15.00
Output (per 1M tokens): $75.00
Providers: 3
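Because input and output tokens are billed at different rates, per-request cost is a simple weighted sum. A small sketch, assuming hypothetical token counts:

```python
# List prices from above: $15.00 per 1M input tokens, $75.00 per 1M output tokens.
INPUT_USD_PER_M = 15.00
OUTPUT_USD_PER_M = 75.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at Claude Opus 4 list prices."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

# Hypothetical request: a 50K-token prompt with a 2K-token reply.
print(f"${request_cost(50_000, 2_000):.2f}")  # $0.90
```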
Timeline
Announced: May 22, 2025
Released: May 22, 2025
Specifications
Capabilities: Multimodal
License & Family
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance (9 benchmarks)
Average Score: 64.6%
Best Score: 88.8%
High Performers (80%+): 2
Performance Metrics
Max Context Window: 328.0K tokens
Avg Throughput: 87.3 tok/s (see the sketch below)
Avg Latency: 0ms
Top Categories
vision: 76.5%
general: 71.1%
agents: 70.5%
reasoning: 8.6%
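The throughput figure translates directly into expected generation time for a given output length; a back-of-envelope sketch, assuming decoding proceeds at the listed average rate:

```python
AVG_THROUGHPUT_TPS = 87.3  # average decode speed from the metrics above, tokens/second

for out_tokens in (256, 1024, 4096):
    print(f"{out_tokens:>5} output tokens ≈ {out_tokens / AVG_THROUGHPUT_TPS:5.1f} s")
# 256 ≈ 2.9 s, 1024 ≈ 11.7 s, 4096 ≈ 46.9 s
```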
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
MMMLU
Rank #2 of 13
#1 Claude Opus 4.1: 98.4%
#2 Claude Opus 4: 88.8%
#3 o1: 87.7%
#4 GPT-4.1: 87.3%
#5 Qwen3 235B A22B: 86.7%
TAU-bench Retail
Rank #1 of 15
#1 Claude Opus 4: 81.4%
#2 Claude 3.7 Sonnet: 81.2%
#3 Claude Sonnet 4: 80.5%
#4 o4-mini: 71.8%
GPQA
Rank #14 of 115
#11 GPT-5 mini: 82.3%
#12 o4-mini: 81.4%
#13 DeepSeek-R1-0528: 81.0%
#14 Claude Opus 4: 79.6%
#15 o1-pro: 79.0%
#16 o1: 78.0%
#17 Qwen3-235B-A22B-Instruct-2507: 77.5%
MMMU (validation)
Rank #1 of 2
#1 Claude Opus 4: 76.5%
#2 Claude Opus 4.1: 64.8%
AIME 2025
Rank #16 of 36
#13 Qwen3 235B A22B: 81.5%
#14 Claude Opus 4.1: 80.2%
#15 Phi 4 Reasoning Plus: 78.0%
#16 Claude Opus 4: 75.5%
#17 Qwen3 32B: 72.9%
#18 Llama 3.1 Nemotron Ultra 253B v1: 72.5%
#19 Gemini 2.5 Flash: 72.0%
All Benchmark Results for Claude Opus 4
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
MMMLU | general | text | 0.89 | 88.8% | Self-reported
TAU-bench Retail | agents | text | 0.81 | 81.4% | Self-reported
GPQA | general | text | 0.80 | 79.6% | Self-reported
MMMU (validation) | vision | multimodal | 0.77 | 76.5% | Self-reported
AIME 2025 | general | text | 0.76 | 75.5% | Self-reported
SWE-Bench Verified | general | text | 0.72 | 72.5% | Self-reported
TAU-bench Airline | agents | text | 0.60 | 59.6% | Self-reported
Terminal-bench | general | text | 0.39 | 39.2% | Self-reported
ARC-AGI v2 | reasoning | text | 0.09 | 8.6% | Unverified
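The overview statistics can be recomputed straight from this table. A short sketch that reproduces the 64.6% overall average and the per-category breakdown, using the scores and category labels exactly as listed above:

```python
# Normalized scores and categories copied from the results table above.
scores = {
    "MMMLU":              (88.8, "general"),
    "TAU-bench Retail":   (81.4, "agents"),
    "GPQA":               (79.6, "general"),
    "MMMU (validation)":  (76.5, "vision"),
    "AIME 2025":          (75.5, "general"),
    "SWE-Bench Verified": (72.5, "general"),
    "TAU-bench Airline":  (59.6, "agents"),
    "Terminal-bench":     (39.2, "general"),
    "ARC-AGI v2":         (8.6,  "reasoning"),
}

overall = sum(s for s, _ in scores.values()) / len(scores)
print(f"overall average: {overall:.1f}%")  # 64.6%

by_category: dict[str, list[float]] = {}
for score, category in scores.values():
    by_category.setdefault(category, []).append(score)

for category, vals in sorted(by_category.items(), key=lambda kv: -sum(kv[1]) / len(kv[1])):
    print(f"{category}: {sum(vals) / len(vals):.1f}%")
# vision: 76.5%, general: 71.1%, agents: 70.5%, reasoning: 8.6%
```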
Resources