Claude 3 Opus

Name: Claude 3 Opus
Price: 15 USD per 1M input tokens
Average benchmark score: 81.6% (11 benchmarks)
Author: Anthropic


About

Claude 3 Opus is a multimodal language model developed by Anthropic and the most capable model in its Claude 3 family. It averages 81.6% across 11 benchmarks, with its highest scores on ARC-C (96.4%), HellaSwag (95.4%), and GSM8k (95.0%). Its strongest category is reasoning, where it averages 95.9%. It supports a 200K-token context window for working with long documents and is available through 3 API providers. As a multimodal model, it accepts both text and image inputs and returns text output. It was released in 2024.

Pricing

Input (per 1M tokens): $15.00
Output (per 1M tokens): $75.00
Providers: 3
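As a rough sketch, the cost of a single request follows directly from the two per-million-token rates. This assumes flat pricing with no prompt caching, batch discounts, or provider-specific surcharges; the function name `request_cost` is hypothetical.

```python
# Per-million-token prices from the pricing list above (USD).
# Assumption: flat rates, no caching or batch discounts.
INPUT_PRICE_PER_M = 15.00
OUTPUT_PRICE_PER_M = 75.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed flat rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 10,000 prompt tokens, 1,000 completion tokens.
    print(f"${request_cost(10_000, 1_000):.4f}")
```

For the hypothetical request above this prints `$0.2250`; output tokens cost five times as much as input tokens, so long completions dominate the bill.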

Timeline

Announced: Mar 4, 2024
Released: Mar 4, 2024

Specifications

Capabilities: Multimodal
License: Proprietary

Benchmark Performance Overview

Overall Performance
Benchmarks evaluated: 11
Average Score: 81.6%
Best Score: 96.4% (ARC-C)

Performance Metrics

Max Context Window: 200K tokens

Avg Throughput: 87.3 tok/s
Avg Latency: not reported
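A back-of-the-envelope estimate of streaming time can be derived from the 87.3 tok/s average throughput listed above. This is a sketch under stated assumptions: throughput is treated as constant, and time to first token is ignored. The function name `estimated_generation_seconds` is hypothetical.

```python
# Average output throughput from the performance metrics above.
AVG_THROUGHPUT_TOK_S = 87.3


def estimated_generation_seconds(output_tokens: int,
                                 throughput: float = AVG_THROUGHPUT_TOK_S) -> float:
    """Rough time to stream `output_tokens`, ignoring time to first token."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return output_tokens / throughput


if __name__ == "__main__":
    print(f"{estimated_generation_seconds(1_000):.1f} s")
```

For 1,000 output tokens this prints `11.5 s`. Real latency varies by provider and load, so treat the figure as an order-of-magnitude guide.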

Top Categories

reasoning: 95.9%
code: 84.9%
math: 81.9%
general: 75.1%
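The reasoning, code, and math figures are consistent with a plain unweighted mean of each category's benchmark scores from the results table. That averaging method is an assumption, not a documented rule of the leaderboard. The general category is left out of the sketch because its four listed benchmarks average 81.3%, not 75.1%, so that figure presumably also includes the eleventh benchmark the table does not show.

```python
from statistics import mean

# Per-benchmark scores (%) from the results table, grouped by category.
# "general" is omitted: its reported average depends on an unlisted benchmark.
SCORES = {
    "reasoning": {"ARC-C": 96.4, "HellaSwag": 95.4},
    "code": {"HumanEval": 84.9},
    "math": {"GSM8k": 95.0, "MGSM": 90.7, "MATH": 60.1},
}


def category_averages(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Unweighted mean of the benchmark scores in each category, rounded to 0.1."""
    return {cat: round(mean(list(vals.values())), 1) for cat, vals in scores.items()}


if __name__ == "__main__":
    for cat, avg in category_averages(SCORES).items():
        print(f"{cat}: {avg}%")
```

This reproduces 95.9% for reasoning, 84.9% for code, and 81.9% for math.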


Ranking Across Benchmarks

Position relative to other models on each benchmark

ARC-C

Rank #2 of 31

#1 Llama 3.1 405B Instruct: 96.9%
#2 Claude 3 Opus: 96.4%
#3 Nova Pro: 94.8%
#4 Llama 3.1 70B Instruct: 94.8%
#5 Claude 3 Sonnet: 93.2%

HellaSwag

Rank #1 of 24

#1 Claude 3 Opus: 95.4%
#2 GPT-4: 95.3%
#3 Gemini 1.5 Pro: 93.3%
#4 Claude 3 Sonnet: 89.0%

GSM8k

Rank #11 of 46

#8 DeepSeek-V2.5: 95.1%
#9 Qwen2.5 72B Instruct: 95.8%
#10 Qwen2.5 32B Instruct: 95.9%
#11 Claude 3 Opus: 95.0%
#12 Nova Pro: 94.8%
#13 Qwen2.5 14B Instruct: 94.8%
#14 Nova Lite: 94.5%

MGSM

Rank #7 of 31

#4 o1-preview: 90.8%
#5 Llama 3.3 70B Instruct: 91.1%
#6 Claude 3.5 Sonnet: 91.6%
#7 Claude 3 Opus: 90.7%
#8 Llama 4 Scout: 90.6%
#9 GPT-4o: 90.5%
#10 o1: 89.3%

MMLU

Rank #19 of 78

#16 o3-mini: 86.9%
#17 Llama 3.1 405B Instruct: 87.3%
#18 Kimi-k1.5: 87.4%
#19 Claude 3 Opus: 86.8%
#20 GPT-4 Turbo: 86.5%
#21 GPT-4: 86.4%
#22 Grok-2 mini: 86.2%
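For reference, a benchmark rank can be computed from scores with standard competition ranking: 1 plus the number of models scoring strictly higher, so tied models share a rank. This is an assumed method, not necessarily the one the leaderboard uses. For example, the ARC-C list places Nova Pro and Llama 3.1 70B Instruct, both at 94.8%, at #3 and #4, perhaps by breaking the tie on unrounded scores. The function name `competition_ranks` is hypothetical.

```python
def competition_ranks(scores: dict[str, float]) -> dict[str, int]:
    """Standard competition ranking: rank = 1 + number of strictly higher scores.

    Tied scores receive the same rank, and the next rank is skipped.
    """
    values = list(scores.values())
    return {name: 1 + sum(v > s for v in values) for name, s in scores.items()}


# Top five ARC-C scores (%) from the ranking list.
ARC_C_TOP5 = {
    "Llama 3.1 405B Instruct": 96.9,
    "Claude 3 Opus": 96.4,
    "Nova Pro": 94.8,
    "Llama 3.1 70B Instruct": 94.8,
    "Claude 3 Sonnet": 93.2,
}

if __name__ == "__main__":
    ranks = competition_ranks(ARC_C_TOP5)
    for name, rank in sorted(ranks.items(), key=lambda kv: kv[1]):
        print(f"#{rank} {name}")
```

On the ARC-C top five this assigns Claude 3 Opus rank 2, gives both 94.8% models rank 3, and places Claude 3 Sonnet at rank 5.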

All Benchmark Results for Claude 3 Opus

Benchmark       Category   Modality  Score  Normalized  Source
ARC-C           reasoning  text      0.96   96.4%       Self-reported
HellaSwag       reasoning  text      0.95   95.4%       Self-reported
GSM8k           math       text      0.95   95.0%       Self-reported
MGSM            math       text      0.91   90.7%       Self-reported
MMLU            general    text      0.87   86.8%       Self-reported
BIG-Bench Hard  general    text      0.87   86.8%       Self-reported
HumanEval       code       text      0.85   84.9%       Self-reported
DROP            general    text      0.83   83.1%       Self-reported
MMLU-Pro        general    text      0.69   68.5%       Self-reported
MATH            math       text      0.60   60.1%       Self-reported

10 of 11 benchmarks shown.
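A simple threshold filter over the listed rows shows how many benchmarks clear 80%. It covers only the 10 benchmarks in the table, since the eleventh is not shown. The function name `high_performers` is hypothetical.

```python
# Normalized scores (%) for the 10 benchmarks listed in the results table.
LISTED = {
    "ARC-C": 96.4, "HellaSwag": 95.4, "GSM8k": 95.0, "MGSM": 90.7,
    "MMLU": 86.8, "BIG-Bench Hard": 86.8, "HumanEval": 84.9, "DROP": 83.1,
    "MMLU-Pro": 68.5, "MATH": 60.1,
}


def high_performers(scores: dict[str, float], threshold: float = 80.0) -> list[str]:
    """Benchmarks scoring at or above `threshold` percent, highest first."""
    passing = [name for name, s in scores.items() if s >= threshold]
    # sorted() is stable, so tied scores keep their table order.
    return sorted(passing, key=lambda name: -scores[name])


if __name__ == "__main__":
    names = high_performers(LISTED)
    print(len(names), names)
```

Eight of the ten listed benchmarks reach 80% or more, led by ARC-C. Only MMLU-Pro and MATH fall below the threshold.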
