Claude 3 Haiku

Multimodal
Zero-eval

by Anthropic

About

Claude 3 Haiku is a multimodal language model developed by Anthropic. It achieves an average score of 71.5% across the 10 benchmarks tracked here, with its strongest results on ARC-C (89.2%), GSM8k (88.9%), and HellaSwag (85.9%); its best category is reasoning, at an average of 87.5%. The model supports a 200K-token context window for handling large documents and is available through 3 API providers. As a multimodal model, it accepts both text and image inputs. Released in March 2024, it is the smallest and fastest model in the Claude 3 family.
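For reference, a minimal sketch of a multimodal (text + image) request to this model through Anthropic's Python SDK; the image path and prompt are placeholders, and `claude-3-haiku-20240307` is Anthropic's documented model ID.

```python
# Minimal sketch: a text + image request to Claude 3 Haiku through
# Anthropic's Python SDK. Requires ANTHROPIC_API_KEY in the
# environment; the image file and prompt are placeholders.
import base64

import anthropic

client = anthropic.Anthropic()

# Base64-encode a local image (hypothetical file).
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=512,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64",
                        "media_type": "image/png",
                        "data": image_data}},
            {"type": "text", "text": "Summarize this chart in two sentences."},
        ],
    }],
)
print(message.content[0].text)
```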

Pricing Range
Input (per 1M): $0.25 – $0.25
Output (per 1M): $1.25 – $1.25
Providers: 3
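Because pricing is flat per million tokens, per-request cost is simple arithmetic; a small sketch assuming the listed rates and illustrative token counts:

```python
# Sketch: per-request cost at the listed Claude 3 Haiku rates
# (USD per 1M tokens). Token counts below are illustrative.
INPUT_PER_M = 0.25
OUTPUT_PER_M = 1.25

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# A 10K-token prompt with a 1K-token reply:
print(f"${request_cost(10_000, 1_000):.5f}")  # $0.00375
```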
Timeline
Announced: Mar 13, 2024
Released: Mar 13, 2024
Specifications
Capabilities: Multimodal
License & Family
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

10 benchmarks
Average Score: 71.5%
Best Score: 89.2% (ARC-C)
High Performers (80%+): 3

Performance Metrics

Max Context Window: 200K tokens
Avg Throughput: 82.0 tok/s
Avg Latency: not reported
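As a rough guide to what the average throughput means in practice, a back-of-the-envelope sketch (it ignores time-to-first-token, which is not reported here):

```python
# Sketch: rough generation-time estimate at the listed average
# throughput. Ignores time-to-first-token, which is not reported.
AVG_THROUGHPUT_TOK_S = 82.0

def est_generation_seconds(output_tokens: int) -> float:
    return output_tokens / AVG_THROUGHPUT_TOK_S

print(f"{est_generation_seconds(1_000):.1f}s")  # 12.2s for 1K output tokens
```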

Top Categories

reasoning: 87.5%
code: 75.9%
math: 67.6%
general: 65.2%
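These category figures are plain arithmetic means of the per-benchmark scores in the results table at the bottom of this page; a short sketch reproduces them (matching the figures above up to rounding):

```python
# Sketch: reproducing the category and overall averages from the
# self-reported scores in the results table at the bottom of the page.
from statistics import mean

scores = {
    "ARC-C": (89.2, "reasoning"),  "GSM8k": (88.9, "math"),
    "HellaSwag": (85.9, "reasoning"), "DROP": (78.4, "general"),
    "HumanEval": (75.9, "code"),   "MMLU": (75.2, "general"),
    "MGSM": (75.1, "math"),        "BIG-Bench Hard": (73.7, "general"),
    "MATH": (38.9, "math"),        "GPQA": (33.3, "general"),
}

by_category: dict[str, list[float]] = {}
for score, category in scores.values():
    by_category.setdefault(category, []).append(score)

for category, vals in sorted(by_category.items()):
    print(f"{category}: {mean(vals):.2f}%")
# code: 75.90%, general: 65.15%, math: 67.63%, reasoning: 87.55%
print(f"overall: {mean(s for s, _ in scores.values()):.2f}%")  # 71.45%
```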
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

ARC-C

Rank #11 of 31
#8 Nova Micro: 90.2%
#9 Phi-3.5-MoE-instruct: 91.0%
#10 Mistral Small 3 24B Base: 91.3%
#11 Claude 3 Haiku: 89.2%
#12 Jamba 1.5 Mini: 85.7%
#13 Phi-3.5-mini-instruct: 84.6%
#14 Phi 4 Mini: 83.7%

GSM8k

Rank #28 of 46
#25 Gemma 3 4B: 89.2%
#26 Grok-1.5: 90.0%
#27 Gemini 1.5 Pro: 90.8%
#28 Claude 3 Haiku: 88.9%
#29 Qwen2.5-Omni-7B: 88.7%
#30 Phi-3.5-MoE-instruct: 88.7%
#31 Phi 4 Mini: 88.6%

HellaSwag

Rank #9 of 24
#6 Gemma 2 27B: 86.4%
#7 Gemini 1.5 Flash: 86.5%
#8 Qwen2 72B Instruct: 87.6%
#9 Claude 3 Haiku: 85.9%
#10 Llama 3.1 Nemotron 70B Instruct: 85.6%
#11 Qwen2.5 32B Instruct: 85.2%
#12 Phi-3.5-MoE-instruct: 83.8%

DROP

Rank #17 of 28
#14 Claude 3 Sonnet: 78.9%
#15 Nova Micro: 79.3%
#16 Llama 3.1 70B Instruct: 79.6%
#17 Claude 3 Haiku: 78.4%
#18 Phi 4: 75.5%
#19 Gemini 1.5 Pro: 74.9%
#20 GPT-3.5 Turbo: 70.2%

HumanEval

Rank #44 of 62
#41 Qwen2.5-Omni-7B: 78.7%
#42 Qwen2 7B Instruct: 79.9%
#43 Llama 3.1 70B Instruct: 80.5%
#44 Claude 3 Haiku: 75.9%
#45 Gemma 3n E4B Instructed: 75.0%
#46 Gemma 3n E4B Instructed LiteRT Preview: 75.0%
#47 Gemini 1.5 Flash: 74.3%
All Benchmark Results for Claude 3 Haiku
Complete list of benchmark scores with detailed information
Benchmark        Category   Modality  Score  Normalized  Source
ARC-C            reasoning  text      0.89   89.2%       Self-reported
GSM8k            math       text      0.89   88.9%       Self-reported
HellaSwag        reasoning  text      0.86   85.9%       Self-reported
DROP             general    text      0.78   78.4%       Self-reported
HumanEval        code       text      0.76   75.9%       Self-reported
MMLU             general    text      0.75   75.2%       Self-reported
MGSM             math       text      0.75   75.1%       Self-reported
BIG-Bench Hard   general    text      0.74   73.7%       Self-reported
MATH             math       text      0.39   38.9%       Self-reported
GPQA             general    text      0.33   33.3%       Self-reported