Llama 3.1 405B Instruct

Price: 3.5 USD
Average score: 79.2% (18 benchmarks)
Author: Meta

Zero-eval

#1 ARC-C

#1 API-Bank

#1 Multilingual MGSM (CoT)


About

Llama 3.1 405B Instruct is a language model developed by Meta and released on July 23, 2024. It averages 79.2% across 18 benchmarks, with its strongest results on ARC-C (96.9%), GSM8k (96.8%), and API-Bank (92.0%). It supports a 256K-token context window for handling large documents and is available through 8 API providers.

Pricing Range

Input (per 1M tokens): $0.89 - $9.50

Output (per 1M tokens): $0.89 - $16.00

Providers: 8
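
To make the price range concrete, here is a minimal Python sketch that bounds the cost of a single request using the listed per-1M-token prices. The function name `cost_range` and the example token counts are illustrative assumptions, not part of the listing. The sketch also assumes the cheapest input and output prices can be combined, which may not hold if they come from different providers.

```python
# Price bounds per 1M tokens, copied from the "Pricing Range" figures.
INPUT_PRICE_RANGE = (0.89, 9.50)    # USD per 1M input tokens (min, max)
OUTPUT_PRICE_RANGE = (0.89, 16.00)  # USD per 1M output tokens (min, max)


def cost_range(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (lowest, highest) USD cost for one request.

    Assumption: the min (or max) input and output prices apply together;
    a real provider may pair a low input price with a higher output price.
    """
    low = (input_tokens * INPUT_PRICE_RANGE[0]
           + output_tokens * OUTPUT_PRICE_RANGE[0]) / 1_000_000
    high = (input_tokens * INPUT_PRICE_RANGE[1]
            + output_tokens * OUTPUT_PRICE_RANGE[1]) / 1_000_000
    return low, high


# Hypothetical request: a 100K-token document with a 2K-token answer.
low, high = cost_range(input_tokens=100_000, output_tokens=2_000)
print(f"Estimated cost: ${low:.4f} to ${high:.4f}")
```

The spread between the two bounds is roughly a factor of ten, so the choice of provider dominates the cost of long-context requests.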

Timeline

Announced: Jul 23, 2024

Released: Jul 23, 2024

Specifications

Training Tokens: 15.0T

License & Family

License

Llama 3.1 Community License

Benchmark Performance Overview

Performance metrics and category breakdown

Overall Performance

18 benchmarks

Average Score

79.2%

Best Score

96.9%

High Performers (80%+)

Performance Metrics

Max Context Window

256.0K

Avg Throughput

48.3 tok/s

Avg Latency

0ms
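
As a rough illustration of what the average throughput implies, the sketch below estimates how long a response of a given length takes to generate. The helper `generation_seconds` is a hypothetical name. The estimate assumes a steady decoding rate and ignores time to first token, since the listed latency figure gives nothing to add.

```python
# Average throughput across providers, as listed above.
AVG_THROUGHPUT_TOK_PER_S = 48.3


def generation_seconds(output_tokens: int,
                       throughput: float = AVG_THROUGHPUT_TOK_PER_S) -> float:
    """Estimate decoding time in seconds, assuming a constant token rate."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return output_tokens / throughput


# Hypothetical response lengths, in output tokens.
for n in (250, 1_000, 4_000):
    print(f"{n} tokens: about {generation_seconds(n):.1f} s")
```

Individual providers can differ substantially from the average, so this is only an order-of-magnitude guide.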

Top Categories

reasoning

96.9%

math

87.4%

code

81.4%

general

73.2%

Benchmark Performance

Top benchmark scores with normalized values (0-100%)

Ranking Across Benchmarks

Position relative to other models on each benchmark

ARC-C

Rank #1 of 31

#1 Llama 3.1 405B Instruct: 96.9%
#2 Claude 3 Opus: 96.4%
#3 Nova Pro: 94.8%
#4 Llama 3.1 70B Instruct: 94.8%

GSM8k

Rank #4 of 46

#1 GPT-4.5: 97.0%
#2 o1: 97.1%
#3 Kimi K2 Instruct: 97.3%
#4 Llama 3.1 405B Instruct: 96.8%
#5 Claude 3.5 Sonnet: 96.4%
#6 Claude 3.5 Sonnet: 96.4%
#7 Gemma 3 27B: 95.9%

API-Bank

Rank #1 of 3

#1 Llama 3.1 405B Instruct: 92.0%
#2 Llama 3.1 70B Instruct: 90.0%
#3 Llama 3.1 8B Instruct: 82.6%

Multilingual MGSM (CoT)

Rank #1 of 3

#1 Llama 3.1 405B Instruct: 91.6%
#2 Llama 3.1 70B Instruct: 86.9%
#3 Llama 3.1 8B Instruct: 68.9%

HumanEval

Rank #13 of 62

#10 Gemini Diffusion: 89.6%
#11 Granite 3.3 8B Base: 89.7%
#12 Granite 3.3 8B Instruct: 89.7%
#13 Llama 3.1 405B Instruct: 89.0%
#14 Nova Pro: 89.0%
#15 DeepSeek-V2.5: 89.0%
#16 Mistral Small 3.1 24B Instruct: 88.4%

All Benchmark Results for Llama 3.1 405B Instruct

Complete list of benchmark scores with detailed information


Benchmark	Category	Modality	Normalized	Score	Source
ARC-C	reasoning	text	0.97	96.9%	Self-reported
GSM8k	math	text	0.97	96.8%	Self-reported
API-Bank	general	text	0.92	92.0%	Self-reported
Multilingual MGSM (CoT)	math	text	0.92	91.6%	Self-reported
HumanEval	code	text	0.89	89.0%	Self-reported
MMLU (CoT)	general	text	0.89	88.6%	Self-reported
IFEval	code	text	0.89	88.6%	Self-reported
MBPP EvalPlus	code	text	0.89	88.6%	Self-reported
BFCL	general	text	0.89	88.5%	Self-reported
MMLU	general	text	0.87	87.3%	Self-reported

Showing 1 to 10 of 18 benchmarks
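
The per-category figures can be approximated by grouping benchmark rows by category and averaging them. The Python sketch below does this for the ten rows listed above. The plain arithmetic mean and the helper name `category_means` are assumptions, since the page does not state how its category scores are computed. Because only 10 of the 18 benchmarks are listed, the results will not reproduce the "Top Categories" values.

```python
from collections import defaultdict

# (benchmark, category, score in percent), copied from the ten listed rows.
ROWS = [
    ("ARC-C", "reasoning", 96.9),
    ("GSM8k", "math", 96.8),
    ("API-Bank", "general", 92.0),
    ("Multilingual MGSM (CoT)", "math", 91.6),
    ("HumanEval", "code", 89.0),
    ("MMLU (CoT)", "general", 88.6),
    ("IFEval", "code", 88.6),
    ("MBPP EvalPlus", "code", 88.6),
    ("BFCL", "general", 88.5),
    ("MMLU", "general", 87.3),
]


def category_means(rows):
    """Group scores by category and return the arithmetic mean of each group."""
    buckets = defaultdict(list)
    for _name, category, score in rows:
        buckets[category].append(score)
    return {cat: sum(scores) / len(scores) for cat, scores in buckets.items()}


means = category_means(ROWS)
# Print categories from highest to lowest mean.
for category, mean in sorted(means.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category}: {mean:.1f}%")
```

With all 18 rows available, the same grouping could be compared against the category scores the page reports.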
