
Llama 3.3 70B Instruct
Zero-eval
#1 BFCL v2
#2 MBPP EvalPlus
by Meta
About
Llama 3.3 70B Instruct is a language model developed by Meta. It achieves strong performance, with an average score of 79.9% across 9 benchmarks, and does particularly well on IFEval (92.1%), MGSM (91.1%), and HumanEval (88.4%). It is strongest on code-oriented tasks, where it averages 89.4%. The model supports a 256K-token context window for handling large documents and is available through 9 API providers. It was released in December 2024.
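The headline averages above can be reproduced directly from the per-benchmark scores in the results table at the bottom of this page. A minimal sketch (Python; the scores and the site's category labels are copied from that table, and the round-to-one-decimal convention is an assumption):

```python
# Benchmark scores and site-assigned categories, copied from the
# results table below (percentage scores, 0-100 scale).
scores = {
    "IFEval": (92.1, "code"),
    "MGSM": (91.1, "math"),
    "HumanEval": (88.4, "code"),
    "MBPP EvalPlus": (87.6, "code"),
    "MMLU": (86.0, "general"),
    "BFCL v2": (77.3, "general"),
    "MATH": (77.0, "math"),
    "MMLU-Pro": (68.9, "general"),
    "GPQA": (50.5, "general"),
}

overall = sum(s for s, _ in scores.values()) / len(scores)
print(f"Average score: {overall:.1f}%")  # 79.9%

for cat in ("code", "math", "general"):
    vals = [s for s, c in scores.values() if c == cat]
    print(f"{cat}: {sum(vals) / len(vals):.1f}%")  # 89.4%, 84.0%, 70.7%
```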
Pricing Range
Input (per 1M): $0.20 - $0.89
Output (per 1M): $0.20 - $7.90
Providers: 9
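To give a feel for what this range means per request, here is a small sketch (Python; the token counts are hypothetical example values, and any given provider's actual rate falls somewhere inside the quoted range):

```python
# Price range across the 9 listed providers, in USD per 1M tokens.
INPUT_MIN, INPUT_MAX = 0.20, 0.89
OUTPUT_MIN, OUTPUT_MAX = 0.20, 7.90

def cost_usd(input_tokens: int, output_tokens: int,
             in_rate: float, out_rate: float) -> float:
    """Cost of one request at given per-1M-token rates."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Hypothetical request: 4,000 prompt tokens, 1,000 completion tokens.
cheapest = cost_usd(4_000, 1_000, INPUT_MIN, OUTPUT_MIN)
priciest = cost_usd(4_000, 1_000, INPUT_MAX, OUTPUT_MAX)
print(f"${cheapest:.5f} - ${priciest:.5f} per request")  # $0.00100 - $0.01146
```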
Timeline
Announced: Dec 6, 2024
Released: Dec 6, 2024
Specifications
Training Tokens: 15.0T
License & Family
License
Llama 3.3 Community License Agreement
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
9 benchmarks
Average Score
79.9%
Best Score
92.1%
High Performers (80%+)
5
Performance Metrics
Max Context Window
256.0K
Avg Throughput
451.9 tok/s
Avg Latency
1ms
Top Categories
code
89.4%
math
84.0%
general
70.7%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
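The rank windows below appear to be a straight descending sort of every tracked model's score on the benchmark. A minimal sketch of that ranking logic (Python; the entries are the IFEval scores listed below, and the tie-breaking rule, earlier entry wins, is an assumption):

```python
# Rank models by benchmark score, highest first. How the site breaks
# exact ties is not documented; sorted() is stable, so here the
# earlier list position wins.
entries = [
    ("o3-mini", 93.9),
    ("Claude 3.7 Sonnet", 93.2),
    ("Nova Pro", 92.1),
    ("Llama 3.3 70B Instruct", 92.1),
    ("Gemma 3 27B", 90.4),
]

ranked = sorted(entries, key=lambda e: e[1], reverse=True)
for rank, (name, score) in enumerate(ranked, start=1):
    print(f"#{rank} {name}: {score}%")
```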
IFEval
Rank #4 of 37
#1 o3-mini
93.9%
#2 Claude 3.7 Sonnet
93.2%
#3 Nova Pro
92.1%
#4 Llama 3.3 70B Instruct
92.1%
#5 Gemma 3 27B
90.4%
#6 Gemma 3 4B
90.2%
#7 Kimi K2 Instruct
89.8%
MGSM
Rank #5 of 31
#2 o3-mini
92.0%
#3 Claude 3.5 Sonnet
91.6%
#4 Claude 3.5 Sonnet
91.6%
#5 Llama 3.3 70B Instruct
91.1%
#6 o1-preview
90.8%
#7 Claude 3 Opus
90.7%
#8 Llama 4 Scout
90.6%
HumanEval
Rank #17 of 62
#14 DeepSeek-V2.5
89.0%
#15 Nova Pro
89.0%
#16 Mistral Small 3.1 24B Instruct
88.4%
#17 Llama 3.3 70B Instruct
88.4%
#18 Grok-2
88.4%
#19 Qwen2.5-Coder 7B Instruct
88.4%
#20 Qwen2.5 32B Instruct
88.4%
MBPP EvalPlus
Rank #2 of 2
#1 Llama 3.1 405B Instruct
88.6%
#2 Llama 3.3 70B Instruct
87.6%
MMLU
Rank #24 of 78
#21 GPT-4
86.4%
#22 Grok-2 mini
86.2%
#23 Llama 3.2 90B Instruct
86.0%
#24 Llama 3.3 70B Instruct
86.0%
#25 Gemini 1.5 Pro
85.9%
#26 Nova Pro
85.9%
#27 GPT-4o
85.7%
All Benchmark Results for Llama 3.3 70B Instruct
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Normalized Score | Score | Source
IFEval | code | text | 0.92 | 92.1% | Self-reported
MGSM | math | text | 0.91 | 91.1% | Self-reported
HumanEval | code | text | 0.88 | 88.4% | Self-reported
MBPP EvalPlus | code | text | 0.88 | 87.6% | Self-reported
MMLU | general | text | 0.86 | 86.0% | Self-reported
BFCL v2 | general | text | 0.77 | 77.3% | Self-reported
MATH | math | text | 0.77 | 77.0% | Self-reported
MMLU-Pro | general | text | 0.69 | 68.9% | Self-reported
GPQA | general | text | 0.51 | 50.5% | Self-reported
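The Normalized Score column is the percentage score rescaled to 0-1 and rounded to two decimals, matching the "(0-100%)" note above. A quick consistency check over a few rows (Python; the parsing assumes the pipe-separated layout used in this table):

```python
# A few rows copied verbatim from the results table above.
rows = """\
IFEval | code | text | 0.92 | 92.1% | Self-reported
MGSM | math | text | 0.91 | 91.1% | Self-reported
GPQA | general | text | 0.51 | 50.5% | Self-reported"""

for line in rows.splitlines():
    name, _cat, _mod, norm, pct, _src = [f.strip() for f in line.split("|")]
    # The normalized value should equal the percentage / 100, up to
    # two-decimal rounding (so allow a tolerance just above 0.005).
    diff = abs(float(norm) - float(pct.rstrip("%")) / 100)
    assert diff < 0.006, name
print("normalized values consistent with percentages")
```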