Llama 3.3 70B Instruct
by Meta

Leaderboard highlights: #1 on BFCL v2, #2 on MBPP EvalPlus

About

Llama 3.3 70B Instruct is a language model developed by Meta. It achieves strong results, with an average score of 79.9% across 9 benchmarks, and does particularly well on IFEval (92.1%), MGSM (91.1%), and HumanEval (88.4%). It is strongest on code tasks, where it averages 89.4%. The model supports a 128K-token context window for handling long documents and is available through 9 API providers. Released in December 2024, it represents Meta's latest advancement in the Llama family.
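Most of the API providers serving this model expose an OpenAI-compatible chat completions endpoint, so a request can look like the sketch below. The base URL and exact model identifier are placeholders and vary by provider; check your provider's documentation.

```python
# Minimal sketch of querying Llama 3.3 70B Instruct through an
# OpenAI-compatible provider endpoint. The base_url and model id
# below are hypothetical placeholders, not a specific provider's values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # hypothetical provider URL
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # identifier varies by provider
    messages=[{"role": "user", "content": "Summarize the Llama 3.3 release in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```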

Pricing Range
Input (per 1M): $0.20 - $0.89
Output (per 1M): $0.20 - $7.90
Providers: 9
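At these rates, the cost of a call scales linearly with token counts. A small sketch of the resulting price band for a single request (the token counts in the example are illustrative):

```python
# Estimate the cheapest and most expensive cost of one request
# across the listed provider price range (USD per 1M tokens).
INPUT_PRICE = (0.20, 0.89)   # $/1M input tokens, min and max
OUTPUT_PRICE = (0.20, 7.90)  # $/1M output tokens, min and max

def request_cost(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (min, max) USD cost for a single request."""
    lo = input_tokens / 1e6 * INPUT_PRICE[0] + output_tokens / 1e6 * OUTPUT_PRICE[0]
    hi = input_tokens / 1e6 * INPUT_PRICE[1] + output_tokens / 1e6 * OUTPUT_PRICE[1]
    return lo, hi

# Hypothetical request: 4,000 prompt tokens, 1,000 completion tokens.
lo, hi = request_cost(4_000, 1_000)
print(f"${lo:.4f} - ${hi:.4f} per request")  # $0.0010 - $0.0115
```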
Timeline
Announced: Dec 6, 2024
Released: Dec 6, 2024

Specifications
Training Tokens: 15.0T

License & Family
License: Llama 3.3 Community License Agreement
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

Benchmarks: 9
Average Score: 79.9%
Best Score: 92.1%
High Performers (80%+): 5
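These headline numbers follow directly from the nine self-reported scores in the results table at the bottom of the page; a quick check:

```python
# Recompute the overview stats from the nine scores listed
# under "All Benchmark Results" below.
scores = {
    "IFEval": 92.1, "MGSM": 91.1, "HumanEval": 88.4,
    "MBPP EvalPlus": 87.6, "MMLU": 86.0, "BFCL v2": 77.3,
    "MATH": 77.0, "MMLU-Pro": 68.9, "GPQA": 50.5,
}

average = sum(scores.values()) / len(scores)
best = max(scores.values())
high_performers = sum(1 for s in scores.values() if s >= 80.0)

print(f"Average score:   {average:.1f}%")     # 79.9%
print(f"Best score:      {best:.1f}%")        # 92.1%
print(f"High performers: {high_performers}")  # 5
```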

Performance Metrics

Max Context Window: 128K
Avg Throughput: 451.9 tok/s
Avg Latency: 1 ms
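Taking these aggregate provider metrics at face value, end-to-end generation time can be roughly estimated as latency plus output tokens divided by throughput. A back-of-the-envelope sketch; the formula is an approximation, not a provider guarantee:

```python
# Rough end-to-end time estimate from the aggregate provider metrics.
# These are averages across providers; individual deployments vary widely.
AVG_THROUGHPUT = 451.9  # tokens per second
AVG_LATENCY = 0.001     # seconds (1 ms, as reported above)

def generation_time(output_tokens: int) -> float:
    """Approximate seconds to receive a completion of the given length."""
    return AVG_LATENCY + output_tokens / AVG_THROUGHPUT

print(f"{generation_time(1000):.2f} s for 1,000 tokens")  # ~2.21 s
```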

Top Categories

code: 89.4%
math: 84.0%
general: 70.7%
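The category figures are simple means over each benchmark's category as assigned in the results table below (note that the site files IFEval under code, which is why the code average includes it):

```python
# Reproduce the category averages from the per-benchmark categories
# used in the results table at the bottom of the page.
from collections import defaultdict

results = [
    ("IFEval", "code", 92.1), ("MGSM", "math", 91.1),
    ("HumanEval", "code", 88.4), ("MBPP EvalPlus", "code", 87.6),
    ("MMLU", "general", 86.0), ("BFCL v2", "general", 77.3),
    ("MATH", "math", 77.0), ("MMLU-Pro", "general", 68.9),
    ("GPQA", "general", 50.5),
]

by_category = defaultdict(list)
for _, category, score in results:
    by_category[category].append(score)

for category, cat_scores in by_category.items():
    print(f"{category}: {sum(cat_scores) / len(cat_scores):.1f}%")
# code: 89.4%  math: 84.0%  general: 70.7%
```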
Ranking Across Benchmarks
Position relative to other models on each benchmark

IFEval

Rank #4 of 37
#1 o3-mini: 93.9%
#2 Claude 3.7 Sonnet: 93.2%
#3 Nova Pro: 92.1%
#4 Llama 3.3 70B Instruct: 92.1%
#5 Gemma 3 27B: 90.4%
#6 Gemma 3 4B: 90.2%
#7 Kimi K2 Instruct: 89.8%

MGSM

Rank #5 of 31
#2 o3-mini: 92.0%
#3 Claude 3.5 Sonnet: 91.6%
#4 Claude 3.5 Sonnet: 91.6%
#5 Llama 3.3 70B Instruct: 91.1%
#6 o1-preview: 90.8%
#7 Claude 3 Opus: 90.7%
#8 Llama 4 Scout: 90.6%

HumanEval

Rank #17 of 62
#14 DeepSeek-V2.5: 89.0%
#15 Nova Pro: 89.0%
#16 Mistral Small 3.1 24B Instruct: 88.4%
#17 Llama 3.3 70B Instruct: 88.4%
#18 Grok-2: 88.4%
#19 Qwen2.5-Coder 7B Instruct: 88.4%
#20 Qwen2.5 32B Instruct: 88.4%

MBPP EvalPlus

Rank #2 of 2
#1 Llama 3.1 405B Instruct: 88.6%
#2 Llama 3.3 70B Instruct: 87.6%

MMLU

Rank #24 of 78
#21 GPT-4: 86.4%
#22 Grok-2 mini: 86.2%
#23 Llama 3.2 90B Instruct: 86.0%
#24 Llama 3.3 70B Instruct: 86.0%
#25 Gemini 1.5 Pro: 85.9%
#26 Nova Pro: 85.9%
#27 GPT-4o: 85.7%
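Each panel above shows the model's position in a leaderboard sorted by score, with up to three neighbors on either side. A sketch of how such a window could be sliced; the rank_window helper is illustrative, not the site's actual code:

```python
# Illustrative sketch: slice a +/-3 window around one model in a
# leaderboard sorted by descending score. Hypothetical helper.
def rank_window(leaderboard: list[tuple[str, float]], model: str, radius: int = 3):
    """leaderboard: (name, score) pairs. Returns [(rank, name, score), ...]."""
    ranked = sorted(leaderboard, key=lambda pair: pair[1], reverse=True)
    idx = next(i for i, (name, _) in enumerate(ranked) if name == model)
    lo, hi = max(0, idx - radius), min(len(ranked), idx + radius + 1)
    return [(i + 1, *ranked[i]) for i in range(lo, hi)]

# Example with the MBPP EvalPlus scores from above:
board = [("Llama 3.1 405B Instruct", 88.6), ("Llama 3.3 70B Instruct", 87.6)]
for rank, name, score in rank_window(board, "Llama 3.3 70B Instruct"):
    print(f"#{rank} {name}: {score}%")
```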
All Benchmark Results for Llama 3.3 70B Instruct
Complete list of benchmark scores with detailed information
Benchmark     | Category | Modality | Score | Normalized | Source
IFEval        | code     | text     | 0.92  | 92.1%      | Self-reported
MGSM          | math     | text     | 0.91  | 91.1%      | Self-reported
HumanEval     | code     | text     | 0.88  | 88.4%      | Self-reported
MBPP EvalPlus | code     | text     | 0.88  | 87.6%      | Self-reported
MMLU          | general  | text     | 0.86  | 86.0%      | Self-reported
BFCL v2       | general  | text     | 0.77  | 77.3%      | Self-reported
MATH          | math     | text     | 0.77  | 77.0%      | Self-reported
MMLU-Pro      | general  | text     | 0.69  | 68.9%      | Self-reported
GPQA          | general  | text     | 0.51  | 50.5%      | Self-reported