
Llama 3.3 70B Instruct
Zero-eval
#1 BFCL v2
#2 MBPP EvalPlus
by Meta
About
Llama 3.3 70B Instruct is a language model developed by Meta. It achieves strong performance, with an average score of 79.9% across 9 benchmarks, and does particularly well on IFEval (92.1%), MGSM (91.1%), and HumanEval (88.4%). It is strongest on code-oriented tasks, where it averages 89.4%. The model supports a 256K-token context window for handling large documents and is available through 9 API providers. It was released in December 2024.
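The headline averages above can be reproduced directly from the per-benchmark scores in the results table at the bottom of this page. A minimal sketch (Python; the scores and the site's category labels are copied from that table, and the round-to-one-decimal convention is an assumption):

```python
# Benchmark scores and site-assigned categories, copied from the
# results table below (percentage scores, 0-100 scale).
scores = {
    "IFEval": (92.1, "code"),
    "MGSM": (91.1, "math"),
    "HumanEval": (88.4, "code"),
    "MBPP EvalPlus": (87.6, "code"),
    "MMLU": (86.0, "general"),
    "BFCL v2": (77.3, "general"),
    "MATH": (77.0, "math"),
    "MMLU-Pro": (68.9, "general"),
    "GPQA": (50.5, "general"),
}

overall = sum(s for s, _ in scores.values()) / len(scores)
print(f"Average score: {overall:.1f}%")  # 79.9%

for cat in ("code", "math", "general"):
    vals = [s for s, c in scores.values() if c == cat]
    print(f"{cat}: {sum(vals) / len(vals):.1f}%")  # 89.4%, 84.0%, 70.7%
```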
Pricing Range
Input (per 1M): $0.20 - $0.89
Output (per 1M): $0.20 - $7.90
Providers: 9
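To give a feel for what this range means per request, here is a small sketch (Python; the token counts are hypothetical example values, and any given provider's actual rate falls somewhere inside the quoted range):

```python
# Price range across the 9 listed providers, in USD per 1M tokens.
INPUT_MIN, INPUT_MAX = 0.20, 0.89
OUTPUT_MIN, OUTPUT_MAX = 0.20, 7.90

def cost_usd(input_tokens: int, output_tokens: int,
             in_rate: float, out_rate: float) -> float:
    """Cost of one request at given per-1M-token rates."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Hypothetical request: 4,000 prompt tokens, 1,000 completion tokens.
cheapest = cost_usd(4_000, 1_000, INPUT_MIN, OUTPUT_MIN)
priciest = cost_usd(4_000, 1_000, INPUT_MAX, OUTPUT_MAX)
print(f"${cheapest:.5f} - ${priciest:.5f} per request")  # $0.00100 - $0.01146
```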
Timeline
Announced: Dec 6, 2024
Released: Dec 6, 2024
Specifications
Training Tokens: 15.0T
License & Family
License
Llama 3.3 Community License Agreement
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
9 benchmarks
Average Score
79.9%
Best Score
92.1%
High Performers (80%+)
5
Performance Metrics
Max Context Window
256.0K
Avg Throughput
451.9 tok/s
Avg Latency
1ms
Top Categories
code
89.4%
math
84.0%
general
70.7%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
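The rank windows below appear to be a straight descending sort of every tracked model's score on the benchmark. A minimal sketch of that ranking logic (Python; the entries are the IFEval scores listed below, and the tie-breaking rule, earlier entry wins, is an assumption):

```python
# Rank models by benchmark score, highest first. How the site breaks
# exact ties is not documented; sorted() is stable, so here the
# earlier list position wins.
entries = [
    ("o3-mini", 93.9),
    ("Claude 3.7 Sonnet", 93.2),
    ("Nova Pro", 92.1),
    ("Llama 3.3 70B Instruct", 92.1),
    ("Gemma 3 27B", 90.4),
]

ranked = sorted(entries, key=lambda e: e[1], reverse=True)
for rank, (name, score) in enumerate(ranked, start=1):
    print(f"#{rank} {name}: {score}%")
```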
IFEval
Rank #4 of 37
#1 o3-mini
93.9%
#2 Claude 3.7 Sonnet
93.2%
#3 Nova Pro
92.1%
#4 Llama 3.3 70B Instruct
92.1%
#5 Gemma 3 27B
90.4%
#6 Gemma 3 4B
90.2%
#7 Kimi K2 Instruct
89.8%
MGSM
Rank #5 of 31
#2 o3-mini
92.0%
#3 Claude 3.5 Sonnet
91.6%
#4 Claude 3.5 Sonnet
91.6%
#5 Llama 3.3 70B Instruct
91.1%
#6 o1-preview
90.8%
#7 Claude 3 Opus
90.7%
#8 Llama 4 Scout
90.6%
HumanEval
Rank #17 of 62
#14 DeepSeek-V2.5
89.0%
#15 Nova Pro
89.0%
#16 Mistral Small 3.1 24B Instruct
88.4%
#17 Llama 3.3 70B Instruct
88.4%
#18 Grok-2
88.4%
#19 Qwen2.5-Coder 7B Instruct
88.4%
#20 Qwen2.5 32B Instruct
88.4%
MBPP EvalPlus
Rank #2 of 2
#1 Llama 3.1 405B Instruct
88.6%
#2 Llama 3.3 70B Instruct
87.6%
MMLU
Rank #24 of 78
#21 GPT-4
86.4%
#22 Grok-2 mini
86.2%
#23 Llama 3.2 90B Instruct
86.0%
#24 Llama 3.3 70B Instruct
86.0%
#25 Gemini 1.5 Pro
85.9%
#26 Nova Pro
85.9%
#27 GPT-4o
85.7%
All Benchmark Results for Llama 3.3 70B Instruct
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Normalized Score | Score | Source
IFEval | code | text | 0.92 | 92.1% | Self-reported
MGSM | math | text | 0.91 | 91.1% | Self-reported
HumanEval | code | text | 0.88 | 88.4% | Self-reported
MBPP EvalPlus | code | text | 0.88 | 87.6% | Self-reported
MMLU | general | text | 0.86 | 86.0% | Self-reported
BFCL v2 | general | text | 0.77 | 77.3% | Self-reported
MATH | math | text | 0.77 | 77.0% | Self-reported
MMLU-Pro | general | text | 0.69 | 68.9% | Self-reported
GPQA | general | text | 0.51 | 50.5% | Self-reported
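The Normalized Score column is the percentage score rescaled to 0-1 and rounded to two decimals, matching the "(0-100%)" note above. A quick consistency check over a few rows (Python; the parsing assumes the pipe-separated layout used in this table):

```python
# A few rows copied verbatim from the results table above.
rows = """\
IFEval | code | text | 0.92 | 92.1% | Self-reported
MGSM | math | text | 0.91 | 91.1% | Self-reported
GPQA | general | text | 0.51 | 50.5% | Self-reported"""

for line in rows.splitlines():
    name, _cat, _mod, norm, pct, _src = [f.strip() for f in line.split("|")]
    # The normalized value should equal the percentage / 100, up to
    # two-decimal rounding (so allow a tolerance just above 0.005).
    diff = abs(float(norm) - float(pct.rstrip("%")) / 100)
    assert diff < 0.006, name
print("normalized values consistent with percentages")
```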