Llama 3.2 3B Instruct

by Meta

Zero-eval
#1 NIH/Multi-needle, #1 InfiniteBench/En.MC, #1 Open-rewrite, +2 more

About

Llama 3.2 3B Instruct is a small instruction-tuned language model developed by Meta. Across the 15 benchmarks tracked here it averages 55.6%, with its strongest scores on NIH/Multi-needle (84.7%), ARC-C (78.6%), and GSM8k (77.7%). It supports a 256K-token context window for long documents and is currently available through one API provider. It was released in September 2024 as part of the Llama 3.2 family.
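For reference, a minimal usage sketch, assuming the gated Hugging Face checkpoint meta-llama/Llama-3.2-3B-Instruct (access requires accepting the Llama 3.2 Community License) and a recent transformers release that accepts chat messages as pipeline input:

```python
# Minimal local-inference sketch for meta-llama/Llama-3.2-3B-Instruct.
# Requires: transformers >= 4.44, torch, and an approved HF access token.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-3B-Instruct",
    torch_dtype=torch.bfloat16,  # bf16 keeps the 3B model light on memory
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize the Llama 3.2 release in one sentence."},
]
outputs = generator(messages, max_new_tokens=64)
# The pipeline returns the full chat transcript; the last turn is the reply.
print(outputs[0]["generated_text"][-1]["content"])
```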

Pricing Range
Input (per 1M tokens): $0.01
Output (per 1M tokens): $0.02
Providers: 1
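At these rates, request cost is a straight per-token product. A small illustration (the token counts below are hypothetical):

```python
# Cost at the listed rates: $0.01 per 1M input tokens,
# $0.02 per 1M output tokens. Token counts are hypothetical.
INPUT_RATE = 0.01 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.02 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a 200K-token document plus a 1K-token summary:
print(f"${request_cost(200_000, 1_000):.6f}")  # $0.002020
```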
Timeline
Announced: Sep 25, 2024
Released: Sep 25, 2024
Specifications
Training Tokens: 9.0T
License & Family
License: Llama 3.2 Community License
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

15 benchmarks
Average Score: 55.6%
Best Score: 84.7%
High Performers (80%+): 1

Performance Metrics

Max Context Window: 256.0K
Avg Throughput: 171.5 tok/s
Avg Latency: 0 ms
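Taking the listed figures at face value, end-to-end generation time can be estimated as latency plus output tokens divided by throughput:

```python
# Rough time-to-complete estimate from the listed averages:
# 171.5 tok/s throughput and 0 ms latency (as reported on this page).
THROUGHPUT_TPS = 171.5
LATENCY_S = 0.0

def generation_seconds(output_tokens: int) -> float:
    return LATENCY_S + output_tokens / THROUGHPUT_TPS

print(f"{generation_seconds(1_000):.1f} s for 1,000 tokens")  # ~5.8 s
```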

Top Categories

long_context: 84.7%
code: 77.4%
reasoning: 74.2%
math: 61.3%
general: 42.5%
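The category figures appear to be plain means of the normalized benchmark scores in each category; a quick check against the visible rows (only 10 of the 15 benchmarks are listed below, so the general category and the overall average cannot be fully reproduced from this page):

```python
# Sanity check: category scores as plain means of the per-benchmark
# normalized scores shown in the table further down this page.
scores = {
    "reasoning": [78.6, 69.8],        # ARC-C, HellaSwag
    "math": [77.7, 58.2, 48.0],       # GSM8k, MGSM, MATH
}
for category, values in scores.items():
    print(f"{category}: {sum(values) / len(values):.1f}%")
# reasoning: 74.2%  math: 61.3%  -- matching the listed category scores
```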
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

NIH/Multi-needle

Rank #1 of 1
#1 Llama 3.2 3B Instruct: 84.7%

ARC-C

Rank #16 of 31
#13 Llama 3.1 8B Instruct: 83.4%
#14 Phi 4 Mini: 83.7%
#15 Phi-3.5-mini-instruct: 84.6%
#16 Llama 3.2 3B Instruct: 78.6%
#17 Ministral 8B Instruct: 71.9%
#18 Gemma 2 27B: 71.4%
#19 Command R+: 71.0%

GSM8k

Rank #39 of 46
#36 Mistral Small 3 24B Base: 80.7%
#37 Granite 3.3 8B Instruct: 80.9%
#38 Qwen2 7B Instruct: 82.3%
#39 Llama 3.2 3B Instruct: 77.7%
#40 Jamba 1.5 Mini: 75.8%
#41 Gemma 2 27B: 74.0%
#42 Command R+: 70.7%

IFEval

Rank #30 of 37
#27 Llama 3.1 Nemotron Nano 8B V1: 79.3%
#28 Gemma 3 1B: 80.2%
#29 Llama 3.1 8B Instruct: 80.4%
#30 Llama 3.2 3B Instruct: 77.4%
#31 Granite 3.3 8B Instruct: 74.8%
#32 Granite 3.3 8B Base: 74.8%
#33 GPT-4.1 nano: 74.5%

HellaSwag

Rank #22 of 24
#19 Gemma 3n E2B: 72.2%
#20 Gemma 3n E2B Instructed LiteRT (Preview): 72.2%
#21 Qwen2.5-Coder 7B Instruct: 76.8%
#22 Llama 3.2 3B Instruct: 69.8%
#23 Phi-3.5-mini-instruct: 69.4%
#24 Phi 4 Mini: 69.1%
All Benchmark Results for Llama 3.2 3B Instruct
Complete list of benchmark scores with detailed information
Benchmark            Category      Modality  Raw Score  Normalized  Source
NIH/Multi-needle     long_context  text      0.85       84.7%       Self-reported
ARC-C                reasoning     text      0.79       78.6%       Self-reported
GSM8k                math          text      0.78       77.7%       Self-reported
IFEval               code          text      0.77       77.4%       Self-reported
HellaSwag            reasoning     text      0.70       69.8%       Self-reported
BFCL v2              general       text      0.67       67.0%       Self-reported
MMLU                 general       text      0.63       63.4%       Self-reported
InfiniteBench/En.MC  general       text      0.63       63.3%       Self-reported
MGSM                 math          text      0.58       58.2%       Self-reported
MATH                 math          text      0.48       48.0%       Self-reported

Showing 10 of 15 benchmarks