Llama 3.1 8B Instruct

Price: 0.1 USD
Average score: 61.3% (18 benchmarks)
Author: Meta

Zero-eval

#1 MBPP EvalPlus (base)

#2 GSM-8K (CoT)

#2 MATH (CoT)


About

Llama 3.1 8B Instruct is a language model developed by Meta and released on July 23, 2024. It averages 61.3% across 18 benchmarks, with its highest scores on GSM-8K (CoT) (84.5%), ARC-C (83.4%), and API-Bank (82.6%). The listing reports a context window of up to 262K tokens; Meta's own documentation for Llama 3.1 specifies 128K. The model is available through 9 API providers.

Pricing Range

Input (per 1M tokens): $0.03 - $0.22

Output (per 1M tokens): $0.03 - $0.22

Providers: 9
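As a rough illustration of what the per-1M-token price range means for a single request, the sketch below computes the cost at both ends of the listed range. The workload (2,000 input tokens, 500 output tokens) and the function name `request_cost` are assumptions made for the example, and actual provider pricing may differ between input and output.

```python
# Price range per 1M tokens, taken from the listing above (USD).
LOW_PRICE_PER_M = 0.03
HIGH_PRICE_PER_M = 0.22


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the USD cost of one request at the given per-1M-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical workload: 2,000 prompt tokens and 500 completion tokens.
cheapest = request_cost(2_000, 500, LOW_PRICE_PER_M, LOW_PRICE_PER_M)
priciest = request_cost(2_000, 500, HIGH_PRICE_PER_M, HIGH_PRICE_PER_M)
print(f"Cost per request: ${cheapest:.6f} to ${priciest:.6f}")
```

Multiplying the result by an expected request volume gives a first-order budget estimate for comparing providers.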

Timeline

Announced: Jul 23, 2024

Released: Jul 23, 2024

Knowledge Cutoff: Dec 31, 2023

Specifications

Training Tokens: 15.0T

License & Family

License: Llama 3.1 Community License

Benchmark Performance Overview

Performance metrics and category breakdown

Overall Performance

18 benchmarks

Average Score

61.3%

Best Score

84.5%

High Performers (80%+)

Performance Metrics

Max Context Window

262.1K

Avg Throughput

532.6 tok/s

Avg Latency

0ms
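To put the throughput figure in practical terms, the following sketch estimates how long a completion of a given length would take. It assumes a simple model (fixed latency plus tokens divided by throughput) and uses the listed averages as-is; the function name `generation_time` and the 1,000-token example are illustrative, and real timings vary by provider and load.

```python
# Averages as shown in the listing above.
AVG_THROUGHPUT_TOK_PER_S = 532.6
AVG_LATENCY_S = 0.0  # the listing shows 0ms


def generation_time(output_tokens: int,
                    throughput: float = AVG_THROUGHPUT_TOK_PER_S,
                    latency: float = AVG_LATENCY_S) -> float:
    """Estimate seconds to generate output_tokens: latency + tokens / throughput."""
    return latency + output_tokens / throughput


# Hypothetical completion length of 1,000 tokens.
seconds = generation_time(1_000)
print(f"Estimated time for 1,000 tokens: {seconds:.2f} s")
```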

Top Categories

reasoning

83.4%

math

68.4%

code

65.8%

general

54.0%

Benchmark Performance

Top benchmark scores with normalized values (0-100%)

Ranking Across Benchmarks

Position relative to other models on each benchmark

GSM-8K (CoT)

Rank #2 of 2

#1 Llama 3.1 70B Instruct: 95.1%
#2 Llama 3.1 8B Instruct: 84.5%

ARC-C

Rank #15 of 31

#12 Jamba 1.5 Mini: 85.7%
#13 Phi-3.5-mini-instruct: 84.6%
#14 Phi 4 Mini: 83.7%
#15 Llama 3.1 8B Instruct: 83.4%
#16 Llama 3.2 3B Instruct: 78.6%
#17 Ministral 8B Instruct: 71.9%
#18 Gemma 2 27B: 71.4%

API-Bank

Rank #3 of 3

#1 Llama 3.1 405B Instruct: 92.0%
#2 Llama 3.1 70B Instruct: 90.0%
#3 Llama 3.1 8B Instruct: 82.6%

IFEval

Rank #27 of 37

#24 DeepSeek-R1: 83.3%
#25 Mistral Small 3 24B Instruct: 82.9%
#26 GPT-4o: 81.0%
#27 Llama 3.1 8B Instruct: 80.4%
#28 Gemma 3 1B: 80.2%
#29 Llama 3.1 Nemotron Nano 8B V1: 79.3%
#30 Llama 3.2 3B Instruct: 77.4%

BFCL

Rank #3 of 10

#1 Llama 3.1 405B Instruct: 88.5%
#2 Llama 3.1 70B Instruct: 84.8%
#3 Llama 3.1 8B Instruct: 76.1%
#4 Qwen3 235B A22B: 70.8%
#5 Qwen3 32B: 70.3%
#6 Qwen3 30B A3B: 69.1%

All Benchmark Results for Llama 3.1 8B Instruct

Complete list of benchmark scores with detailed information


Benchmark	Category	Modality	Score	Normalized	Source
GSM-8K (CoT)	math	text	0.84	84.5%	Self-reported
ARC-C	reasoning	text	0.83	83.4%	Self-reported
API-Bank	general	text	0.83	82.6%	Self-reported
IFEval	code	text	0.80	80.4%	Self-reported
BFCL	general	text	0.76	76.1%	Self-reported
MMLU (CoT)	general	text	0.73	73.0%	Self-reported
MBPP EvalPlus (base)	code	text	0.73	72.8%	Self-reported
HumanEval	code	text	0.73	72.6%	Self-reported
MMLU	general	text	0.69	69.4%	Self-reported
Multilingual MGSM (CoT)	math	text	0.69	68.9%	Self-reported

Showing 1 to 10 of 18 benchmarks
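Because only ten of the 18 results are listed, their mean is not the same as the 61.3% overall average. The sketch below works through that arithmetic: it averages the ten listed scores, counts those at 80% or above, and derives the mean that the eight unlisted benchmarks would need for the overall figure to hold. The variable names are illustrative, and the derived value is approximate since 61.3% is itself rounded.

```python
# Normalized scores (%) for the ten benchmarks listed above.
listed_scores = {
    "GSM-8K (CoT)": 84.5,
    "ARC-C": 83.4,
    "API-Bank": 82.6,
    "IFEval": 80.4,
    "BFCL": 76.1,
    "MMLU (CoT)": 73.0,
    "MBPP EvalPlus (base)": 72.8,
    "HumanEval": 72.6,
    "MMLU": 69.4,
    "Multilingual MGSM (CoT)": 68.9,
}

# Overall figures reported in the listing.
OVERALL_AVERAGE = 61.3
TOTAL_BENCHMARKS = 18

listed_total = sum(listed_scores.values())
listed_mean = listed_total / len(listed_scores)

# Benchmarks at or above the 80% "high performer" threshold.
high_performers = [name for name, score in listed_scores.items() if score >= 80.0]

# Mean the unlisted benchmarks must have for the overall average to hold.
unlisted_count = TOTAL_BENCHMARKS - len(listed_scores)
unlisted_mean = (OVERALL_AVERAGE * TOTAL_BENCHMARKS - listed_total) / unlisted_count

print(f"Mean of listed scores: {listed_mean:.2f}%")
print(f"High performers among listed: {len(high_performers)}")
print(f"Implied mean of the {unlisted_count} unlisted scores: {unlisted_mean:.2f}%")
```

The listed ten average about 76.4%, so the unlisted eight would have to average roughly 42.5% to pull the overall figure down to 61.3%.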
