Llama 3.1 Nemotron 70B Instruct
by NVIDIA

Zero-eval
#1 GSM8K Chat
#1 MMLU Chat
#1 Instruct HumanEval
+1 more

About

Llama 3.1 Nemotron 70B Instruct is a language model developed by NVIDIA on top of Llama 3.1 70B Instruct. It achieves an average score of 67.9% across 11 benchmarks, with its strongest results on GSM8k (91.4%), HellaSwag (85.6%), and Winogrande (84.5%). Math is its strongest category, averaging 86.7% across math benchmarks. The model was released in October 2024.

Timeline
Announced: Oct 1, 2024
Released: Oct 1, 2024
Knowledge Cutoff: Dec 1, 2023

Specifications
License & Family
License: Llama 3.1 Community License
Base Model: Llama 3.1 70B Instruct
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (11 benchmarks)
Average Score: 67.9%
Best Score: 91.4%
High Performers (80%+): 6

Top Categories

math: 86.7%
reasoning: 79.8%
code: 73.8%
general: 64.1%
factuality: 58.6%
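
These aggregates can be recomputed from the per-benchmark scores in the results table at the bottom of this page. Below is a minimal Python sketch; the score list is transcribed from that table. Note that only 10 of the 11 benchmarks are listed on this page, so the published 67.9% overall average cannot be reproduced from this list, but the best score, the 80%+ count, and the per-category means can.

```python
from collections import defaultdict

# (category, percent) pairs transcribed from the "All Benchmark Results"
# table below. 10 of the 11 benchmarks appear on this page, so the
# published 11-benchmark average (67.9%) is not recomputable from this list.
scores = {
    "GSM8k": ("math", 91.4),
    "HellaSwag": ("reasoning", 85.6),
    "Winogrande": ("reasoning", 84.5),
    "GSM8K Chat": ("math", 81.9),
    "MMLU Chat": ("general", 80.6),
    "MMLU": ("general", 80.2),
    "Instruct HumanEval": ("code", 73.8),
    "ARC-C": ("reasoning", 69.2),
    "TruthfulQA": ("factuality", 58.6),
    "XLSum English": ("general", 31.6),
}

values = [pct for _, pct in scores.values()]
print(f"Best score: {max(values):.1f}%")                          # 91.4%
print(f"High performers (80%+): {sum(p >= 80 for p in values)}")  # 6

# Per-category means match the Top Categories block,
# e.g. math = (91.4 + 81.9) / 2 ≈ 86.7.
by_category = defaultdict(list)
for category, pct in scores.values():
    by_category[category].append(pct)
for category, pcts in sorted(by_category.items(),
                             key=lambda kv: -sum(kv[1]) / len(kv[1])):
    print(f"{category}: {sum(pcts) / len(pcts):.1f}%")
```
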
Ranking Across Benchmarks
Position relative to other models on each benchmark

GSM8k
Rank #22 of 46
#19 Qwen2.5 7B Instruct: 91.6%
#20 Kimi K2 Base: 92.1%
#21 Nova Micro: 92.3%
#22 Llama 3.1 Nemotron 70B Instruct: 91.4%
#23 Qwen2 72B Instruct: 91.1%
#24 Qwen2.5-Coder 32B Instruct: 91.1%
#25 Gemini 1.5 Pro: 90.8%

HellaSwag
Rank #10 of 24
#7 Claude 3 Haiku: 85.9%
#8 Gemma 2 27B: 86.4%
#9 Gemini 1.5 Flash: 86.5%
#10 Llama 3.1 Nemotron 70B Instruct: 85.6%
#11 Qwen2.5 32B Instruct: 85.2%
#12 Phi-3.5-MoE-instruct: 83.8%
#13 Mistral NeMo Instruct: 83.5%

Winogrande
Rank #4 of 19
#1 Qwen2 72B Instruct: 85.1%
#2 Command R+: 85.4%
#3 GPT-4: 87.5%
#4 Llama 3.1 Nemotron 70B Instruct: 84.5%
#5 Gemma 2 27B: 83.7%
#6 Qwen2.5 32B Instruct: 82.0%
#7 Phi-3.5-MoE-instruct: 81.3%

GSM8K Chat
Rank #1 of 1 (only model evaluated on this benchmark)
#1 Llama 3.1 Nemotron 70B Instruct: 81.9%

MMLU Chat
Rank #1 of 1 (only model evaluated on this benchmark)
#1 Llama 3.1 Nemotron 70B Instruct: 80.6%
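
Each list above is a window of ranks around the model's own position on that benchmark. As an illustration only, here is a small sketch of how such a window can be sliced out of a full leaderboard; the rows are the published Winogrande entries, while the `neighbor_window` helper and its `radius` parameter are hypothetical conveniences, not a documented API of this page.

```python
# Published Winogrande entries as (rank, model, score-in-percent) rows.
leaderboard = [
    (1, "Qwen2 72B Instruct", 85.1),
    (2, "Command R+", 85.4),
    (3, "GPT-4", 87.5),
    (4, "Llama 3.1 Nemotron 70B Instruct", 84.5),
    (5, "Gemma 2 27B", 83.7),
    (6, "Qwen2.5 32B Instruct", 82.0),
    (7, "Phi-3.5-MoE-instruct", 81.3),
]

def neighbor_window(rows, model, radius=3):
    """Return the rows within `radius` ranks of `model` (hypothetical helper)."""
    idx = next(i for i, (_, name, _) in enumerate(rows) if name == model)
    return rows[max(0, idx - radius): idx + radius + 1]

for rank, name, score in neighbor_window(
        leaderboard, "Llama 3.1 Nemotron 70B Instruct"):
    print(f"#{rank} {name}: {score:.1f}%")
```
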
All Benchmark Results for Llama 3.1 Nemotron 70B Instruct
Complete list of benchmark scores with detailed information

Benchmark            Category    Modality  Normalized  Score   Source
GSM8k                math        text      0.91        91.4%   Self-reported
HellaSwag            reasoning   text      0.86        85.6%   Self-reported
Winogrande           reasoning   text      0.85        84.5%   Self-reported
GSM8K Chat           math        text      0.82        81.9%   Self-reported
MMLU Chat            general     text      0.81        80.6%   Self-reported
MMLU                 general     text      0.80        80.2%   Self-reported
Instruct HumanEval   code        text      0.74        73.8%   Self-reported
ARC-C                reasoning   text      0.69        69.2%   Self-reported
TruthfulQA           factuality  text      0.59        58.6%   Self-reported
XLSum English        general     text      0.32        31.6%   Self-reported

Showing 10 of 11 benchmarks; one additional result is not listed on this page.