Llama-3.3 Nemotron Super 49B v1

Name: Llama-3.3 Nemotron Super 49B v1
Rating: 81.0 (7 reviews)
Author: NVIDIA

Zero-eval

Top rankings: #1 MBPP, #2 MT-Bench, #3 BFCL v2

About

Llama-3.3 Nemotron Super 49B v1 is a language model developed by NVIDIA and released in March 2025. It averages 81.0% across 7 self-reported benchmarks, with its strongest results on MATH-500 (96.6%), MT-Bench (91.7%), and MBPP (91.3%).

Timeline

Announced: Mar 18, 2025

Released: Mar 18, 2025

Knowledge Cutoff: Dec 31, 2023

Specifications

License & Family

License: Llama 3.1 Community License

Benchmark Performance Overview

Performance metrics and category breakdown

Overall Performance

Benchmarks: 7

Average Score: 81.0%

Best Score: 96.6% (MATH-500)

High Performers (80%+): 4 of 7 benchmarks
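The summary figures above can be reproduced from the per-benchmark scores listed under All Benchmark Results. The following is a minimal sketch, assuming the average is an unweighted mean of the normalized percentages; the page does not state how it is computed, and the function name `summarize` is illustrative only.

```python
from statistics import mean

# Normalized scores (percent) copied from the results table on this page.
scores = {
    "MATH-500": 96.6,
    "MT-Bench": 91.7,
    "MBPP": 91.3,
    "Arena Hard": 88.3,
    "BFCL v2": 73.7,
    "GPQA": 66.7,
    "AIME 2025": 58.4,
}

def summarize(scores, threshold=80.0):
    """Return (average, best, count of scores at or above threshold).

    Assumption: the average is a plain unweighted mean, rounded to one decimal.
    """
    values = list(scores.values())
    average = round(mean(values), 1)
    best = max(values)
    high = sum(1 for v in values if v >= threshold)
    return average, best, high

average, best, high = summarize(scores)
print(f"average={average}%  best={best}%  high performers={high} of {len(scores)}")
```

Under that assumption the sketch gives an average of 81.0%, a best score of 96.6%, and four benchmarks at or above 80%.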

Top Categories

math: 96.6%
roleplay: 91.7%
code: 91.3%
general: 71.8%
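The category figures are consistent with averaging the normalized scores of the benchmarks in each category. The sketch below assumes an unweighted mean per category, which is inferred from the numbers rather than documented by the page; `category_averages` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# (benchmark, category, normalized percent), copied from the results table on this page.
results = [
    ("MATH-500", "math", 96.6),
    ("MT-Bench", "roleplay", 91.7),
    ("MBPP", "code", 91.3),
    ("Arena Hard", "general", 88.3),
    ("BFCL v2", "general", 73.7),
    ("GPQA", "general", 66.7),
    ("AIME 2025", "general", 58.4),
]

def category_averages(results):
    """Group scores by category and return the mean of each group, rounded to one decimal."""
    grouped = defaultdict(list)
    for _name, category, score in results:
        grouped[category].append(score)
    return {category: round(mean(values), 1) for category, values in grouped.items()}

for category, value in category_averages(results).items():
    print(f"{category}: {value}%")
```

The "general" category is the only one with more than one benchmark, so it is the only value where the averaging assumption matters: (88.3 + 73.7 + 66.7 + 58.4) / 4 = 71.775, which rounds to 71.8%.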

Benchmark Performance

Top benchmark scores with normalized values (0-100%)

Ranking Across Benchmarks

Position relative to other models on each benchmark

MATH-500

Rank #4 of 22

#1 Llama 3.1 Nemotron Ultra 253B v1: 97.0%
#2 DeepSeek-R1: 97.3%
#3 Kimi K2 Instruct: 97.4%
#4 Llama-3.3 Nemotron Super 49B v1: 96.6%
#5 Claude 3.7 Sonnet: 96.2%
#6 Kimi-k1.5: 96.2%
#7 DeepSeek R1 Zero: 95.9%

MT-Bench

Rank #2 of 11

#1 Qwen2.5 72B Instruct: 93.5%
#2 Llama-3.3 Nemotron Super 49B v1: 91.7%
#3 DeepSeek-V2.5: 90.2%
#4 Qwen2.5 7B Instruct: 87.5%
#5 Mistral Large 2: 86.3%

MBPP

Rank #1 of 31

#1 Llama-3.3 Nemotron Super 49B v1: 91.3%
#2 Qwen2.5-Coder 32B Instruct: 90.2%
#3 Qwen2.5 72B Instruct: 88.2%
#4 Llama 3.1 Nemotron Nano 8B V1: 84.6%

Arena Hard

Rank #5 of 22

#2 Qwen3 30B A3B: 91.0%
#3 DeepSeek-R1: 92.3%
#4 Qwen3 32B: 93.8%
#5 Llama-3.3 Nemotron Super 49B v1: 88.3%
#6 Mistral Small 3 24B Instruct: 87.6%
#7 Qwen2.5 72B Instruct: 81.2%
#8 Phi 4 Reasoning Plus: 79.0%

BFCL v2

Rank #3 of 5

#1 Llama 3.1 Nemotron Ultra 253B v1: 74.1%
#2 Llama 3.3 70B Instruct: 77.3%
#3 Llama-3.3 Nemotron Super 49B v1: 73.7%
#4 Llama 3.2 3B Instruct: 67.0%
#5 Llama 3.1 Nemotron Nano 8B V1: 63.6%

All Benchmark Results for Llama-3.3 Nemotron Super 49B v1

Complete list of benchmark scores with detailed information


Benchmark	Category	Modality	Raw score	Normalized	Source
MATH-500	math	text	0.97	96.6%	Self-reported
MT-Bench	roleplay	text	91.70	91.7%	Self-reported
MBPP	code	text	91.30	91.3%	Self-reported
Arena Hard	general	text	0.88	88.3%	Self-reported
BFCL v2	general	text	0.74	73.7%	Self-reported
GPQA	general	text	0.67	66.7%	Self-reported
AIME 2025	general	text	0.58	58.4%	Self-reported
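
The first numeric column in the table above (the raw score) mixes two scales: some rows use a 0-1 fraction (0.97) and others a 0-100 value (91.70). The sketch below shows one way to bring both onto a percent scale and compare against the reported percentages. The rule "values at or below 1 are fractions" is an assumption for illustration, not something the page documents, and `to_percent` and `max_gap` are hypothetical helpers.

```python
# (benchmark, raw score as listed, reported normalized percent), copied from the table.
rows = [
    ("MATH-500", 0.97, 96.6),
    ("MT-Bench", 91.70, 91.7),
    ("MBPP", 91.30, 91.3),
    ("Arena Hard", 0.88, 88.3),
    ("BFCL v2", 0.74, 73.7),
    ("GPQA", 0.67, 66.7),
    ("AIME 2025", 0.58, 58.4),
]

def to_percent(raw):
    """Heuristic (an assumption): values <= 1 are fractions, larger values are already percents."""
    return round(raw * 100, 1) if raw <= 1 else round(raw, 1)

def max_gap(rows):
    """Largest absolute difference between the converted raw score and the reported percent."""
    return max(abs(to_percent(raw) - reported) for _name, raw, reported in rows)

for name, raw, reported in rows:
    print(f"{name}: raw {raw} -> {to_percent(raw)}% (reported {reported}%)")
print(f"largest gap: {max_gap(rows):.1f} points")
```

Because the fractional raw values are shown to only two decimals, the converted numbers can differ from the reported percentages by up to half a point (for example 0.97 becomes 97.0% against a reported 96.6%), so the reported percentage column is the more precise one to quote.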
