
Llama 3.1 Nemotron Ultra 253B v1

Zero-eval
#2 BFCL v2
#3 MATH-500

by NVIDIA

About

Llama 3.1 Nemotron Ultra 253B v1 is a language model developed by NVIDIA. It achieves strong performance, with an average score of 79.2% across 6 benchmarks, and excels particularly on MATH-500 (97.0%), IFEval (89.5%), and GPQA (76.0%). Released in 2025, it represents NVIDIA's latest advancement in AI technology.

Timeline
Announced: Apr 7, 2025
Released: Apr 7, 2025
Knowledge Cutoff: Dec 1, 2023
Specifications
License & Family
License
Llama 3.1 Community License
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

6 benchmarks
Average Score
79.2%
Best Score
97.0%
High Performers (80%+)
2

Top Categories

math
97.0%
code
77.9%
general
74.2%
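The headline aggregates above (the 79.2% overall average and the per-category averages) can be reproduced from the six self-reported benchmark scores listed later on this page. A minimal Python sketch (the `scores` dictionary simply restates the page's data; variable names are illustrative):

```python
# Per-benchmark scores and categories, as listed on this page (self-reported).
scores = {
    "MATH-500": (97.0, "math"),
    "IFEval": (89.5, "code"),
    "GPQA": (76.0, "general"),
    "BFCL v2": (74.1, "general"),
    "AIME 2025": (72.5, "general"),
    "LiveCodeBench": (66.3, "code"),
}

# Overall average across all 6 benchmarks.
average = round(sum(s for s, _ in scores.values()) / len(scores), 1)

# Per-category averages, as shown under "Top Categories".
by_category = {}
for score, category in scores.values():
    by_category.setdefault(category, []).append(score)
category_avg = {c: round(sum(v) / len(v), 1) for c, v in by_category.items()}

print(average)       # 79.2
print(category_avg)  # {'math': 97.0, 'code': 77.9, 'general': 74.2}
```

Note that the "code" figure of 77.9% averages IFEval with LiveCodeBench, reflecting how this page categorizes IFEval.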
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

MATH-500

Rank #3 of 22
#1 Kimi K2 Instruct
97.4%
#2 DeepSeek-R1
97.3%
#3 Llama 3.1 Nemotron Ultra 253B v1
97.0%
#4 Llama-3.3 Nemotron Super 49B v1
96.6%
#5 Claude 3.7 Sonnet
96.2%
#6 Kimi-k1.5
96.2%

IFEval

Rank #9 of 37
#6 Gemma 3 4B
90.2%
#7 Kimi K2 Instruct
89.8%
#8 Nova Lite
89.7%
#9 Llama 3.1 Nemotron Ultra 253B v1
89.5%
#10 Gemma 3 12B
88.9%
#11 Qwen3-235B-A22B-Instruct-2507
88.7%
#12 Llama 3.1 405B Instruct
88.6%

GPQA

Rank #19 of 115
#16 o1
78.0%
#17 Qwen3-235B-A22B-Instruct-2507
77.5%
#18 o3-mini
77.2%
#19 Llama 3.1 Nemotron Ultra 253B v1
76.0%
#20 Claude Sonnet 4
75.4%
#21 Kimi K2 Instruct
75.1%
#22 Gemini 2.0 Flash Thinking
74.2%

BFCL v2

Rank #2 of 5
#1 Llama 3.3 70B Instruct
77.3%
#2 Llama 3.1 Nemotron Ultra 253B v1
74.1%
#3 Llama-3.3 Nemotron Super 49B v1
73.7%
#4 Llama 3.2 3B Instruct
67.0%
#5 Llama 3.1 Nemotron Nano 8B V1
63.6%

AIME 2025

Rank #18 of 36
#15 Phi 4 Reasoning Plus
78.0%
#16 Claude Opus 4
75.5%
#17 Qwen3 32B
72.9%
#18 Llama 3.1 Nemotron Ultra 253B v1
72.5%
#19 Gemini 2.5 Flash
72.0%
#20 Qwen3 30B A3B
70.9%
#21 Claude Sonnet 4
70.5%
All Benchmark Results for Llama 3.1 Nemotron Ultra 253B v1
Complete list of benchmark scores with detailed information
Benchmark       | Category | Modality | Score (normalized) | Source
MATH-500        | math     | text     | 97.0% (0.970)      | Self-reported
IFEval          | code     | text     | 89.5% (0.895)      | Self-reported
GPQA            | general  | text     | 76.0% (0.760)      | Self-reported
BFCL v2         | general  | text     | 74.1% (0.741)      | Self-reported
AIME 2025       | general  | text     | 72.5% (0.725)      | Self-reported
LiveCodeBench   | code     | text     | 66.3% (0.663)      | Self-reported