
Llama 3.1 Nemotron Ultra 253B v1

Zero-eval
#2 BFCL v2
#3 MATH-500

by NVIDIA

About

Llama 3.1 Nemotron Ultra 253B v1 is a language model developed by NVIDIA. It achieves strong performance, with an average score of 79.2% across 6 benchmarks, and excels particularly on MATH-500 (97.0%), IFEval (89.5%), and GPQA (76.0%). Released in 2025, it represents NVIDIA's latest advancement in AI technology.

Timeline
Announced: Apr 7, 2025
Released: Apr 7, 2025
Knowledge Cutoff: Dec 1, 2023
Specifications
License & Family
License
Llama 3.1 Community License
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

6 benchmarks
Average Score
79.2%
Best Score
97.0%
High Performers (80%+)
2

Top Categories

math
97.0%
code
77.9%
general
74.2%
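The headline aggregates above (the 79.2% overall average and the per-category averages) can be reproduced from the six self-reported benchmark scores listed later on this page. A minimal Python sketch (the `scores` dictionary simply restates the page's data; variable names are illustrative):

```python
# Per-benchmark scores and categories, as listed on this page (self-reported).
scores = {
    "MATH-500": (97.0, "math"),
    "IFEval": (89.5, "code"),
    "GPQA": (76.0, "general"),
    "BFCL v2": (74.1, "general"),
    "AIME 2025": (72.5, "general"),
    "LiveCodeBench": (66.3, "code"),
}

# Overall average across all 6 benchmarks.
average = round(sum(s for s, _ in scores.values()) / len(scores), 1)

# Per-category averages, as shown under "Top Categories".
by_category = {}
for score, category in scores.values():
    by_category.setdefault(category, []).append(score)
category_avg = {c: round(sum(v) / len(v), 1) for c, v in by_category.items()}

print(average)       # 79.2
print(category_avg)  # {'math': 97.0, 'code': 77.9, 'general': 74.2}
```

Note that the "code" figure of 77.9% averages IFEval with LiveCodeBench, reflecting how this page categorizes IFEval.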
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

MATH-500

Rank #3 of 22
#1 Kimi K2 Instruct
97.4%
#2 DeepSeek-R1
97.3%
#3 Llama 3.1 Nemotron Ultra 253B v1
97.0%
#4 Llama-3.3 Nemotron Super 49B v1
96.6%
#5 Claude 3.7 Sonnet
96.2%
#6 Kimi-k1.5
96.2%

IFEval

Rank #9 of 37
#6 Gemma 3 4B
90.2%
#7 Kimi K2 Instruct
89.8%
#8 Nova Lite
89.7%
#9 Llama 3.1 Nemotron Ultra 253B v1
89.5%
#10 Gemma 3 12B
88.9%
#11 Qwen3-235B-A22B-Instruct-2507
88.7%
#12 Llama 3.1 405B Instruct
88.6%

GPQA

Rank #19 of 115
#16 o1
78.0%
#17 Qwen3-235B-A22B-Instruct-2507
77.5%
#18 o3-mini
77.2%
#19 Llama 3.1 Nemotron Ultra 253B v1
76.0%
#20 Claude Sonnet 4
75.4%
#21 Kimi K2 Instruct
75.1%
#22 Gemini 2.0 Flash Thinking
74.2%

BFCL v2

Rank #2 of 5
#1 Llama 3.3 70B Instruct
77.3%
#2 Llama 3.1 Nemotron Ultra 253B v1
74.1%
#3 Llama-3.3 Nemotron Super 49B v1
73.7%
#4 Llama 3.2 3B Instruct
67.0%
#5 Llama 3.1 Nemotron Nano 8B V1
63.6%

AIME 2025

Rank #18 of 36
#15 Phi 4 Reasoning Plus
78.0%
#16 Claude Opus 4
75.5%
#17 Qwen3 32B
72.9%
#18 Llama 3.1 Nemotron Ultra 253B v1
72.5%
#19 Gemini 2.5 Flash
72.0%
#20 Qwen3 30B A3B
70.9%
#21 Claude Sonnet 4
70.5%
All Benchmark Results for Llama 3.1 Nemotron Ultra 253B v1
Complete list of benchmark scores with detailed information
Benchmark       | Category | Modality | Score (normalized) | Source
MATH-500        | math     | text     | 97.0% (0.970)      | Self-reported
IFEval          | code     | text     | 89.5% (0.895)      | Self-reported
GPQA            | general  | text     | 76.0% (0.760)      | Self-reported
BFCL v2         | general  | text     | 74.1% (0.741)      | Self-reported
AIME 2025       | general  | text     | 72.5% (0.725)      | Self-reported
LiveCodeBench   | code     | text     | 66.3% (0.663)      | Self-reported