
Llama 3.1 Nemotron Ultra 253B v1
by NVIDIA
Zero-eval · #2 BFCL v2 · #3 MATH-500
About
Llama 3.1 Nemotron Ultra 253B v1 is a language model developed by NVIDIA. It achieves strong performance, with an average score of 79.2% across 6 benchmarks, and does particularly well on MATH-500 (97.0%), IFEval (89.5%), and GPQA (76.0%). Released in April 2025, it is the flagship model of NVIDIA's Nemotron family.
Timeline
Announced: Apr 7, 2025
Released: Apr 7, 2025
Knowledge Cutoff: Dec 1, 2023
Specifications
License & Family
License: Llama 3.1 Community License
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
6 benchmarks
Average Score: 79.2% (derivation sketched below this list)
Best Score: 97.0%
High Performers (80%+): 2
Top Categories
math: 97.0%
code: 77.9%
general: 74.2%
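The summary numbers above follow from the six self-reported scores listed at the end of this page. A minimal sketch in Python, with scores and category labels copied from that table; the unweighted-mean aggregation is an assumption, but it reproduces the page's figures exactly:

```python
from collections import defaultdict

# Self-reported scores and category assignments, copied from the
# results table at the end of this page.
scores = {
    "MATH-500":      ("math",    0.970),
    "IFEval":        ("code",    0.895),
    "GPQA":          ("general", 0.760),
    "BFCL v2":       ("general", 0.741),
    "AIME 2025":     ("general", 0.725),
    "LiveCodeBench": ("code",    0.663),
}

# Overall average: unweighted mean over all six benchmarks.
overall = sum(v for _, v in scores.values()) / len(scores)
print(f"Average Score: {overall:.1%}")  # -> 79.2%

# Per-category averages: math 97.0%, code 77.9%, general 74.2%.
by_category = defaultdict(list)
for category, value in scores.values():
    by_category[category].append(value)
for category, values in by_category.items():
    print(f"{category}: {sum(values) / len(values):.1%}")
```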
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark (the sorting behind these ranks is sketched after the lists below)
MATH-500
Rank #3 of 22
#1 Kimi K2 Instruct: 97.4%
#2 DeepSeek-R1: 97.3%
#3 Llama 3.1 Nemotron Ultra 253B v1: 97.0%
#4 Llama-3.3 Nemotron Super 49B v1: 96.6%
#5 Claude 3.7 Sonnet: 96.2%
#6 Kimi-k1.5: 96.2%
IFEval
Rank #9 of 37
#6 Gemma 3 4B: 90.2%
#7 Kimi K2 Instruct: 89.8%
#8 Nova Lite: 89.7%
#9 Llama 3.1 Nemotron Ultra 253B v1: 89.5%
#10 Gemma 3 12B: 88.9%
#11 Qwen3-235B-A22B-Instruct-2507: 88.7%
#12 Llama 3.1 405B Instruct: 88.6%
GPQA
Rank #19 of 115
#16 o1: 78.0%
#17 Qwen3-235B-A22B-Instruct-2507: 77.5%
#18 o3-mini: 77.2%
#19 Llama 3.1 Nemotron Ultra 253B v1: 76.0%
#20 Claude Sonnet 4: 75.4%
#21 Kimi K2 Instruct: 75.1%
#22 Gemini 2.0 Flash Thinking: 74.2%
BFCL v2
Rank #2 of 5
#1 Llama 3.3 70B Instruct: 77.3%
#2 Llama 3.1 Nemotron Ultra 253B v1: 74.1%
#3 Llama-3.3 Nemotron Super 49B v1: 73.7%
#4 Llama 3.2 3B Instruct: 67.0%
#5 Llama 3.1 Nemotron Nano 8B V1: 63.6%
AIME 2025
Rank #18 of 36
#15 Phi 4 Reasoning Plus: 78.0%
#16 Claude Opus 4: 75.5%
#17 Qwen3 32B: 72.9%
#18 Llama 3.1 Nemotron Ultra 253B v1: 72.5%
#19 Gemini 2.5 Flash: 72.0%
#20 Qwen3 30B A3B: 70.9%
#21 Claude Sonnet 4: 70.5%
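Each rank shown above is the model's position when that benchmark's scores are sorted in descending order. A minimal sketch using the BFCL v2 leaderboard, the only one reproduced in full above (tie handling is an assumption; BFCL v2 has no ties):

```python
# BFCL v2 scores as listed above; rank = position after a descending sort.
bfcl_v2 = {
    "Llama 3.3 70B Instruct":           77.3,
    "Llama 3.1 Nemotron Ultra 253B v1": 74.1,
    "Llama-3.3 Nemotron Super 49B v1":  73.7,
    "Llama 3.2 3B Instruct":            67.0,
    "Llama 3.1 Nemotron Nano 8B V1":    63.6,
}

ranked = sorted(bfcl_v2.items(), key=lambda kv: kv[1], reverse=True)
for rank, (model, score) in enumerate(ranked, start=1):
    print(f"#{rank} {model}: {score}%")
# Llama 3.1 Nemotron Ultra 253B v1 comes out at #2 of 5, matching the list above.
```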
All Benchmark Results for Llama 3.1 Nemotron Ultra 253B v1
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Score | Normalized | Source
MATH-500 | math | text | 0.97 | 97.0% | Self-reported
IFEval | code | text | 0.89 | 89.5% | Self-reported
GPQA | general | text | 0.76 | 76.0% | Self-reported
BFCL v2 | general | text | 0.74 | 74.1% | Self-reported
AIME 2025 | general | text | 0.72 | 72.5% | Self-reported
LiveCodeBench | code | text | 0.66 | 66.3% | Self-reported