Phi 4 Mini Reasoning


by Microsoft

About

Phi 4 Mini Reasoning is a language model developed by Microsoft. Across the three benchmarks reported here it averages 68.0%, with a standout result on MATH-500 (94.6%) alongside 57.5% on AIME and 52.0% on GPQA. Its MIT license permits commercial use, making it suitable for enterprise applications. Released in 2025, it is one of Microsoft's latest advances in compact reasoning models.
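The 68.0% headline figure is simply the mean of the three normalized benchmark scores listed on this page. A minimal sketch (values copied from the results below):

```python
# Reproduce the page's "Average Score" as the plain mean of the three
# self-reported normalized benchmark results (values from this page).
scores = {"MATH-500": 94.6, "AIME": 57.5, "GPQA": 52.0}

average = round(sum(scores.values()) / len(scores), 1)
print(average)  # 68.0
```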

Timeline
Announced: Apr 30, 2025
Released: Apr 30, 2025
Knowledge Cutoff: Feb 1, 2025
Specifications
Training Tokens: 150.0B
License & Family
License
MIT
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

3 benchmarks
Average Score
68.0%
Best Score
94.6%
High Performers (80%+)
1

Top Categories

math: 94.6%
general: 54.8%
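The category figures appear to be simple means of the normalized scores grouped by benchmark category (MATH-500 under math; AIME and GPQA under general). A sketch under that assumption:

```python
# Sketch assuming each category score is the plain mean of the
# normalized benchmark results this page assigns to that category.
categories = {
    "math": [94.6],           # MATH-500
    "general": [57.5, 52.0],  # AIME, GPQA
}

for name, vals in categories.items():
    print(name, round(sum(vals) / len(vals), 1))  # math 94.6, general 54.8
```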
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

MATH-500

Rank #9 of 22
#6 Kimi-k1.5: 96.2%
#7 DeepSeek R1 Zero: 95.9%
#8 Llama 3.1 Nemotron Nano 8B V1: 95.4%
#9 Phi 4 Mini Reasoning: 94.6%
#10 DeepSeek R1 Distill Llama 70B: 94.5%
#11 DeepSeek R1 Distill Qwen 32B: 94.3%
#12 DeepSeek-V3 0324: 94.0%

AIME

Rank #1 of 1
#1 Phi 4 Mini Reasoning: 57.5%

GPQA

Rank #57 of 115
#54 Grok-2: 56.0%
#55 Llama 3.1 Nemotron Nano 8B V1: 54.1%
#56 GPT-4o: 53.6%
#57 Phi 4 Mini Reasoning: 52.0%
#58 Gemini 2.0 Flash-Lite: 51.5%
#59 Grok-2 mini: 51.0%
#60 Gemini 1.5 Flash: 51.0%
All Benchmark Results for Phi 4 Mini Reasoning
Complete list of benchmark scores with detailed information
Benchmark   Category   Modality   Raw Score   Normalized   Source
MATH-500    math       text       0.95        94.6%        Self-reported
AIME        general    text       0.57        57.5%        Self-reported
GPQA        general    text       0.52        52.0%        Self-reported