
Phi 4 Mini Reasoning
#1 AIME
by Microsoft
About
Phi 4 Mini Reasoning is a language model developed by Microsoft. It averages 68.0% across 3 benchmarks, with its strongest result on MATH-500 (94.6%), followed by AIME (57.5%) and GPQA (52.0%). It is released under the MIT license, which permits commercial use and makes it suitable for enterprise applications. Released in 2025, it represents Microsoft's latest advancement in small reasoning models.
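The 68.0% figure is consistent with a simple unweighted mean of the three benchmark scores listed above (whether the site actually weights them equally is an assumption here):

```python
# Reported benchmark scores for Phi 4 Mini Reasoning (from the page above).
scores = {"MATH-500": 94.6, "AIME": 57.5, "GPQA": 52.0}

# Unweighted mean, rounded to one decimal place as displayed on the page.
average = round(sum(scores.values()) / len(scores), 1)
print(average)  # 68.0
```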
Timeline
Announced: Apr 30, 2025
Released: Apr 30, 2025
Knowledge Cutoff: Feb 1, 2025
Specifications
Training Tokens: 150.0B
License & Family
License
MIT
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
3 benchmarks
Average Score
68.0%
Best Score
94.6%
High Performers (80%+)
1
Top Categories
math
94.6%
general
54.8%
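The category figures above follow from grouping the benchmarks as the page does (MATH-500 under "math"; AIME and GPQA under "general") and taking an unweighted mean per group, which is an assumption here:

```python
# Per-category benchmark groupings as shown on the page.
categories = {
    "math": [94.6],           # MATH-500
    "general": [57.5, 52.0],  # AIME, GPQA
}

# Unweighted mean per category, rounded to one decimal place.
for name, vals in categories.items():
    print(name, round(sum(vals) / len(vals), 1))
# math 94.6
# general 54.8
```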
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
MATH-500
Rank #9 of 22
#6 Kimi-k1.5
96.2%
#7 DeepSeek R1 Zero
95.9%
#8 Llama 3.1 Nemotron Nano 8B V1
95.4%
#9 Phi 4 Mini Reasoning
94.6%
#10 DeepSeek R1 Distill Llama 70B
94.5%
#11 DeepSeek R1 Distill Qwen 32B
94.3%
#12 DeepSeek-V3 0324
94.0%
AIME
Rank #1 of 1 (the only model listed for this benchmark)
#1 Phi 4 Mini Reasoning
57.5%
GPQA
Rank #57 of 115
#54 Grok-2
56.0%
#55 Llama 3.1 Nemotron Nano 8B V1
54.1%
#56 GPT-4o
53.6%
#57 Phi 4 Mini Reasoning
52.0%
#58 Gemini 2.0 Flash-Lite
51.5%
#59 Grok-2 mini
51.0%
#60 Gemini 1.5 Flash
51.0%
All Benchmark Results for Phi 4 Mini Reasoning
Complete list of benchmark scores with detailed information
Benchmark | Description | Category | Modality | Raw Score | Normalized | Source
MATH-500 | MATH-500 benchmark | math | text | 0.95 | 94.6% | Self-reported
AIME | AIME benchmark | general | text | 0.57 | 57.5% | Self-reported
GPQA | GPQA benchmark | general | text | 0.52 | 52.0% | Self-reported