Phi 4 Mini Reasoning


by Microsoft

About

Phi 4 Mini Reasoning is a language model developed by Microsoft. Across the three benchmarks reported here it averages 68.0%, with a standout result on MATH-500 (94.6%) alongside 57.5% on AIME and 52.0% on GPQA. Its MIT license permits commercial use, making it suitable for enterprise applications. Released in 2025, it is one of Microsoft's latest advances in compact reasoning models.
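The 68.0% headline figure is simply the mean of the three normalized benchmark scores listed on this page. A minimal sketch (values copied from the results below):

```python
# Reproduce the page's "Average Score" as the plain mean of the three
# self-reported normalized benchmark results (values from this page).
scores = {"MATH-500": 94.6, "AIME": 57.5, "GPQA": 52.0}

average = round(sum(scores.values()) / len(scores), 1)
print(average)  # 68.0
```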

Timeline
Announced: Apr 30, 2025
Released: Apr 30, 2025
Knowledge Cutoff: Feb 1, 2025
Specifications
Training Tokens: 150.0B
License & Family
License
MIT
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

3 benchmarks
Average Score
68.0%
Best Score
94.6%
High Performers (80%+)
1

Top Categories

math: 94.6%
general: 54.8%
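The category figures appear to be simple means of the normalized scores grouped by benchmark category (MATH-500 under math; AIME and GPQA under general). A sketch under that assumption:

```python
# Sketch assuming each category score is the plain mean of the
# normalized benchmark results this page assigns to that category.
categories = {
    "math": [94.6],           # MATH-500
    "general": [57.5, 52.0],  # AIME, GPQA
}

for name, vals in categories.items():
    print(name, round(sum(vals) / len(vals), 1))  # math 94.6, general 54.8
```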
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

MATH-500

Rank #9 of 22
#6 Kimi-k1.5: 96.2%
#7 DeepSeek R1 Zero: 95.9%
#8 Llama 3.1 Nemotron Nano 8B V1: 95.4%
#9 Phi 4 Mini Reasoning: 94.6%
#10 DeepSeek R1 Distill Llama 70B: 94.5%
#11 DeepSeek R1 Distill Qwen 32B: 94.3%
#12 DeepSeek-V3 0324: 94.0%

AIME

Rank #1 of 1
#1 Phi 4 Mini Reasoning: 57.5%

GPQA

Rank #57 of 115
#54 Grok-2: 56.0%
#55 Llama 3.1 Nemotron Nano 8B V1: 54.1%
#56 GPT-4o: 53.6%
#57 Phi 4 Mini Reasoning: 52.0%
#58 Gemini 2.0 Flash-Lite: 51.5%
#59 Grok-2 mini: 51.0%
#60 Gemini 1.5 Flash: 51.0%
All Benchmark Results for Phi 4 Mini Reasoning
Complete list of benchmark scores with detailed information
Benchmark   Category   Modality   Raw Score   Normalized   Source
MATH-500    math       text       0.95        94.6%        Self-reported
AIME        general    text       0.57        57.5%        Self-reported
GPQA        general    text       0.52        52.0%        Self-reported