Phi 4 Mini

Zero-eval rankings: #2 OpenBookQA, #2 Multilingual MMLU, #3 Social IQa, +1 more

by Microsoft

About

Phi 4 Mini is a language model developed by Microsoft. It achieves strong performance, with an average score of 65.4% across 17 benchmarks, and does particularly well on GSM8k (88.6%), ARC-C (83.7%), and BoolQ (81.2%). Its MIT license permits commercial use, making it suitable for enterprise applications. Released in 2025, it represents Microsoft's latest advancement in AI technology.
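The summary statistics on this page (average score, best score, count of 80%+ benchmarks) can be reproduced directly from the per-benchmark scores. A minimal sketch, using only the five top self-reported scores listed further down the page rather than the full set of 17, so the sample average differs from the page's 65.4%:

```python
# Sample of Phi 4 Mini's self-reported benchmark scores from this page
# (not the full set of 17 benchmarks).
scores = {
    "GSM8k": 88.6,
    "ARC-C": 83.7,
    "BoolQ": 81.2,
    "OpenBookQA": 79.2,
    "PIQA": 77.6,
}

average = sum(scores.values()) / len(scores)
best = max(scores.values())
high_performers = [name for name, s in scores.items() if s >= 80.0]

print(f"Average Score (sample): {average:.1f}%")
print(f"Best Score: {best:.1f}%")
print(f"High Performers (80%+): {len(high_performers)}")
```

On this sample, the 80%+ count comes out to 3 (GSM8k, ARC-C, BoolQ), matching the page's "High Performers" figure.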

Timeline
Announced: Feb 1, 2025
Released: Feb 1, 2025
Knowledge Cutoff: Jun 1, 2024

Specifications
Training Tokens: 5.0T

License & Family
License: MIT
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (17 benchmarks)
Average Score: 65.4%
Best Score: 88.6%
High Performers (80%+): 3

Top Categories
reasoning: 73.3%
math: 72.2%
factuality: 66.4%
general: 60.8%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

GSM8k
Rank #31 of 46
#28 Phi-3.5-MoE-instruct: 88.7%
#29 Qwen2.5-Omni-7B: 88.7%
#30 Claude 3 Haiku: 88.9%
#31 Phi 4 Mini: 88.6%
#32 Jamba 1.5 Large: 87.0%
#33 Phi-3.5-mini-instruct: 86.2%
#34 Gemini 1.5 Flash: 86.2%

ARC-C
Rank #14 of 31
#11 Phi-3.5-mini-instruct: 84.6%
#12 Jamba 1.5 Mini: 85.7%
#13 Claude 3 Haiku: 89.2%
#14 Phi 4 Mini: 83.7%
#15 Llama 3.1 8B Instruct: 83.4%
#16 Llama 3.2 3B Instruct: 78.6%
#17 Ministral 8B Instruct: 71.9%

BoolQ
Rank #6 of 9
#3 Gemma 3n E4B Instructed LiteRT Preview: 81.6%
#4 Gemma 3n E4B: 81.6%
#5 Gemma 2 9B: 84.2%
#6 Phi 4 Mini: 81.2%
#7 Phi-3.5-mini-instruct: 78.0%
#8 Gemma 3n E2B: 76.4%
#9 Gemma 3n E2B Instructed LiteRT (Preview): 76.4%

OpenBookQA
Rank #2 of 4
#1 Phi-3.5-MoE-instruct: 89.6%
#2 Phi 4 Mini: 79.2%
#3 Phi-3.5-mini-instruct: 79.2%
#4 Mistral NeMo Instruct: 60.6%

PIQA
Rank #9 of 9
#6 Gemma 3n E2B: 78.9%
#7 Gemma 3n E2B Instructed LiteRT (Preview): 78.9%
#8 Gemma 3n E4B: 81.0%
#9 Phi 4 Mini: 77.6%
All Benchmark Results for Phi 4 Mini
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
GSM8k | math | text | 0.89 | 88.6% | Self-reported
ARC-C | reasoning | text | 0.84 | 83.7% | Self-reported
BoolQ | general | text | 0.81 | 81.2% | Self-reported
OpenBookQA | general | text | 0.79 | 79.2% | Self-reported
PIQA | general | text | 0.78 | 77.6% | Self-reported
Social IQa | general | text | 0.72 | 72.5% | Self-reported
BIG-Bench Hard | general | text | 0.70 | 70.4% | Self-reported
HellaSwag | reasoning | text | 0.69 | 69.1% | Self-reported
MMLU | general | text | 0.67 | 67.3% | Self-reported
Winogrande | reasoning | text | 0.67 | 67.0% | Self-reported
Showing 10 of 17 benchmarks.
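The results table pairs each raw 0-1 score with a normalized 0-100% display value. A minimal sketch of that conversion; the one-decimal rounding is an assumption about how this page formats values:

```python
def normalize(raw: float) -> str:
    """Convert a raw 0-1 benchmark score to the page's 0-100% display form."""
    return f"{raw * 100:.1f}%"

# e.g. Phi 4 Mini's GSM8k raw score of 0.886 displays as 88.6%
print(normalize(0.886))
print(normalize(0.792))
```

Note the raw-score column itself is rounded to two decimals (0.886 shows as 0.89), which is why it can look slightly inconsistent with the percentage column.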