Phi 4
by Microsoft

Ranked #3 on PhiBench

About

Phi 4 is a language model developed by Microsoft. It achieves strong performance, with an average score of 66.0% across 13 benchmarks, and its best results come on MMLU (84.8%), HumanEval+ (82.8%), and HumanEval (82.6%). It is strongest in the math category, where it averages 80.5%. The model is available through one API provider and is MIT-licensed, permitting commercial use in enterprise applications. It was released in December 2024.

Pricing Range
Input (per 1M tokens): $0.07
Output (per 1M tokens): $0.14
Providers: 1

Timeline
Announced: Dec 12, 2024
Released: Dec 12, 2024
Knowledge Cutoff: Jun 1, 2024

Specifications
Training Tokens: 9.8T

License & Family
License: MIT
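
At these rates, per-request cost is a token-weighted sum. A minimal sketch (Python); the 10,000/2,000 token counts are made-up example values, not from this page:

```python
# Phi 4 pricing from the table above (USD per 1M tokens).
INPUT_PER_M = 0.07
OUTPUT_PER_M = 0.14

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at Phi 4's listed rates."""
    return (input_tokens / 1_000_000) * INPUT_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PER_M

# Example: 10,000 input + 2,000 output tokens
# -> 0.0007 + 0.00028 = $0.00098 (about a tenth of a cent).
print(f"${request_cost(10_000, 2_000):.5f}")  # $0.00098
```
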
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance (13 benchmarks)
Average Score: 66.0%
Best Score: 84.8%
High Performers (80%+): 5

Performance Metrics
Max Context Window: 32.0K tokens
Avg Throughput: 33.0 tok/s
Avg Latency: 0 ms
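
At the reported throughput, generation time scales linearly with output length. A back-of-envelope sketch (Python), which ignores prompt processing and network overhead:

```python
THROUGHPUT_TOK_S = 33.0  # avg throughput reported above

def generation_time_s(output_tokens: int) -> float:
    """Rough decode time; ignores prompt processing and network overhead."""
    return output_tokens / THROUGHPUT_TOK_S

# A 1,000-token completion takes roughly 1000 / 33 ≈ 30 s.
print(f"{generation_time_s(1_000):.1f} s")  # 30.3 s
```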

Top Categories
math: 80.5%
code: 76.1%
general: 60.2%
roleplay: 47.6%
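
These category scores are consistent with unweighted means of the per-benchmark results listed at the bottom of this page. A minimal sketch (Python) checking the two categories whose benchmarks all appear on this page:

```python
from statistics import mean

# Normalized per-benchmark scores from the results table below.
math_scores = [80.6, 80.4]        # MGSM, MATH
code_scores = [82.8, 82.6, 63.0]  # HumanEval+, HumanEval, IFEval

print(f"math: {mean(math_scores):.1f}%")  # math: 80.5%
print(f"code: {mean(code_scores):.1f}%")  # code: 76.1%

# general (60.2%) and roleplay (47.6%) include benchmarks beyond the
# ten shown on this page, so they can't be reproduced from it alone.
```
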
Benchmark Performance
Top benchmark scores with normalized values (0-100%)

Ranking Across Benchmarks
Position relative to other models on each benchmark

MMLU
Rank #30 of 78
#27 o1-mini: 85.2%
#28 Llama 4 Maverick: 85.5%
#29 GPT-4o: 85.7%
#30 Phi 4: 84.8%
#31 Mistral Large 2: 84.0%
#32 Llama 3.1 70B Instruct: 83.6%
#33 Qwen2.5 32B Instruct: 83.3%

HumanEval+
Rank #5 of 8
#2 Granite 3.3 8B Base: 86.1%
#3 Granite 3.3 8B Instruct: 86.1%
#4 Phi 4 Reasoning Plus: 92.3%
#5 Phi 4: 82.8%
#6 IBM Granite 4.0 Tiny Preview: 78.3%
#7 Qwen2.5 32B Instruct: 52.4%
#8 Qwen2.5 14B Instruct: 51.2%

HumanEval
Rank #37 of 62
#34 Qwen2.5 14B Instruct: 83.5%
#35 Gemini 1.5 Pro: 84.1%
#36 Mistral Small 3 24B Instruct: 84.8%
#37 Phi 4: 82.6%
#38 IBM Granite 4.0 Tiny Preview: 82.4%
#39 Codestral-22B: 81.1%
#40 Nova Micro: 81.1%

MGSM
Rank #19 of 31
#16 Gemini 1.5 Flash: 82.6%
#17 Claude 3 Sonnet: 83.5%
#18 Qwen3 235B A22B: 83.5%
#19 Phi 4: 80.6%
#20 Claude 3 Haiku: 75.1%
#21 GPT-4: 74.5%
#22 Llama 3.2 11B Instruct: 68.9%

MATH
Rank #13 of 63
#10 Qwen2.5 VL 32B Instruct: 82.2%
#11 Qwen2.5 32B Instruct: 83.1%
#12 Qwen2.5 72B Instruct: 83.1%
#13 Phi 4: 80.4%
#14 Qwen2.5 14B Instruct: 80.0%
#15 Claude 3.5 Sonnet: 78.3%
#16 Gemini 1.5 Flash: 77.9%
All Benchmark Results for Phi 4
Complete list of benchmark scores with detailed information
Benchmark    Category  Modality  Raw Score  Normalized  Source
MMLU         general   text      0.85       84.8%       Self-reported
HumanEval+   code      text      0.83       82.8%       Self-reported
HumanEval    code      text      0.83       82.6%       Self-reported
MGSM         math      text      0.81       80.6%       Self-reported
MATH         math      text      0.80       80.4%       Self-reported
DROP         general   text      0.76       75.5%       Self-reported
Arena Hard   general   text      0.75       75.4%       Self-reported
MMLU-Pro     general   text      0.70       70.4%       Self-reported
IFEval       code      text      0.63       63.0%       Self-reported
PhiBench     general   text      0.56       56.2%       Self-reported
Showing 10 of 13 benchmarks.
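
Since the page reports both the overall average (66.0% across 13 benchmarks) and ten individual scores, the combined contribution of the three benchmarks not shown can be back-solved, assuming the overall figure is an unweighted mean. A sketch (Python):

```python
from statistics import mean

# The ten normalized scores listed above.
shown = [84.8, 82.8, 82.6, 80.6, 80.4, 75.5, 75.4, 70.4, 63.0, 56.2]

OVERALL_AVG = 66.0  # reported mean across all 13 benchmarks
hidden_sum = OVERALL_AVG * 13 - sum(shown)

print(f"shown mean:  {mean(shown):.1f}%")     # ~75.2%
print(f"hidden mean: {hidden_sum / 3:.1f}%")  # ~35.4%
```

Under that assumption, the three unlisted benchmarks average roughly 35%, which is consistent with the low roleplay category score (47.6%) reported above.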