Microsoft

Phi-3.5-MoE-instruct

Zero-eval
#1 OpenBookQA
#1 PIQA
#1 RULER
+14 more

About

Phi-3.5-MoE-instruct is a mixture-of-experts language model developed by Microsoft. It achieves an average score of 65.6% across 31 benchmarks, with its strongest results on ARC-C (91.0%), OpenBookQA (89.6%), and GSM8k (88.7%). The model is especially strong on reasoning tasks, where it averages 85.4%. Its MIT license permits commercial use, making it suitable for enterprise applications. It was released in August 2024.

Timeline
Announced: Aug 23, 2024
Released: Aug 23, 2024
Specifications
Training Tokens: 4.9T
License & Family
License
MIT
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

31 benchmarks
Average Score
65.6%
Best Score
91.0%
High Performers (80%+)
11

Top Categories

reasoning
85.4%
factuality
77.5%
code
75.8%
math
69.0%
general
60.9%
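
The category averages above appear to be plain means of the normalized per-benchmark scores within each category. A minimal sketch, assuming the three reasoning benchmarks listed further down this page (ARC-C, HellaSwag, Winogrande) are the full reasoning set:

```python
# Reasoning-category scores listed on this page (normalized %).
# Assumption: these three are the only benchmarks in the "reasoning" category.
reasoning_scores = {
    "ARC-C": 91.0,
    "HellaSwag": 83.8,
    "Winogrande": 81.3,
}

average = sum(reasoning_scores.values()) / len(reasoning_scores)
print(round(average, 1))  # 85.4, matching the reported reasoning average
```

The match with the reported 85.4% suggests a simple unweighted mean; the other category averages would follow the same formula over their own benchmark sets.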
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

ARC-C

Rank #9 of 31
#6 Jamba 1.5 Large
93.0%
#7 Nova Lite
92.4%
#8 Mistral Small 3 24B Base
91.3%
#9 Phi-3.5-MoE-instruct
91.0%
#10 Nova Micro
90.2%
#11 Claude 3 Haiku
89.2%
#12 Jamba 1.5 Mini
85.7%

OpenBookQA

Rank #1 of 4
#1 Phi-3.5-MoE-instruct
89.6%
#2 Phi 4 Mini
79.2%
#3 Phi-3.5-mini-instruct
79.2%
#4 Mistral NeMo Instruct
60.6%

GSM8k

Rank #30 of 46
#27 Gemma 3 4B
89.2%
#28 Claude 3 Haiku
88.9%
#29 Qwen2.5-Omni-7B
88.7%
#30 Phi-3.5-MoE-instruct
88.7%
#31 Phi 4 Mini
88.6%
#32 Jamba 1.5 Large
87.0%
#33 Phi-3.5-mini-instruct
86.2%

PIQA

Rank #1 of 9
#1 Phi-3.5-MoE-instruct
88.6%
#2 Gemma 2 27B
83.2%
#3 Gemma 2 9B
81.7%
#4 Phi-3.5-mini-instruct
81.0%

RULER

Rank #1 of 2
#1 Phi-3.5-MoE-instruct
87.1%
#2 Phi-3.5-mini-instruct
84.1%
All Benchmark Results for Phi-3.5-MoE-instruct
Complete list of benchmark scores with detailed information
ARC-C (reasoning, text): 91.0%, self-reported
OpenBookQA (general, text): 89.6%, self-reported
GSM8k (math, text): 88.7%, self-reported
PIQA (general, text): 88.6%, self-reported
RULER (general, text): 87.1%, self-reported
RepoQA (general, text): 85.0%, self-reported
BoolQ (general, text): 84.6%, self-reported
HellaSwag (reasoning, text): 83.8%, self-reported
MEGA XStoryCloze (general, text): 82.8%, self-reported
Winogrande (reasoning, text): 81.3%, self-reported
Showing 1 to 10 of 31 benchmarks