
Phi-3.5-mini-instruct
Zero-eval
#1 Qasper
#1 SQuALITY
#1 QMSum
+11 more
by Microsoft
About
Phi-3.5-mini-instruct is a language model developed by Microsoft. It has reported results on 31 benchmarks, with its strongest scores on GSM8k (86.2%), ARC-C (84.6%), and RULER (84.1%). The listing gives a 256K-token context window for handling large documents. The model is available through one API provider and is released under the MIT license, which permits commercial use and makes it suitable for enterprise applications. It was released in August 2024.
Pricing Range
Input (per 1M tokens): $0.10 - $0.10
Output (per 1M tokens): $0.10 - $0.10
Providers: 1
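As a rough illustration of what the listed prices mean per request, the sketch below multiplies token counts by the per-1M rates. The helper name `estimate_cost` and the example token counts are hypothetical, and the sketch assumes the single listed provider bills input and output tokens linearly at the quoted rates with no minimums or other fees.

```python
# Prices copied from the listing (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.10


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes purely linear per-token billing at the listed rates.
    """
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 200K-token document summarized into 2K tokens.
    print(f"${estimate_cost(200_000, 2_000):.4f}")
```

Under these assumptions, even a request that fills most of the context window costs on the order of a few cents.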
Timeline
Announced: Aug 23, 2024
Released: Aug 23, 2024
Specifications
Training Tokens: 3.4T
License & Family
License
MIT
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
31 benchmarks
Average Score
58.7%
Best Score
86.2%
High Performers (80%+)
4
Performance Metrics
Max Context Window
256.0K
Avg Throughput
23.0 tok/s
Avg Latency
1ms
Top Categories
reasoning
74.2%
code
66.2%
factuality
64.0%
math
60.9%
general
55.4%
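To put the throughput and latency figures listed under Performance Metrics in practical terms, the sketch below estimates how long a response of a given length might take. It assumes the latency figure is the delay before the first token and the throughput figure is a steady output rate; the listing does not define either metric, and the helper name is hypothetical.

```python
# Figures copied from the listing.
AVG_THROUGHPUT_TOK_PER_S = 23.0
AVG_LATENCY_S = 0.001  # 1 ms


def estimate_generation_seconds(output_tokens: int) -> float:
    """Estimate wall-clock seconds to generate a response (hypothetical helper).

    Assumes: total time = latency before first token + tokens / throughput.
    """
    return AVG_LATENCY_S + output_tokens / AVG_THROUGHPUT_TOK_PER_S


if __name__ == "__main__":
    for n in (100, 500, 2_000):
        print(f"{n} tokens: about {estimate_generation_seconds(n):.1f} s")
```

Real latency will vary with prompt length, provider load, and network conditions, so this is only a back-of-the-envelope figure.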
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
GSM8k
Rank #33 of 46
#30 Jamba 1.5 Large
87.0%
#31 Phi 4 Mini
88.6%
#32 Phi-3.5-MoE-instruct
88.7%
#33 Phi-3.5-mini-instruct
86.2%
#34 Gemini 1.5 Flash
86.2%
#35 Qwen2.5-Coder 7B Instruct
83.9%
#36 Qwen2 7B Instruct
82.3%
ARC-C
Rank #13 of 31
#10 Jamba 1.5 Mini
85.7%
#11 Claude 3 Haiku
89.2%
#12 Nova Micro
90.2%
#13 Phi-3.5-mini-instruct
84.6%
#14 Phi 4 Mini
83.7%
#15 Llama 3.1 8B Instruct
83.4%
#16 Llama 3.2 3B Instruct
78.6%
RULER
Rank #2 of 2
#1 Phi-3.5-MoE-instruct
87.1%
#2 Phi-3.5-mini-instruct
84.1%
PIQA
Rank #4 of 9
#1 Gemma 2 9B
81.7%
#2 Gemma 2 27B
83.2%
#3 Phi-3.5-MoE-instruct
88.6%
#4 Phi-3.5-mini-instruct
81.0%
#5 Gemma 3n E4B Instructed LiteRT Preview
81.0%
#6 Gemma 3n E4B
81.0%
#7 Gemma 3n E2B Instructed LiteRT (Preview)
78.9%
OpenBookQA
Rank #3 of 4
#1 Phi 4 Mini
79.2%
#2 Phi-3.5-MoE-instruct
89.6%
#3 Phi-3.5-mini-instruct
79.2%
#4 Mistral NeMo Instruct
60.6%
All Benchmark Results for Phi-3.5-mini-instruct
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Score | Normalized | Source
GSM8k | math | text | 0.86 | 86.2% | Self-reported
ARC-C | reasoning | text | 0.85 | 84.6% | Self-reported
RULER | general | text | 0.84 | 84.1% | Self-reported
PIQA | general | text | 0.81 | 81.0% | Self-reported
OpenBookQA | general | text | 0.79 | 79.2% | Self-reported
BoolQ | general | text | 0.78 | 78.0% | Self-reported
RepoQA | general | text | 0.77 | 77.0% | Self-reported
Social IQa | general | text | 0.75 | 74.7% | Self-reported
MEGA XStoryCloze | general | text | 0.73 | 73.5% | Self-reported
MBPP | code | text | 69.60 | 69.6% | Self-reported
Showing 1 to 10 of 31 benchmarks
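The rows above mix two score scales: most raw scores are fractions (0.86), while MBPP is already a percentage (69.60). The sketch below shows one way to put raw scores on a common 0-100 scale and to recount the "High Performers (80%+)" figure from the visible rows. The rule "values of 1.0 or less are fractions" is an assumption made for illustration, not something the page documents, and the function names are hypothetical.

```python
# (benchmark, raw score, normalized %) copied from the ten visible rows.
ROWS = [
    ("GSM8k", 0.86, 86.2),
    ("ARC-C", 0.85, 84.6),
    ("RULER", 0.84, 84.1),
    ("PIQA", 0.81, 81.0),
    ("OpenBookQA", 0.79, 79.2),
    ("BoolQ", 0.78, 78.0),
    ("RepoQA", 0.77, 77.0),
    ("Social IQa", 0.75, 74.7),
    ("MEGA XStoryCloze", 0.73, 73.5),
    ("MBPP", 69.60, 69.6),
]


def to_percent(raw: float) -> float:
    """Map a raw score to 0-100. Assumption: raw <= 1.0 means a fraction."""
    return raw * 100.0 if raw <= 1.0 else raw


def high_performers(rows, threshold: float = 80.0) -> list[str]:
    """Names of benchmarks whose normalized score meets the threshold."""
    return [name for name, _raw, pct in rows if pct >= threshold]


def mean_normalized(rows) -> float:
    """Mean of the normalized column over the given rows."""
    return sum(pct for _name, _raw, pct in rows) / len(rows)


if __name__ == "__main__":
    print(high_performers(ROWS))
    print(f"mean of visible rows: {mean_normalized(ROWS):.2f}%")
```

The count of benchmarks at or above 80% among these rows matches the overview figure of 4. The mean of the visible rows will not match the listed 58.7% average, because only the top 10 of 31 benchmarks are shown here.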