
Phi-3.5-vision-instruct
Multimodal
Zero-eval
#1 ScienceQA
#1 POPE
#2 InterGPS
by Microsoft
About
Phi-3.5-vision-instruct is a multimodal language model developed by Microsoft. It achieves an average score of 68.3% across 9 benchmarks, with its strongest results on ScienceQA (91.3%), POPE (86.1%), and MMBench (81.9%). Its best category is general tasks, where it averages 75.9%. As a multimodal model, it can process both text and image inputs. It is released under the MIT license, which permits commercial use and makes it suitable for enterprise applications. Released in August 2024, it represents one of Microsoft's recent advances in vision-language models.
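As a rough illustration of how a vision-language instruct model like this is typically queried, here is a minimal sketch using Hugging Face transformers. It assumes the model is published as microsoft/Phi-3.5-vision-instruct and loads through the standard Auto classes with trust_remote_code; the <|image_1|> placeholder convention and the image URL are illustrative assumptions, not details taken from this page.

```python
# Minimal inference sketch (assumptions noted above): one image plus a text prompt.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3.5-vision-instruct"  # assumed Hugging Face model id

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # the Phi-3 vision models ship custom modeling code
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Placeholder image URL; the <|image_1|> tag is the image-reference convention
# assumed for the Phi-3 vision processor.
image = Image.open(requests.get("https://example.com/chart.png", stream=True).raw)
messages = [{"role": "user", "content": "<|image_1|>\nWhat does this chart show?"}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Drop the prompt tokens before decoding the generated answer.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```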
Timeline
Announced: Aug 23, 2024
Released: Aug 23, 2024
Specifications
Training Tokens: 500.0B
Capabilities
Multimodal
License & Family
License
MIT
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
9 benchmarks
Average Score
68.3%
Best Score
91.3%
High Performers (80%+)
4
Top Categories
general
75.9%
vision
57.5%
math
43.9%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
ScienceQA
Rank #1 of 1
#1 Phi-3.5-vision-instruct
91.3%
POPE
Rank #1 of 2
#1 Phi-3.5-vision-instruct
86.1%
#2 Phi-4-multimodal-instruct
85.6%
MMBench
Rank #4 of 7
#1 Qwen2.5 VL 72B Instruct
88.0%
#2 Phi-4-multimodal-instruct
86.7%
#3 Qwen2.5 VL 7B Instruct
84.3%
#4 Phi-3.5-vision-instruct
81.9%
#5 DeepSeek VL2 Small
80.3%
#6 DeepSeek VL2
79.6%
#7 DeepSeek VL2 Tiny
69.2%
ChartQA
Rank #18 of 24
#15 DeepSeek VL2 Small
84.5%
#16 Llama 3.2 11B Instruct
83.4%
#17 Pixtral-12B
81.8%
#18 Phi-3.5-vision-instruct
81.8%
#19 Phi-4-multimodal-instruct
81.4%
#20 DeepSeek VL2 Tiny
81.0%
#21 Gemma 3 27B
78.0%
AI2D
Rank #15 of 17
#12 Phi-4-multimodal-instruct
82.3%
#13 DeepSeek VL2
81.4%
#14 DeepSeek VL2 Small
80.0%
#15 Phi-3.5-vision-instruct
78.1%
#16 Gemma 3 4B
74.8%
#17 DeepSeek VL2 Tiny
71.6%
All Benchmark Results for Phi-3.5-vision-instruct
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Score (0-1) | Normalized | Source
ScienceQA | general | text | 0.91 | 91.3% | Self-reported
POPE | general | text | 0.86 | 86.1% | Self-reported
MMBench | general | text | 0.82 | 81.9% | Self-reported
ChartQA | general | multimodal | 0.82 | 81.8% | Self-reported
AI2D | general | text | 0.78 | 78.1% | Self-reported
TextVQA | vision | multimodal | 0.72 | 72.0% | Self-reported
MathVista | math | text | 0.44 | 43.9% | Self-reported
MMMU | vision | multimodal | 0.43 | 43.0% | Self-reported
InterGPS | general | text | 0.36 | 36.3% | Self-reported
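For reference, the aggregate figures above (68.3% overall; 75.9% general, 57.5% vision, 43.9% math) are unweighted means of these per-benchmark scores. A small sketch of that arithmetic, using only the scores and category labels from the table:

```python
from collections import defaultdict

# Per-benchmark normalized scores and category labels from the table above.
scores = {
    "ScienceQA": (91.3, "general"),
    "POPE": (86.1, "general"),
    "MMBench": (81.9, "general"),
    "ChartQA": (81.8, "general"),
    "AI2D": (78.1, "general"),
    "TextVQA": (72.0, "vision"),
    "MathVista": (43.9, "math"),
    "MMMU": (43.0, "vision"),
    "InterGPS": (36.3, "general"),
}

# Overall average across all 9 benchmarks.
overall = sum(s for s, _ in scores.values()) / len(scores)
print(f"Overall average: {overall:.1f}%")  # 68.3%

# Per-category averages (general, vision, math).
by_category = defaultdict(list)
for score, category in scores.values():
    by_category[category].append(score)

for category, values in by_category.items():
    print(f"{category}: {sum(values) / len(values):.1f}%")
# general: 75.9%, vision: 57.5%, math: 43.9%
```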