Phi-3.5-vision-instruct
by Microsoft

Multimodal · Zero-eval
#1 ScienceQA · #1 POPE · #2 InterGPS

About

Phi-3.5-vision-instruct is a multimodal language model developed by Microsoft. Across the 9 benchmarks tracked here it averages 68.3%, with its strongest results on ScienceQA (91.3%), POPE (86.1%), and MMBench (81.9%). Its best category is general tasks, where it averages 75.9%. As a multimodal model, it accepts both text and image inputs. It is released under the MIT license, which permits commercial use, making it suitable for enterprise applications. Released in August 2024, it represents Microsoft's latest vision-language model at that time.

Timeline
Announced: Aug 23, 2024
Released: Aug 23, 2024
Specifications
Training Tokens: 500.0B
Capabilities
Multimodal
License & Family
License
MIT
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

9 benchmarks
Average Score
68.3%
Best Score
91.3%
High Performers (80%+)
4

Top Categories

general
75.9%
vision
57.5%
math
43.9%
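The category and overall averages quoted on this page follow directly from the nine self-reported benchmark scores listed further down. A minimal Python sketch to reproduce them (the score dictionary is transcribed from those results; the grouping logic is an assumption about how the card computes its averages):

```python
# Per-benchmark scores (in %) with their categories, transcribed from this card.
scores = {
    "ScienceQA": ("general", 91.3),
    "POPE": ("general", 86.1),
    "MMBench": ("general", 81.9),
    "ChartQA": ("general", 81.8),
    "AI2D": ("general", 78.1),
    "TextVQA": ("vision", 72.0),
    "MathVista": ("math", 43.9),
    "MMMU": ("vision", 43.0),
    "InterGPS": ("general", 36.3),
}

# Group scores by category, then average each group and the full set.
by_category = {}
for name, (category, pct) in scores.items():
    by_category.setdefault(category, []).append(pct)

category_avg = {c: round(sum(v) / len(v), 1) for c, v in by_category.items()}
overall_avg = round(sum(p for _, p in scores.values()) / len(scores), 1)

print(category_avg)  # {'general': 75.9, 'vision': 57.5, 'math': 43.9}
print(overall_avg)   # 68.3
```

The unweighted mean over all nine benchmarks gives the 68.3% headline figure, and the per-category means match the breakdown above.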
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

ScienceQA

Rank #1 of 1
#1 Phi-3.5-vision-instruct
91.3%

POPE

Rank #1 of 2
#1 Phi-3.5-vision-instruct
86.1%
#2 Phi-4-multimodal-instruct
85.6%

MMBench

Rank #4 of 7
#1 Qwen2.5 VL 72B Instruct
88.0%
#2 Phi-4-multimodal-instruct
86.7%
#3 Qwen2.5 VL 7B Instruct
84.3%
#4 Phi-3.5-vision-instruct
81.9%
#5 DeepSeek VL2 Small
80.3%
#6 DeepSeek VL2
79.6%
#7 DeepSeek VL2 Tiny
69.2%

ChartQA

Rank #18 of 24
#15 DeepSeek VL2 Small
84.5%
#16 Llama 3.2 11B Instruct
83.4%
#17 Pixtral-12B
81.8%
#18 Phi-3.5-vision-instruct
81.8%
#19 Phi-4-multimodal-instruct
81.4%
#20 DeepSeek VL2 Tiny
81.0%
#21 Gemma 3 27B
78.0%

AI2D

Rank #15 of 17
#12 Phi-4-multimodal-instruct
82.3%
#13 DeepSeek VL2
81.4%
#14 DeepSeek VL2 Small
80.0%
#15 Phi-3.5-vision-instruct
78.1%
#16 Gemma 3 4B
74.8%
#17 DeepSeek VL2 Tiny
71.6%
All Benchmark Results for Phi-3.5-vision-instruct
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Source
ScienceQA | general | text | 0.91 | 91.3% | Self-reported
POPE | general | text | 0.86 | 86.1% | Self-reported
MMBench | general | text | 0.82 | 81.9% | Self-reported
ChartQA | general | multimodal | 0.82 | 81.8% | Self-reported
AI2D | general | text | 0.78 | 78.1% | Self-reported
TextVQA | vision | multimodal | 0.72 | 72.0% | Self-reported
MathVista | math | text | 0.44 | 43.9% | Self-reported
MMMU | vision | multimodal | 0.43 | 43.0% | Self-reported
InterGPS | general | text | 0.36 | 36.3% | Self-reported