GPT-4o

Multimodal
Zero-eval
#1 ComplexFuncBench
#1 ActivityNet
#2 AI2D

by OpenAI

About

GPT-4o is a multimodal language model developed by OpenAI, announced and released on August 6, 2024. Across the 32 benchmarks tracked here it shows competitive results, scoring highest on AI2D (94.2%), DocVQA (92.8%), and ChartQA (85.7%), and it is strongest in vision tasks, where its average normalized score is 82.5%. It supports a context window of up to 144.4K tokens for handling large documents, accepts text, images, and other input formats in a single request, and is available through 3 API providers.
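
Because the model is served through the standard OpenAI API, a multimodal request can mix text and image parts in one message. Below is a minimal sketch using the official openai Python SDK; the image URL is a placeholder and OPENAI_API_KEY is assumed to be set in the environment.

# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the chart in this image."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},  # placeholder URL
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)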

Pricing Range
Input (per 1M tokens): $2.50
Output (per 1M tokens): $10.00
Providers: 3
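
At these rates, per-request cost is a linear function of token counts. A quick sketch, with the prices hardcoded from the table above and hypothetical token counts:

# $2.50 per 1M input tokens, $10.00 per 1M output tokens (listed rates).
def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single GPT-4o request."""
    return input_tokens / 1e6 * 2.50 + output_tokens / 1e6 * 10.00

# Example: a 10,000-token prompt with a 1,000-token reply.
print(f"${request_cost(10_000, 1_000):.3f}")  # $0.035
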
Timeline
Announced: Aug 6, 2024
Released: Aug 6, 2024
Specifications
Capabilities: Multimodal
License & Family
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

Benchmarks: 32
Average Score: 55.1%
Best Score: 94.2%
High Performers (80%+): 7

Performance Metrics

Max Context Window: 144.4K tokens
Avg Throughput: 110.0 tok/s
Avg Latency: 1 ms
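
Together, the two service averages give a first-order estimate of response time: time to first token plus output length divided by throughput. A sketch under the listed figures (the 1 ms average latency in particular reads like a best-case, provider-reported value):

def estimated_seconds(output_tokens: int,
                      latency_s: float = 0.001,    # 1 ms avg latency, as listed
                      tok_per_s: float = 110.0):   # 110.0 tok/s avg throughput
    """Rough wall-clock time for one response."""
    return latency_s + output_tokens / tok_per_s

# A 500-token reply: 0.001 + 500 / 110, roughly 4.5 seconds.
print(f"{estimated_seconds(500):.1f}s")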

Top Categories

vision: 82.5%
code: 81.0%
math: 61.4%
general: 52.1%
agents: 51.5%
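
These category figures are presumably plain means of the normalized scores over the benchmarks tagged with each category. The sketch below reproduces the vision and code numbers from the rows visible in the results table further down; it is illustrative only, since the full 32-benchmark set may include rows not shown on this page.

from collections import defaultdict

# (benchmark, category, normalized score %) rows copied from the table below.
rows = [
    ("DocVQA", "vision", 92.8),
    ("MMMU",   "vision", 72.2),
    ("IFEval", "code",   81.0),
]

by_category = defaultdict(list)
for _, category, score in rows:
    by_category[category].append(score)

for category, scores in by_category.items():
    print(f"{category}: {sum(scores) / len(scores):.1f}%")
# vision: 82.5%  (matches the listed average)
# code: 81.0%
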
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

AI2D

Rank #2 of 17
#1 Claude 3.5 Sonnet: 94.7%
#2 GPT-4o: 94.2%
#3 Pixtral Large: 93.8%
#4 Mistral Small 3.2 24B Instruct: 92.9%
#5 Llama 3.2 90B Instruct: 92.3%

DocVQA

Rank #15 of 26
#12 Pixtral Large: 93.3%
#13 Phi-4-multimodal-instruct: 93.2%
#14 Grok-2 mini: 93.2%
#15 GPT-4o: 92.8%
#16 Nova Lite: 92.4%
#17 DeepSeek VL2 Small: 92.3%
#18 Pixtral-12B: 90.7%

ChartQA

Rank #12 of 24
#9 Qwen2.5 VL 7B Instruct: 87.3%
#10 Nova Lite: 86.8%
#11 DeepSeek VL2: 86.0%
#12 GPT-4o: 85.7%
#13 Llama 3.2 90B Instruct: 85.5%
#14 Qwen2.5-Omni-7B: 85.3%
#15 DeepSeek VL2 Small: 84.5%

MMLU

Rank #27 of 78
#24 Llama 3.3 70B Instruct: 86.0%
#25 Nova Pro: 85.9%
#26 Gemini 1.5 Pro: 85.9%
#27 GPT-4o: 85.7%
#28 Llama 4 Maverick: 85.5%
#29 o1-mini: 85.2%
#30 Phi 4: 84.8%

CharXiv-D

Rank #4 of 5
#1 GPT-4.5: 90.0%
#2 GPT-4.1 mini: 88.4%
#3 GPT-4.1: 87.9%
#4 GPT-4o: 85.3%
#5 GPT-4.1 nano: 73.9%
All Benchmark Results for GPT-4o
Complete list of benchmark scores with detailed information
Benchmark    Category  Modality    Raw Score  Normalized  Source
AI2D         general   text        0.94       94.2%       Self-reported
DocVQA       vision    multimodal  0.93       92.8%       Self-reported
ChartQA      general   multimodal  0.86       85.7%       Self-reported
MMLU         general   text        0.86       85.7%       Self-reported
CharXiv-D    general   text        0.85       85.3%       Self-reported
MMMLU        general   text        0.81       81.4%       Self-reported
IFEval       code      text        0.81       81.0%       Self-reported
MMLU-Pro     general   text        0.75       74.7%       Self-reported
MMMU         vision    multimodal  0.72       72.2%       Self-reported
EgoSchema    general   text        0.72       72.2%       Self-reported
Showing 10 of 32 benchmarks.