GPT-4o

Name: GPT-4o
Price: $2.50 per 1M input tokens
Average benchmark score: 77.4% (8 benchmarks)
Author: OpenAI

About

GPT-4o is a multimodal language model developed by OpenAI. It averages 77.4% across the 8 benchmarks listed below, with its highest scores on MGSM (90.5%), HumanEval (90.2%), and MMLU (88.7%). It supports a context window of about 132K tokens, which lets it handle large documents. The model is available through two API providers. As a multimodal model, it can process both text and image inputs. It was announced and released on May 13, 2024.

Pricing Range

Input (per 1M tokens): $2.50
Output (per 1M tokens): $10.00
Providers: 2
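To show how the two per-1M-token rates combine, the Python sketch below estimates the cost of a single request. It assumes plain linear billing at the listed rates with no provider-specific discounts (such as cached-input pricing); the function name and the example token counts are hypothetical. Note that output tokens cost four times as much as input tokens.

```python
# Listed GPT-4o prices in USD per 1M tokens (from the Pricing Range above).
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 10.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts.

    Assumption: simple linear per-token billing at the listed rates;
    ignores any discounts a provider might apply.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# A hypothetical request: 2,000 prompt tokens and 500 completion tokens.
print(f"${estimate_cost_usd(2_000, 500):.4f}")  # $0.0100
```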

Timeline

Announced: May 13, 2024
Released: May 13, 2024

Specifications

Capabilities: Multimodal

License & Family

License: Proprietary

Benchmark Performance Overview

Performance metrics and category breakdown

Overall Performance

Benchmarks: 8
Average Score: 77.4%
Best Score: 90.5% (MGSM)
High Performers (80%+): 4 benchmarks
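These summary figures follow directly from the eight self-reported scores in the benchmark table. A minimal Python sketch, assuming an unweighted mean and treating a score of exactly 80% as a high performer (the page does not state whether the threshold is inclusive):

```python
from statistics import mean

# Self-reported GPT-4o scores (%) from the benchmark table on this page.
SCORES = {
    "MGSM": 90.5,
    "HumanEval": 90.2,
    "MMLU": 88.7,
    "DROP": 83.4,
    "MATH": 76.6,
    "MMLU-Pro": 72.6,
    "MathVista": 63.8,
    "GPQA": 53.6,
}

average = mean(SCORES.values())  # unweighted mean over all benchmarks
best_name, best_score = max(SCORES.items(), key=lambda item: item[1])
# Assumption: the 80% threshold is inclusive.
high_performers = [name for name, score in SCORES.items() if score >= 80.0]

print(f"Average score: {average:.1f}%")          # 77.4%
print(f"Best score: {best_score}% ({best_name})")
print(f"High performers (80%+): {len(high_performers)} {high_performers}")
```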

Performance Metrics

Max Context Window: 132.1K tokens
Avg Throughput: 96.0 tok/s
Avg Latency: 1 ms
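The throughput and context figures can be turned into rough planning numbers. The sketch below estimates response time as fixed latency plus steady-rate decoding and checks whether a request fits the context window. It assumes that the prompt and the generated output share the listed window and that "132.1K" means 132,100 tokens (the exact figure is not given); real timings vary by provider and load, and the function names are hypothetical.

```python
# Listed averages for GPT-4o (see Performance Metrics above).
AVG_THROUGHPUT_TOK_PER_S = 96.0
AVG_LATENCY_S = 0.001          # listed as 1 ms
MAX_CONTEXT_TOKENS = 132_100   # assumption: "132.1K" read as 132,100 tokens


def estimate_response_seconds(output_tokens: int,
                              throughput: float = AVG_THROUGHPUT_TOK_PER_S,
                              latency_s: float = AVG_LATENCY_S) -> float:
    """Rough wall-clock estimate: fixed latency plus steady-rate decoding."""
    return latency_s + output_tokens / throughput


def fits_context(prompt_tokens: int, max_output_tokens: int,
                 window: int = MAX_CONTEXT_TOKENS) -> bool:
    """Assumption: prompt and requested output share one context window."""
    return prompt_tokens + max_output_tokens <= window


print(f"{estimate_response_seconds(960):.3f} s")  # 10.001 s
print(fits_context(120_000, 4_000))               # True
print(fits_context(130_000, 4_000))               # False
```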

Top Categories

code: 90.2%
math: 77.0%
general: 74.6%
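The category figures match unweighted means of the benchmark scores in each category: recomputing them from the table gives 90.2% (code), 77.0% (math), and 74.6% (general). A sketch of that calculation, assuming the site uses a simple unweighted mean (it does not say so explicitly):

```python
from collections import defaultdict
from statistics import mean

# (benchmark, category, score %) rows from the benchmark table on this page.
ROWS = [
    ("MGSM", "math", 90.5),
    ("HumanEval", "code", 90.2),
    ("MMLU", "general", 88.7),
    ("DROP", "general", 83.4),
    ("MATH", "math", 76.6),
    ("MMLU-Pro", "general", 72.6),
    ("MathVista", "math", 63.8),
    ("GPQA", "general", 53.6),
]

# Group scores by category.
by_category: dict[str, list[float]] = defaultdict(list)
for _name, category, score in ROWS:
    by_category[category].append(score)

# Unweighted mean per category, highest first.
category_means = sorted(
    ((category, mean(scores)) for category, scores in by_category.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for category, avg in category_means:
    print(f"{category}: {avg:.1f}%")
```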

Ranking Across Benchmarks

Position relative to other models on each benchmark

MGSM

Rank #9 of 31

#6 o1-preview: 90.8%
#7 Claude 3 Opus: 90.7%
#8 Llama 4 Scout: 90.6%
#9 GPT-4o: 90.5%
#10 o1: 89.3%
#11 GPT-4 Turbo: 88.5%
#12 Gemini 1.5 Pro: 87.5%

HumanEval

Rank #9 of 62

#6 Claude 3.5 Sonnet: 92.0%
#7 Mistral Large 2: 92.0%
#8 Qwen2.5 VL 32B Instruct: 91.5%
#9 GPT-4o: 90.2%
#10 Granite 3.3 8B Instruct: 89.7%
#11 Granite 3.3 8B Base: 89.7%
#12 Gemini Diffusion: 89.6%

MMLU

Rank #10 of 78

#7 Claude 3.5 Sonnet: 90.4%
#8 GPT-4.1: 90.2%
#9 Kimi K2 Instruct: 89.5%
#10 GPT-4o: 88.7%
#11 DeepSeek-V3: 88.5%
#12 Qwen3 235B A22B: 87.8%
#13 Kimi K2 Base: 87.8%

DROP

Rank #8 of 28

#5 GPT-4 Turbo: 86.0%
#6 Nova Pro: 85.4%
#7 Llama 3.1 405B Instruct: 84.8%
#8 GPT-4o: 83.4%
#9 Claude 3.5 Haiku: 83.1%
#10 Claude 3 Opus: 83.1%
#11 GPT-4: 80.9%

MATH

Rank #19 of 63

#16 Gemini 1.5 Flash: 77.9%
#17 Llama 3.3 70B Instruct: 77.0%
#18 Nova Pro: 76.6%
#19 GPT-4o: 76.6%
#20 Grok-2: 76.1%
#21 Gemma 3 4B: 75.6%
#22 Qwen2.5 7B Instruct: 75.5%
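Each leaderboard above shows the three models ranked immediately above GPT-4o and the three below it. Assuming positions are assigned by descending score, such a window can be derived as in the sketch below. The function name and the `first_rank` offset are illustrative (the full 31-model MGSM list is not reproduced on this page), and ties keep their input order, which is one possible tie-break rather than necessarily the site's.

```python
Entry = tuple[str, float]  # (model name, score %)


def rank_window(entries: list[Entry], target: str, first_rank: int = 1,
                k: int = 3) -> list[tuple[int, str, float]]:
    """Return (rank, model, score) for the target and up to k neighbours per side.

    Entries are sorted by descending score; Python's sort is stable, so tied
    scores keep their input order. `first_rank` is the global rank of the best
    entry, so a partial leaderboard keeps its positions.
    """
    ordered = sorted(entries, key=lambda e: e[1], reverse=True)
    ranked = [(first_rank + i, name, score)
              for i, (name, score) in enumerate(ordered)]
    idx = next(i for i, (_, name, _) in enumerate(ranked) if name == target)
    return ranked[max(0, idx - k): idx + k + 1]


# The seven MGSM entries shown above (ranks #6 to #12), in arbitrary order.
mgsm = [
    ("Llama 4 Scout", 90.6), ("Claude 3 Opus", 90.7), ("o1-preview", 90.8),
    ("GPT-4o", 90.5), ("o1", 89.3), ("GPT-4 Turbo", 88.5),
    ("Gemini 1.5 Pro", 87.5),
]

for rank, name, score in rank_window(mgsm, "GPT-4o", first_rank=6):
    print(f"#{rank} {name}: {score}%")
```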

All Benchmark Results for GPT-4o

Complete list of benchmark scores with detailed information


Benchmark   Category  Modality  Normalized  Score  Source
MGSM        math      text      0.91        90.5%  Self-reported
HumanEval   code      text      0.90        90.2%  Self-reported
MMLU        general   text      0.89        88.7%  Self-reported
DROP        general   text      0.83        83.4%  Self-reported
MATH        math      text      0.77        76.6%  Self-reported
MMLU-Pro    general   text      0.73        72.6%  Self-reported
MathVista   math      text      0.64        63.8%  Self-reported
GPQA        general   text      0.54        53.6%  Self-reported
