
GPT-4o mini

Multimodal
Zero-eval

by OpenAI

About

GPT-4o mini is a multimodal language model developed by OpenAI. It achieves strong performance, with an average score of 63.5% across 9 benchmarks, and does particularly well on HumanEval (87.2%), MGSM (87.0%), and MMLU (82.0%). It supports a 128K-token context window (with up to 16.4K output tokens) for handling large documents, and is available through one API provider. As a multimodal model, it can process both text and image inputs. Released in July 2024, it is OpenAI's smaller, lower-cost counterpart to GPT-4o.
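As a quick illustration of the multimodal input format, the sketch below builds a Chat Completions request body that mixes text and an image, following OpenAI's documented content-part shape. The prompt text and image URL are placeholder values, and sending the request (with an SDK or HTTP client) is left out.

```python
# Sketch: a multimodal Chat Completions request body for gpt-4o-mini.
# The prompt and image URL are placeholders, not values from this page.
payload = {
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "user",
            "content": [
                # A text part and an image part in a single user message.
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.png"},
                },
            ],
        }
    ],
}
```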

Pricing Range
Input (per 1M tokens): $0.15
Output (per 1M tokens): $0.60
Providers: 1
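At these rates, pricing a request is simple arithmetic; the helper below is a small sketch (the function name and example token counts are illustrative, not from this page).

```python
# Listed rates for GPT-4o mini, in USD per 1M tokens.
INPUT_PRICE_PER_1M = 0.15
OUTPUT_PRICE_PER_1M = 0.60

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_1M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_1M

# e.g. 10K input tokens and 1K output tokens:
cost = request_cost(10_000, 1_000)  # 0.0015 + 0.0006 = 0.0021 USD
```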
Timeline
Announced: Jul 18, 2024
Released: Jul 18, 2024
Knowledge Cutoff: Oct 1, 2023
Specifications
Capabilities
Multimodal
License & Family
License
Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

9 benchmarks
Average Score
63.5%
Best Score
87.2%
High Performers (80%+)
3
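The headline figures above can be reproduced from the nine self-reported scores in the "All Benchmark Results" section:

```python
# The nine self-reported benchmark scores from this page (%).
scores = {
    "HumanEval": 87.2, "MGSM": 87.0, "MMLU": 82.0,
    "DROP": 79.7, "MATH": 70.2, "MMMU": 59.4,
    "MathVista": 56.7, "GPQA": 40.2, "SWE-Bench Verified": 8.7,
}
average = sum(scores.values()) / len(scores)             # 63.455... -> 63.5%
best = max(scores.values())                              # 87.2%
high_performers = sum(s >= 80 for s in scores.values())  # 3 benchmarks at 80%+
```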

Performance Metrics

Max Context Window
144.4K (128K input + 16.4K max output)
Avg Throughput
92.0 tok/s
Avg Latency
1ms
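Throughput and latency combine into a rough response-time estimate; the formula below is a back-of-the-envelope sketch using the listed averages, not a provider guarantee.

```python
AVG_THROUGHPUT = 92.0  # tokens per second, as listed
AVG_LATENCY_S = 0.001  # 1 ms, as listed

def estimated_seconds(output_tokens: int) -> float:
    """Rough wall-clock estimate: initial latency plus generation time."""
    return AVG_LATENCY_S + output_tokens / AVG_THROUGHPUT

# e.g. a 1,000-token response takes roughly 10.9 seconds
```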

Top Categories

code
87.2%
math
71.3%
vision
59.4%
general
52.7%
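Each category score is the plain mean of that category's benchmark results, which can be checked against the per-benchmark scores on this page (the grouping below follows the categories listed under "All Benchmark Results"):

```python
# Benchmark scores (%) grouped by the categories used on this page.
by_category = {
    "code":    [87.2],                   # HumanEval
    "math":    [87.0, 70.2, 56.7],       # MGSM, MATH, MathVista
    "vision":  [59.4],                   # MMMU
    "general": [82.0, 79.7, 40.2, 8.7],  # MMLU, DROP, GPQA, SWE-Bench Verified
}
averages = {cat: sum(v) / len(v) for cat, v in by_category.items()}
# math averages 71.3; general averages 52.65, displayed as 52.7 on this page
```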
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

HumanEval

Rank #25 of 62
#22 Claude 3.5 Haiku
88.1%
#23 GPT-4.5
88.0%
#24 Gemma 3 27B
87.8%
#25 GPT-4o mini
87.2%
#26 GPT-4 Turbo
87.1%
#27 Qwen2.5 72B Instruct
86.6%
#28 Qwen2 72B Instruct
86.0%

MGSM

Rank #13 of 31
#10 o1
89.3%
#11 GPT-4 Turbo
88.5%
#12 Gemini 1.5 Pro
87.5%
#13 GPT-4o mini
87.0%
#14 Llama 3.2 90B Instruct
86.9%
#15 Claude 3.5 Haiku
85.6%
#16 Qwen3 235B A22B
83.5%

MMLU

Rank #35 of 78
#32 Llama 3.1 70B Instruct
83.6%
#33 Qwen2.5 32B Instruct
83.3%
#34 Qwen2 72B Instruct
82.3%
#35 GPT-4o mini
82.0%
#36 Grok-1.5
81.3%
#37 Jamba 1.5 Large
81.2%
#38 Mistral Small 3.1 24B Base
81.0%

DROP

Rank #13 of 28
#10 Claude 3 Opus
83.1%
#11 GPT-4
80.9%
#12 Nova Lite
80.2%
#13 GPT-4o mini
79.7%
#14 Llama 3.1 70B Instruct
79.6%
#15 Nova Micro
79.3%
#16 Claude 3 Sonnet
78.9%

MATH

Rank #32 of 63
#29 Qwen2.5-Omni-7B
71.5%
#30 Claude 3.5 Sonnet
71.1%
#31 Mistral Small 3 24B Instruct
70.6%
#32 GPT-4o mini
70.2%
#33 Kimi K2 Base
70.2%
#34 Mistral Small 3.2 24B Instruct
69.4%
#35 Claude 3.5 Haiku
69.4%
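A rank like "#25 of 62" is easier to compare across benchmarks as a percentile. The conversion below is a sketch using one common convention (share of ranked models placed below this one); the page itself does not define a convention.

```python
def percentile(rank: int, total: int) -> float:
    """Percent of ranked models placed below this rank (higher is better)."""
    return (total - rank) / total * 100

# (rank, total) pairs from the ranking section above.
ranks = {
    "HumanEval": (25, 62),
    "MGSM": (13, 31),
    "MMLU": (35, 78),
    "DROP": (13, 28),
    "MATH": (32, 63),
}
percentiles = {b: round(percentile(r, t), 1) for b, (r, t) in ranks.items()}
# e.g. HumanEval: 59.7 (above roughly 60% of ranked models)
```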
All Benchmark Results for GPT-4o mini
Complete list of benchmark scores with detailed information
Benchmark            Category  Modality    Raw score  Normalized  Source
HumanEval            code      text        0.87       87.2%       Self-reported
MGSM                 math      text        0.87       87.0%       Self-reported
MMLU                 general   text        0.82       82.0%       Self-reported
DROP                 general   text        0.80       79.7%       Self-reported
MATH                 math      text        0.70       70.2%       Self-reported
MMMU                 vision    multimodal  0.59       59.4%       Self-reported
MathVista            math      text        0.57       56.7%       Self-reported
GPQA                 general   text        0.40       40.2%       Self-reported
SWE-Bench Verified   general   text        0.09       8.7%        Self-reported