
GPT-4o mini
Multimodal
Zero-eval
by OpenAI
About
GPT-4o mini is a multimodal language model developed by OpenAI. It achieves strong performance with an average score of 63.5% across 9 benchmarks, and excels particularly in HumanEval (87.2%), MGSM (87.0%), and MMLU (82.0%). It supports a 128K-token context window for handling large documents. The model is available through one API provider. As a multimodal model, it can process and understand text and image inputs. Released in July 2024, it is OpenAI's cost-efficient small model in the GPT-4o family.
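For orientation, a minimal sketch of calling the model through OpenAI's Chat Completions API with mixed text and image input. The prompt and image URL are placeholders, and an OPENAI_API_KEY environment variable is assumed:

```python
# Minimal sketch, assuming the official `openai` Python SDK (v1+) and an
# OPENAI_API_KEY in the environment. "gpt-4o-mini" is OpenAI's published
# model identifier; the image URL below is a placeholder.
from openai import OpenAI

client = OpenAI()

# Mixed text-and-image input, exercising the model's multimodal support.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this chart in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/chart.png"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```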
Pricing Range
Input (per 1M tokens): $0.15
Output (per 1M tokens): $0.60
Providers: 1
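At these rates, the cost of a request is a linear function of token counts. A small illustrative helper (the example token counts are made up):

```python
# Published per-token rates for gpt-4o-mini: $0.15 per 1M input tokens,
# $0.60 per 1M output tokens.
INPUT_PER_TOKEN = 0.15 / 1_000_000
OUTPUT_PER_TOKEN = 0.60 / 1_000_000

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed rates."""
    return input_tokens * INPUT_PER_TOKEN + output_tokens * OUTPUT_PER_TOKEN

# Example: a 10K-token prompt with a 1K-token completion costs
# 10_000 * 0.15/1e6 + 1_000 * 0.60/1e6 = $0.0021.
print(f"${request_cost(10_000, 1_000):.4f}")  # $0.0021
```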
Timeline
Announced: Jul 18, 2024
Released: Jul 18, 2024
Knowledge Cutoff: Oct 1, 2023
Specifications
Capabilities
Multimodal
License & Family
License
Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
9 benchmarks
Average Score
63.5%
Best Score
87.2%
High Performers (80%+): 3
Performance Metrics
Max Context Window: 128K (see the fit-check sketch below)
Avg Throughput: 92.0 tok/s
Avg Latency: 1ms
Top Categories
code
87.2%
math
71.3%
vision
59.4%
general
52.7%
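Since the listed context window is 128K tokens, a quick pre-flight check with OpenAI's tiktoken tokenizer can tell whether a document fits. A sketch, assuming a tiktoken version that recognizes the gpt-4o-mini model id (it maps to the o200k_base encoding):

```python
# Sketch: check whether a document fits in gpt-4o-mini's 128K-token
# context window, leaving headroom for the completion. Assumes the
# `tiktoken` package with gpt-4o-mini support.
import tiktoken

CONTEXT_WINDOW = 128_000

def fits_in_context(text: str, reserved_for_output: int = 4_000) -> bool:
    enc = tiktoken.encoding_for_model("gpt-4o-mini")
    n_tokens = len(enc.encode(text))
    return n_tokens + reserved_for_output <= CONTEXT_WINDOW

print(fits_in_context("some long document ..."))
```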
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
HumanEval
Rank #25 of 62
#22 Gemma 3 27B
87.8%
#23 GPT-4.5
88.0%
#24 Claude 3.5 Haiku
88.1%
#25 GPT-4o mini
87.2%
#26 GPT-4 Turbo
87.1%
#27 Qwen2.5 72B Instruct
86.6%
#28 Qwen2 72B Instruct
86.0%
MGSM
Rank #13 of 31
#10 Gemini 1.5 Pro
87.5%
#11 GPT-4 Turbo
88.5%
#12 o1
89.3%
#13 GPT-4o mini
87.0%
#14 Llama 3.2 90B Instruct
86.9%
#15 Claude 3.5 Haiku
85.6%
#16 Qwen3 235B A22B
83.5%
MMLU
Rank #35 of 78
#32 Qwen2 72B Instruct
82.3%
#33 Qwen2.5 32B Instruct
83.3%
#34 Llama 3.1 70B Instruct
83.6%
#35 GPT-4o mini
82.0%
#36 Grok-1.5
81.3%
#37 Jamba 1.5 Large
81.2%
#38 Mistral Small 3.1 24B Base
81.0%
DROP
Rank #13 of 28
#10 Nova Lite
80.2%
#11 GPT-4
80.9%
#12 Claude 3 Opus
83.1%
#13 GPT-4o mini
79.7%
#14 Llama 3.1 70B Instruct
79.6%
#15 Nova Micro
79.3%
#16 Claude 3 Sonnet
78.9%
MATH
Rank #32 of 63
#29 Mistral Small 3 24B Instruct
70.6%
#30 Claude 3.5 Sonnet
71.1%
#31 Qwen2.5-Omni-7B
71.5%
#32 GPT-4o mini
70.2%
#33 Kimi K2 Base
70.2%
#34 Mistral Small 3.2 24B Instruct
69.4%
#35 Claude 3.5 Haiku
69.4%
All Benchmark Results for GPT-4o mini
Complete list of benchmark scores with detailed information
| Benchmark | Category | Modality | Raw Score | Normalized | Source |
|---|---|---|---|---|---|
| HumanEval | code | text | 0.87 | 87.2% | Self-reported |
| MGSM | math | text | 0.87 | 87.0% | Self-reported |
| MMLU | general | text | 0.82 | 82.0% | Self-reported |
| DROP | general | text | 0.80 | 79.7% | Self-reported |
| MATH | math | text | 0.70 | 70.2% | Self-reported |
| MMMU | vision | multimodal | 0.59 | 59.4% | Self-reported |
| MathVista | math | text | 0.57 | 56.7% | Self-reported |
| GPQA | general | text | 0.40 | 40.2% | Self-reported |
| SWE-Bench Verified | general | text | 0.09 | 8.7% | Self-reported |
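The overview figures above (the 63.5% average and the Top Categories breakdown) can be reproduced directly from this table; a short sketch:

```python
# Recompute the overview statistics from the normalized scores above.
from collections import defaultdict
from statistics import mean

scores = {  # benchmark: (category, normalized %)
    "HumanEval": ("code", 87.2),
    "MGSM": ("math", 87.0),
    "MMLU": ("general", 82.0),
    "DROP": ("general", 79.7),
    "MATH": ("math", 70.2),
    "MMMU": ("vision", 59.4),
    "MathVista": ("math", 56.7),
    "GPQA": ("general", 40.2),
    "SWE-Bench Verified": ("general", 8.7),
}

print(f"average: {mean(s for _, s in scores.values()):.1f}%")  # 63.5%

by_cat = defaultdict(list)
for cat, s in scores.values():
    by_cat[cat].append(s)
for cat, vals in by_cat.items():
    print(f"{cat}: {mean(vals):.2f}%")
# code 87.20, math 71.30, vision 59.40, general 52.65 (shown as 52.7 above)
```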
Resources