
GPT-4 Turbo
Zero-eval
by OpenAI
About
GPT-4 Turbo is a language model developed by OpenAI. It achieves strong performance, with an average score of 78.1% across 6 benchmarks, and does particularly well on MGSM (88.5%), HumanEval (87.1%), and MMLU (86.5%). It supports a 132K-token context window for handling large documents and is available through 2 API providers. Released in April 2024, it represented OpenAI's latest GPT-4-series model at the time of release.
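As a quick check, the 78.1% figure is simply the mean of the six normalized scores listed in the results table at the bottom of this page; a minimal sketch:

```python
# The 78.1% average is the plain mean of the six normalized benchmark
# scores reported in the results table on this page.
scores = {
    "MGSM": 88.5,
    "HumanEval": 87.1,
    "MMLU": 86.5,
    "DROP": 86.0,
    "MATH": 72.6,
    "GPQA": 48.0,
}
average = sum(scores.values()) / len(scores)
print(f"Average score: {average:.1f}%")  # Average score: 78.1%
```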
Pricing Range
Input (per 1M)
$10.00 - $10.00
Output (per 1M)
$30.00 - $30.00
Providers
2
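At these rates, per-request cost is simple arithmetic. A minimal sketch; the token counts below are illustrative and not taken from this page:

```python
# Estimate the cost of a single request at the listed rates:
# $10.00 per 1M input tokens, $30.00 per 1M output tokens.
INPUT_PRICE_PER_M = 10.00
OUTPUT_PRICE_PER_M = 30.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request (token counts are illustrative)."""
    return (input_tokens * INPUT_PRICE_PER_M +
            output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 50K-token prompt with a 1K-token completion.
print(f"${request_cost(50_000, 1_000):.2f}")  # $0.53
```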
Timeline
Announced
Apr 9, 2024
Released
Apr 9, 2024
Knowledge Cutoff
Dec 31, 2023
Specifications
License & Family
License
Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance
6 benchmarks
Average Score
78.1%
Best Score
88.5%
High Performers (80%+)
4
Performance Metrics
Max Context Window
132.1K
Avg Throughput
98.5 tok/s
Avg Latency
1ms
Top Categories
code
87.1%
math
80.5%
general
73.5%
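The category figures above are per-category means of the individual benchmark scores from the results table at the bottom of this page (HumanEval for code; MGSM and MATH for math; MMLU, DROP, and GPQA for general). A minimal sketch of that grouping:

```python
from collections import defaultdict

# Each benchmark with its category and normalized score, as listed
# in the results table at the bottom of this page.
benchmarks = [
    ("MGSM", "math", 88.5),
    ("HumanEval", "code", 87.1),
    ("MMLU", "general", 86.5),
    ("DROP", "general", 86.0),
    ("MATH", "math", 72.6),
    ("GPQA", "general", 48.0),
]

by_category = defaultdict(list)
for _, category, score in benchmarks:
    by_category[category].append(score)

for category, category_scores in by_category.items():
    print(f"{category}: {sum(category_scores) / len(category_scores):.1f}%")
# math: 80.5%, code: 87.1%, general: 73.5%
```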
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
MGSM
Rank #11 of 31
#8 o1
89.3%
#9 GPT-4o
90.5%
#10 Llama 4 Scout
90.6%
#11 GPT-4 Turbo
88.5%
#12 Gemini 1.5 Pro
87.5%
#13 GPT-4o mini
87.0%
#14 Llama 3.2 90B Instruct
86.9%
HumanEval
Rank #26 of 62
#23 GPT-4o mini
87.2%
#24 Gemma 3 27B
87.8%
#25 GPT-4.5
88.0%
#26 GPT-4 Turbo
87.1%
#27 Qwen2.5 72B Instruct
86.6%
#28 Qwen2 72B Instruct
86.0%
#29 Grok-2 mini
85.7%
MMLU
Rank #20 of 78
#17 Claude 3 Opus
86.8%
#18 o3-mini
86.9%
#19 Llama 3.1 405B Instruct
87.3%
#20 GPT-4 Turbo
86.5%
#21 GPT-4
86.4%
#22 Grok-2 mini
86.2%
#23 Llama 3.2 90B Instruct
86.0%
DROP
Rank #5 of 28
#2 Claude 3.5 Sonnet
87.1%
#3 Claude 3.5 Sonnet
87.1%
#4 DeepSeek-V3
91.6%
#5 GPT-4 Turbo
86.0%
#6 Nova Pro
85.4%
#7 Llama 3.1 405B Instruct
84.8%
#8 GPT-4o
83.4%
MATH
Rank #27 of 63
#24 Grok-2 mini
73.0%
#25 Nova Lite
73.3%
#26 Llama 3.1 405B Instruct
73.8%
#27 GPT-4 Turbo
72.6%
#28 Qwen3 235B A22B
71.8%
#29 Qwen2.5-Omni-7B
71.5%
#30 Claude 3.5 Sonnet
71.1%
All Benchmark Results for GPT-4 Turbo
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized Score | Source
MGSM | math | text | 0.89 | 88.5% | Self-reported
HumanEval | code | text | 0.87 | 87.1% | Self-reported
MMLU | general | text | 0.86 | 86.5% | Self-reported
DROP | general | text | 0.86 | 86.0% | Self-reported
MATH | math | text | 0.73 | 72.6% | Self-reported
GPQA | general | text | 0.48 | 48.0% | Self-reported
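If you want to reuse these results programmatically, the pipe-delimited rows above parse cleanly into records. A minimal sketch, assuming the column layout shown in the table; the field names are this sketch's own labels, not an official schema:

```python
# Parse the pipe-delimited benchmark rows into dictionaries.
rows = """
MGSM | math | text | 0.89 | 88.5% | Self-reported
HumanEval | code | text | 0.87 | 87.1% | Self-reported
MMLU | general | text | 0.86 | 86.5% | Self-reported
DROP | general | text | 0.86 | 86.0% | Self-reported
MATH | math | text | 0.73 | 72.6% | Self-reported
GPQA | general | text | 0.48 | 48.0% | Self-reported
""".strip().splitlines()

results = []
for row in rows:
    name, category, modality, raw, normalized, source = (
        field.strip() for field in row.split("|")
    )
    results.append({
        "benchmark": name,
        "category": category,
        "modality": modality,
        "raw_score": float(raw),
        "normalized_pct": float(normalized.rstrip("%")),
        "source": source,
    })

print(results[0])
# {'benchmark': 'MGSM', 'category': 'math', 'modality': 'text',
#  'raw_score': 0.89, 'normalized_pct': 88.5, 'source': 'Self-reported'}
```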
Resources