GPT-4 Turbo

by OpenAI

About

GPT-4 Turbo is a large language model developed by OpenAI. It achieves strong overall performance, with an average score of 78.1% across 6 benchmarks, and does particularly well on MGSM (88.5%), HumanEval (87.1%), and MMLU (86.5%). It supports a 132K-token context window for handling large documents and is available through 2 API providers. It was announced and released on April 9, 2024, with a knowledge cutoff of December 2023.
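Because the context window caps prompt plus completion tokens, it is common to count tokens before sending a request. Below is a minimal sketch using OpenAI's tiktoken tokenizer; the 132,100-token limit mirrors the figure listed on this page and is an assumption — in practice, use the provider's documented limit.

# Check whether a prompt fits in the context window before sending it.
# Requires: pip install tiktoken
# CONTEXT_LIMIT mirrors the 132.1K figure listed on this page (an assumption;
# use the provider's documented limit in practice).
import tiktoken

CONTEXT_LIMIT = 132_100
enc = tiktoken.encoding_for_model("gpt-4-turbo")

def fits_in_context(prompt: str, max_output_tokens: int) -> bool:
    # Prompt tokens plus the completion budget must stay within the limit.
    return len(enc.encode(prompt)) + max_output_tokens <= CONTEXT_LIMIT

print(fits_in_context("Summarize the attached report.", max_output_tokens=1_000))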

Pricing Range

Input (per 1M tokens): $10.00 (same across both providers)
Output (per 1M tokens): $30.00 (same across both providers)
Providers: 2
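At these rates, per-request cost follows directly from token counts. A minimal sketch (rates from the listing above; the token counts in the example are hypothetical):

# Minimal cost estimate at GPT-4 Turbo's listed rates.
# Rates are USD per 1M tokens, as shown above; token counts are examples.
INPUT_PER_1M = 10.00
OUTPUT_PER_1M = 30.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    return (input_tokens * INPUT_PER_1M + output_tokens * OUTPUT_PER_1M) / 1_000_000

# Example: a 2,000-token prompt with a 500-token completion.
print(f"${estimate_cost(2_000, 500):.4f}")  # $0.0350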
Timeline
Announced: Apr 9, 2024
Released: Apr 9, 2024
Knowledge Cutoff: Dec 31, 2023
Specifications

License & Family
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

Benchmarks: 6
Average Score: 78.1%
Best Score: 88.5%
High Performers (80%+): 4
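These headline figures follow directly from the six normalized scores in the results list at the bottom of this page; the arithmetic, as a minimal sketch:

# The six normalized scores from the results list below.
scores = {
    "MGSM": 88.5, "HumanEval": 87.1, "MMLU": 86.5,
    "DROP": 86.0, "MATH": 72.6, "GPQA": 48.0,
}

average = sum(scores.values()) / len(scores)
best = max(scores.values())
high_performers = sum(1 for s in scores.values() if s >= 80.0)

print(f"Average Score: {average:.1f}%")               # 78.1%
print(f"Best Score: {best:.1f}%")                     # 88.5%
print(f"High Performers (80%+): {high_performers}")   # 4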

Performance Metrics

Max Context Window: 132.1K tokens
Avg Throughput: 98.5 tok/s
Avg Latency: 1 ms
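Together, throughput and latency give a back-of-the-envelope response-time estimate. The sketch below assumes a simple time-to-first-token-plus-generation model, takes the 1 ms latency figure from the listing as-is, and uses a hypothetical output length:

# Back-of-the-envelope response time: fixed latency plus token generation.
# Figures are the averages listed above; the output length is an example.
THROUGHPUT_TOK_S = 98.5   # tokens generated per second
LATENCY_S = 0.001         # time to first token, as listed (1 ms)

def estimated_seconds(output_tokens: int) -> float:
    return LATENCY_S + output_tokens / THROUGHPUT_TOK_S

print(f"{estimated_seconds(500):.2f} s")  # ~5.08 s for a 500-token reply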

Top Categories

code: 87.1%
math: 80.5%
general: 73.5%
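Each category score is the mean of that category's benchmark results from the list at the bottom of the page; a minimal sketch of the grouping:

# Category scores are the mean of each category's benchmarks (listed below).
from collections import defaultdict
from statistics import mean

results = [
    ("MGSM", "math", 88.5), ("HumanEval", "code", 87.1),
    ("MMLU", "general", 86.5), ("DROP", "general", 86.0),
    ("MATH", "math", 72.6), ("GPQA", "general", 48.0),
]

by_category = defaultdict(list)
for _, category, score in results:
    by_category[category].append(score)

for category, cat_scores in by_category.items():
    print(f"{category}: {mean(cat_scores):.1f}%")
# math: 80.5%, code: 87.1%, general: 73.5%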
Benchmark Performance
Top benchmark scores with normalized values (0-100%)

Ranking Across Benchmarks
Position relative to other models on each benchmark

MGSM

Rank #11 of 31

#8 o1: 89.3%
#9 GPT-4o: 90.5%
#10 Llama 4 Scout: 90.6%
#11 GPT-4 Turbo: 88.5%
#12 Gemini 1.5 Pro: 87.5%
#13 GPT-4o mini: 87.0%
#14 Llama 3.2 90B Instruct: 86.9%

HumanEval

Rank #26 of 62

#23 GPT-4o mini: 87.2%
#24 Gemma 3 27B: 87.8%
#25 GPT-4.5: 88.0%
#26 GPT-4 Turbo: 87.1%
#27 Qwen2.5 72B Instruct: 86.6%
#28 Qwen2 72B Instruct: 86.0%
#29 Grok-2 mini: 85.7%

MMLU

Rank #20 of 78

#17 Claude 3 Opus: 86.8%
#18 o3-mini: 86.9%
#19 Llama 3.1 405B Instruct: 87.3%
#20 GPT-4 Turbo: 86.5%
#21 GPT-4: 86.4%
#22 Grok-2 mini: 86.2%
#23 Llama 3.2 90B Instruct: 86.0%

DROP

Rank #5 of 28

#2 Claude 3.5 Sonnet: 87.1%
#3 Claude 3.5 Sonnet: 87.1%
#4 DeepSeek-V3: 91.6%
#5 GPT-4 Turbo: 86.0%
#6 Nova Pro: 85.4%
#7 Llama 3.1 405B Instruct: 84.8%
#8 GPT-4o: 83.4%

MATH

Rank #27 of 63

#24 Grok-2 mini: 73.0%
#25 Nova Lite: 73.3%
#26 Llama 3.1 405B Instruct: 73.8%
#27 GPT-4 Turbo: 72.6%
#28 Qwen3 235B A22B: 71.8%
#29 Qwen2.5-Omni-7B: 71.5%
#30 Claude 3.5 Sonnet: 71.1%
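A rank such as "#11 of 31" is the model's 1-based position among all models scored on that benchmark. A minimal sketch, assuming ranks come from a plain descending sort by score (the site's exact ordering and tie-breaking are not stated, and the scores below are hypothetical):

# Rank = 1-based position after sorting scores high-to-low.
# Hypothetical scores; this only illustrates the computation.
def rank_of(model: str, scores: dict[str, float]) -> int:
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(model) + 1

mgsm = {"GPT-4 Turbo": 88.5, "Gemini 1.5 Pro": 87.5, "GPT-4o": 90.5}
print(f"#{rank_of('GPT-4 Turbo', mgsm)} of {len(mgsm)}")  # #2 of 3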
All Benchmark Results for GPT-4 Turbo
Complete list of benchmark scores with detailed information
Benchmark   Category   Modality   Raw Score   Normalized   Source
MGSM        math       text       0.89        88.5%        Self-reported
HumanEval   code       text       0.87        87.1%        Self-reported
MMLU        general    text       0.86        86.5%        Self-reported
DROP        general    text       0.86        86.0%        Self-reported
MATH        math       text       0.73        72.6%        Self-reported
GPQA        general    text       0.48        48.0%        Self-reported