GPT-3.5 Turbo

by OpenAI

About

GPT-3.5 Turbo is a language model developed by OpenAI. It has been evaluated on 8 benchmarks, with its strongest results on DROP (70.2%), MMLU (69.8%), and HumanEval (68.0%). The model is available through 2 API providers.
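
For reference, a minimal chat-completion call against GPT-3.5 Turbo through the official OpenAI Python SDK (pip install openai) looks like the sketch below; the prompt is hypothetical and OPENAI_API_KEY is assumed to be set in the environment.

# Minimal sketch: one chat completion with gpt-3.5-turbo (OpenAI Python SDK v1+).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain the DROP benchmark in one sentence."},
    ],
    max_tokens=100,
)

print(response.choices[0].message.content)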

Pricing Range
Input (per 1M tokens): $0.50
Output (per 1M tokens): $1.50
Providers: 2
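
At these rates, per-request cost is a linear function of token counts. A quick sketch (prices taken from the table above; the token counts in the example are hypothetical):

# Estimate the USD cost of one request at the listed GPT-3.5 Turbo rates.
INPUT_PRICE_PER_M = 0.50   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 1.50  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: 1,200 prompt tokens and a 400-token reply.
print(f"${request_cost(1200, 400):.4f}")  # $0.0012
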
Timeline
Announced: Mar 21, 2023
Released: Mar 21, 2023
Knowledge Cutoff: Sep 30, 2021
Specifications
License & Family
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

Benchmarks evaluated: 8
Average Score: 42.3%
Best Score: 70.2%
High Performers (80%+): 0
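
These overview figures are the arithmetic mean, maximum, and 80%+ count over the 8 normalized scores in the results table at the bottom of this page; a quick recomputation:

# Recompute the overview stats from the 8 normalized scores listed below.
scores = {
    "DROP": 70.2, "MMLU": 69.8, "HumanEval": 68.0, "MGSM": 56.3,
    "MATH": 43.1, "GPQA": 30.8, "MMMU": 0.0, "MathVista": 0.0,
}

print(f"Average Score: {sum(scores.values()) / len(scores):.1f}%")        # 42.3%
print(f"Best Score: {max(scores.values()):.1f}%")                         # 70.2%
print(f"High Performers (80%+): {sum(s >= 80 for s in scores.values())}") # 0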

Performance Metrics

Max Context Window: 20.5K
Avg Throughput: 95.0 tok/s
Avg Latency: 1 ms
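
A rough end-to-end estimate for a response is latency plus output tokens divided by throughput. A small sketch using the averages above (the 500-token reply size is hypothetical):

# Rough response-time estimate from the reported averages.
AVG_THROUGHPUT_TPS = 95.0  # tokens per second
AVG_LATENCY_S = 0.001      # 1 ms, as reported above

def estimated_seconds(output_tokens: int) -> float:
    """Latency plus decode time at the average throughput."""
    return AVG_LATENCY_S + output_tokens / AVG_THROUGHPUT_TPS

print(f"{estimated_seconds(500):.2f} s")  # 5.26 s for a 500-token reply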

Top Categories

code: 68.0%
general: 56.9%
math: 33.1%
vision: 0.0%
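
Each category figure is the mean of that category's benchmarks in the results table below (code: HumanEval; general: DROP, MMLU, GPQA; math: MGSM, MATH, MathVista; vision: MMMU). Recomputing:

# Recompute the category averages from the per-benchmark table below.
from statistics import mean

by_category = {
    "code": [68.0],                 # HumanEval
    "general": [70.2, 69.8, 30.8],  # DROP, MMLU, GPQA
    "math": [56.3, 43.1, 0.0],      # MGSM, MATH, MathVista
    "vision": [0.0],                # MMMU
}

for category, category_scores in by_category.items():
    print(f"{category}: {mean(category_scores):.1f}%")
# code: 68.0%, general: 56.9%, math: 33.1%, vision: 0.0%
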
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

DROP

Rank #20 of 28
#17 Claude 3 Haiku: 78.4%
#18 Phi 4: 75.5%
#19 Gemini 1.5 Pro: 74.9%
#20 GPT-3.5 Turbo: 70.2%
#21 Gemma 3n E4B: 60.8%
#22 Gemma 3n E4B Instructed LiteRT Preview: 60.8%
#23 Llama 3.1 8B Instruct: 59.5%

MMLU

Rank #61 of 78
#58 Gemini 1.0 Pro: 71.8%
#59 Gemma 2 9B: 71.3%
#60 Qwen2 7B Instruct: 70.5%
#61 GPT-3.5 Turbo: 69.8%
#62 Jamba 1.5 Mini: 69.7%
#63 Llama 3.1 8B Instruct: 69.4%
#64 Pixtral-12B: 69.2%

HumanEval

Rank #54 of 62
#51 Pixtral-12B: 72.0%
#52 Gemma 3 4B: 71.3%
#53 Phi-3.5-MoE-instruct: 70.7%
#54 GPT-3.5 Turbo: 68.0%
#55 GPT-4: 67.0%
#56 Gemma 3n E2B Instructed LiteRT (Preview): 66.5%
#57 Gemma 3n E2B Instructed: 66.5%

MGSM

Rank #28 of 31
#25 Gemma 3n E4B Instructed LiteRT Preview: 60.7%
#26 Phi-3.5-MoE-instruct: 58.7%
#27 Llama 3.2 3B Instruct: 58.2%
#28 GPT-3.5 Turbo: 56.3%
#29 Gemma 3n E2B Instructed LiteRT (Preview): 53.1%
#30 Gemma 3n E2B Instructed: 53.1%
#31 Phi-3.5-mini-instruct: 47.9%

MATH

Rank #57 of 63
#54 Gemma 3 1B: 48.0%
#55 Qwen2.5-Coder 7B Instruct: 46.6%
#56 Mistral Small 3 24B Base: 46.0%
#57 GPT-3.5 Turbo: 43.1%
#58 Claude 3 Sonnet: 43.1%
#59 Gemma 2 27B: 42.3%
#60 GPT-4: 42.0%
All Benchmark Results for GPT-3.5 Turbo
Complete list of benchmark scores with detailed information
Benchmark   Category   Modality     Raw Score   Normalized   Status
DROP        general    text         0.70        70.2%        Unverified
MMLU        general    text         0.70        69.8%        Unverified
HumanEval   code       text         0.68        68.0%        Unverified
MGSM        math       text         0.56        56.3%        Unverified
MATH        math       text         0.43        43.1%        Unverified
GPQA        general    text         0.31        30.8%        Unverified
MMMU        vision     multimodal   0.00        0.0%         Unverified
MathVista   math       text         0.00        0.0%         Unverified