GPT-3.5 Turbo
by OpenAI
About
GPT-3.5 Turbo is a language model developed by OpenAI. It shows competitive results across 8 benchmarks, with notable strengths in DROP (70.2%), MMLU (69.8%), and HumanEval (68.0%). The model is available through 2 API providers.
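One of those providers is OpenAI's own API. As a minimal sketch (assuming the official openai Python SDK v1.x, an OPENAI_API_KEY set in the environment, and a placeholder prompt), a chat completion call against this model looks like:

```python
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

# Placeholder prompt; any chat-formatted request works the same way.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize the DROP benchmark in one sentence."}],
)
print(response.choices[0].message.content)
```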
Pricing Range
Input (per 1M tokens): $0.50
Output (per 1M tokens): $1.50
Providers: 2
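At these rates, the cost of a request is straightforward arithmetic: tokens divided by 1,000,000, times the per-million price. A small sketch using the listed prices (the token counts are invented for illustration):

```python
INPUT_USD_PER_M = 0.50   # listed input price per 1M tokens
OUTPUT_USD_PER_M = 1.50  # listed output price per 1M tokens

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request at the listed per-million rates."""
    return (input_tokens / 1_000_000) * INPUT_USD_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_USD_PER_M

# Example: a 100,000-token prompt with a 20,000-token completion
print(f"${request_cost_usd(100_000, 20_000):.2f}")  # $0.08
```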
Timeline
Announced: Mar 21, 2023
Released: Mar 21, 2023
Knowledge Cutoff: Sep 30, 2021
Specifications
License & Family
License: Proprietary
Benchmark Performance Overview
Performance metrics and category breakdown
Overall Performance (8 benchmarks)
Average Score: 42.3%
Best Score: 70.2%
High Performers (80%+): 0

Performance Metrics
Max Context Window: 20.5K
Avg Throughput: 95.0 tok/s
Avg Latency: 1ms

Top Categories
code: 68.0%
general: 56.9%
math: 33.1%
vision: 0.0%
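These aggregates follow directly from the eight normalized scores listed under "All Benchmark Results" at the bottom of this page. A quick check in Python (scores transcribed from that table):

```python
from collections import defaultdict
from statistics import mean

# (benchmark, category, normalized score %) from "All Benchmark Results" below
SCORES = [
    ("DROP", "general", 70.2), ("MMLU", "general", 69.8),
    ("HumanEval", "code", 68.0), ("MGSM", "math", 56.3),
    ("MATH", "math", 43.1), ("GPQA", "general", 30.8),
    ("MMMU", "vision", 0.0), ("MathVista", "math", 0.0),
]

print(f"Average Score: {mean(s for *_, s in SCORES):.1f}%")  # 42.3%
print(f"Best Score: {max(s for *_, s in SCORES):.1f}%")      # 70.2%

by_category = defaultdict(list)
for _, category, score in SCORES:
    by_category[category].append(score)
for category, scores in sorted(by_category.items(), key=lambda kv: -mean(kv[1])):
    print(f"{category}: {mean(scores):.1f}%")  # code 68.0, general 56.9, math 33.1, vision 0.0
```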
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark
DROP
Rank #20 of 28
#17 Claude 3 Haiku: 78.4%
#18 Phi 4: 75.5%
#19 Gemini 1.5 Pro: 74.9%
#20 GPT-3.5 Turbo: 70.2%
#21 Gemma 3n E4B: 60.8%
#22 Gemma 3n E4B Instructed LiteRT Preview: 60.8%
#23 Llama 3.1 8B Instruct: 59.5%
MMLU
Rank #61 of 78
#58 Gemini 1.0 Pro: 71.8%
#59 Gemma 2 9B: 71.3%
#60 Qwen2 7B Instruct: 70.5%
#61 GPT-3.5 Turbo: 69.8%
#62 Jamba 1.5 Mini: 69.7%
#63 Llama 3.1 8B Instruct: 69.4%
#64 Pixtral-12B: 69.2%
HumanEval
Rank #54 of 62
#51 Pixtral-12B: 72.0%
#52 Gemma 3 4B: 71.3%
#53 Phi-3.5-MoE-instruct: 70.7%
#54 GPT-3.5 Turbo: 68.0%
#55 GPT-4: 67.0%
#56 Gemma 3n E2B Instructed LiteRT (Preview): 66.5%
#57 Gemma 3n E2B Instructed: 66.5%
MGSM
Rank #28 of 31
#25 Gemma 3n E4B Instructed LiteRT Preview: 60.7%
#26 Phi-3.5-MoE-instruct: 58.7%
#27 Llama 3.2 3B Instruct: 58.2%
#28 GPT-3.5 Turbo: 56.3%
#29 Gemma 3n E2B Instructed LiteRT (Preview): 53.1%
#30 Gemma 3n E2B Instructed: 53.1%
#31 Phi-3.5-mini-instruct: 47.9%
MATH
Rank #57 of 63
#54 Gemma 3 1B: 48.0%
#55 Qwen2.5-Coder 7B Instruct: 46.6%
#56 Mistral Small 3 24B Base: 46.0%
#57 GPT-3.5 Turbo: 43.1%
#58 Claude 3 Sonnet: 43.1%
#59 Gemma 2 27B: 42.3%
#60 GPT-4: 42.0%
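The rank shown for each benchmark is standard competition ranking over the full leaderboard (which has more entries than the excerpts above). A sketch of the idea in Python, using only the seven DROP scores excerpted above as stand-in data:

```python
def competition_rank(scores: dict[str, float], model: str) -> int:
    # Competition ranking: 1 + the number of models with a strictly higher score.
    return 1 + sum(1 for s in scores.values() if s > scores[model])

# Stand-in data: the DROP excerpt above. The full DROP leaderboard has 28
# entries, which is why GPT-3.5 Turbo ranks #20 there rather than #4 here.
drop_excerpt = {
    "Claude 3 Haiku": 78.4, "Phi 4": 75.5, "Gemini 1.5 Pro": 74.9,
    "GPT-3.5 Turbo": 70.2, "Gemma 3n E4B": 60.8,
    "Gemma 3n E4B Instructed LiteRT Preview": 60.8,
    "Llama 3.1 8B Instruct": 59.5,
}
print(competition_rank(drop_excerpt, "GPT-3.5 Turbo"))  # 4 within this sample
```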
All Benchmark Results for GPT-3.5 Turbo
Complete list of benchmark scores with detailed information
Benchmark | Category | Modality | Raw Score | Normalized | Status
DROP | general | text | 0.70 | 70.2% | Unverified
MMLU | general | text | 0.70 | 69.8% | Unverified
HumanEval | code | text | 0.68 | 68.0% | Unverified
MGSM | math | text | 0.56 | 56.3% | Unverified
MATH | math | text | 0.43 | 43.1% | Unverified
GPQA | general | text | 0.31 | 30.8% | Unverified
MMMU | vision | multimodal | 0.00 | 0.0% | Unverified
MathVista | math | text | 0.00 | 0.0% | Unverified