GPT OSS 120B

Text-only
Zero-eval

by OpenAI

About

GPT OSS 120B is an open-weight language model developed by OpenAI and released under the Apache 2.0 license. It achieves an average score of 63.1% across the 2 benchmarks tracked here, with its stronger result on GPQA (71.5%); its MMLU score (54.8%) ranks near the bottom of the field. It supports a 161K token context window for handling large documents and is available through 1 API provider. It is a text-only model: it accepts and produces text, not images or audio. Its permissive license makes it suitable for commercial and enterprise applications. Released in 2025, it is OpenAI's first open-weight model release since GPT-2.
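Since the model is exposed through an OpenAI-compatible API, a chat request is an ordinary JSON payload. The sketch below only builds the payload; the model identifier "gpt-oss-120b" and the endpoint shape are assumptions to verify against your provider's documentation.

```python
import json

def build_chat_request(prompt: str, model: str = "gpt-oss-120b",
                       max_tokens: int = 256) -> dict:
    """Assemble an OpenAI-style /v1/chat/completions payload.

    NOTE: the default model name "gpt-oss-120b" is an assumption;
    check your provider's model list for the exact identifier.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Summarize this contract in three bullets.")
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to the provider's chat-completions endpoint with an API key in the Authorization header.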

Pricing Range
Input (per 1M tokens): $0.15
Output (per 1M tokens): $0.60
Providers: 1
Timeline
Announced: Aug 5, 2025
Released: Aug 5, 2025
Specifications
Capabilities
Text
License & Family
License
Apache 2.0
Benchmark Performance Overview
Performance metrics and category breakdown

Overall Performance

2 benchmarks
Average Score
63.1%
Best Score
71.5%
High Performers (80%+)
0

Performance Metrics

Max Context Window
161.0K
Avg Throughput
500.0 tok/s
Avg Latency
1ms
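The throughput figure translates directly into an expected generation time. A rough sketch, assuming throughput stays flat at the listed average and that latency is a fixed per-request overhead:

```python
AVG_THROUGHPUT_TOK_S = 500.0  # average throughput from the metrics above
AVG_LATENCY_S = 0.001         # 1 ms average latency, as listed

def estimated_seconds(output_tokens: int) -> float:
    """Rough wall-clock estimate: fixed latency plus tokens / throughput."""
    return AVG_LATENCY_S + output_tokens / AVG_THROUGHPUT_TOK_S

# e.g. generating a 1,000-token response:
print(round(estimated_seconds(1_000), 3))  # → 2.001
```

Real-world numbers vary with provider load, prompt length, and batching, so treat this as a back-of-the-envelope estimate.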

Top Categories

general
63.1%
Benchmark Performance
Top benchmark scores with normalized values (0-100%)
Ranking Across Benchmarks
Position relative to other models on each benchmark

GPQA

Rank #25 of 115
#22 o1-preview: 73.3%
#23 DeepSeek R1 Zero: 73.3%
#24 Gemini 2.0 Flash Thinking: 74.2%
#25 GPT OSS 120B: 71.5%
#26 DeepSeek-R1: 71.5%
#27 GPT-5 nano: 71.2%
#28 Magistral Medium: 70.8%

MMLU

Rank #78 of 78
#75 Gemma 3n E2B Instructed: 60.1%
#76 Gemma 3n E2B Instructed LiteRT (Preview): 60.1%
#77 IBM Granite 4.0 Tiny Preview: 60.4%
#78 GPT OSS 120B: 54.8%
All Benchmark Results for GPT OSS 120B
Complete list of benchmark scores with detailed information
GPQA (general, text): raw score 0.71, normalized 71.5%, self-reported
MMLU (general, text): raw score 0.55, normalized 54.8%, self-reported
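The 63.1% overall figure at the top of the page is just the mean of the two normalized scores above. As a quick check:

```python
# Normalized benchmark scores from the results table above
scores = {"GPQA": 71.5, "MMLU": 54.8}

average = sum(scores.values()) / len(scores)
best = max(scores.values())

print(f"average: {average:.1f}%")  # → average: 63.1%
print(f"best: {best:.1f}%")        # → best: 71.5%
```

With only two benchmarks, a single weak score drags the average heavily, which is why the low MMLU result pulls the overall figure well below the GPQA score.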