GPQA
general
text
About
GPQA benchmark
Evaluation Stats
Total Models: 100
Organizations: 13
Verified Results: 0
Self-Reported: 100
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution (100 models)
Top Score: 88.4%
Average Score: 59.0%
High Performers (80%+): 13
Top Organizations
#1 xAI: 7 models, 69.6%
#2 OpenAI: 20 models, 67.1%
#3 NVIDIA: 3 models, 65.6%
#4 Moonshot AI: 2 models, 61.6%
#5 DeepSeek: 11 models, 61.1%
Leaderboard
Top 20 models ranked by performance
#1 88.4% (raw: 0.884), self-reported
86.4% (raw: 0.864), self-reported
84.8% (raw: 0.848), self-reported
#7 84.0% (raw: 0.84), self-reported
83.0% (raw: 0.83), self-reported
82.8% (raw: 0.828), self-reported
#11 82.3% (raw: 0.823), self-reported
81.0% (raw: 0.81), self-reported
79.6% (raw: 0.796), self-reported
77.5% (raw: 0.775), self-reported
76.0% (raw: 0.7601), self-reported
75.4% (raw: 0.754), self-reported
Showing top 20 of 100 models
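
The percentages on this page appear to be the raw scores scaled by the benchmark's maximum score of 1. A minimal Python sketch of that conversion, using only the raw values listed above, might look like the following. The function and variable names are hypothetical, and the one-decimal rounding is an assumption inferred from the displayed values (for example, raw 0.7601 shown as 76.0%).

```python
# Assumption: displayed percentage = 100 * raw / max score, shown to one decimal.
MAX_SCORE = 1.0                  # "Max Score" from Benchmark Details
HIGH_PERFORMER_THRESHOLD = 80.0  # percent, from the "High Performers (80%+)" label

# Raw scores copied from the leaderboard rows listed on this page
# (model names are not available in the listing).
listed_raw_scores = [
    0.884, 0.864, 0.848, 0.84, 0.83, 0.828,
    0.823, 0.81, 0.796, 0.775, 0.7601, 0.754,
]


def to_percent(raw, max_score=MAX_SCORE):
    """Scale a raw score to a percentage of the maximum score."""
    return 100.0 * raw / max_score


def format_percent(raw):
    """Format a raw score the way the leaderboard displays it."""
    return f"{to_percent(raw):.1f}%"


def count_high_performers(raw_scores, threshold=HIGH_PERFORMER_THRESHOLD):
    """Count scores at or above the high-performer threshold (in percent)."""
    return sum(1 for raw in raw_scores if to_percent(raw) >= threshold)


if __name__ == "__main__":
    for raw in listed_raw_scores:
        print(f"{format_percent(raw)} (raw: {raw})")
    print("Listed rows at 80%+:", count_high_performers(listed_raw_scores))
```

Run against the rows listed here, the count of scores at 80% or above is lower than the 13 high performers reported in the overview, because this listing shows only part of the leaderboard.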