GPQA

general
text
About

GPQA benchmark

Evaluation Stats
Total Models: 100
Organizations: 13
Verified Results: 0
Self-Reported: 100
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

100 models
Top Score: 88.4%
Average Score: 59.0%
High Performers (80%+): 13

Top Organizations

#1 xAI, 7 models, 69.6%
#2 OpenAI, 20 models, 67.1%
#3 NVIDIA, 3 models, 65.6%
#4 Moonshot AI, 2 models, 61.6%
#5 DeepSeek, 11 models, 61.1%
Leaderboard
Top 20 models ranked by performance
1. 88.4% (raw 0.884), self-reported
2. 87.5% (raw 0.875), self-reported
3. 86.4% (raw 0.864), self-reported
4. 85.7% (raw 0.857), self-reported
5. 84.8% (raw 0.848), self-reported
6. 84.6% (raw 0.846), self-reported
7. 84.0% (raw 0.84), self-reported
8. 83.3% (raw 0.833), self-reported
9. 83.0% (raw 0.83), self-reported
10. 82.8% (raw 0.828), self-reported
11. 82.3% (raw 0.823), self-reported
12. 81.4% (raw 0.814), self-reported
13. 81.0% (raw 0.81), self-reported
14. 79.6% (raw 0.796), self-reported
15. 79.0% (raw 0.79), self-reported
16. 78.0% (raw 0.78), self-reported
17. 77.5% (raw 0.775), self-reported
18. 77.2% (raw 0.772), self-reported
19. 76.0% (raw 0.7601), self-reported
20. 75.4% (raw 0.754), self-reported
Showing top 20 of 100 models
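
The percentages in the leaderboard appear to be the raw scores divided by the maximum score of 1 and rounded to one decimal place. The sketch below is an illustration under that assumption, not the site's actual code: it applies the conversion to the 20 raw values listed above and counts the entries at or above the 80% high-performer threshold. The names `to_percent`, `MAX_SCORE`, and `HIGH_PERFORMER_THRESHOLD` are hypothetical.

```python
# Raw scores copied from the top-20 leaderboard entries (max score is 1).
raw_scores = [
    0.884, 0.875, 0.864, 0.857, 0.848, 0.846, 0.84, 0.833, 0.83, 0.828,
    0.823, 0.814, 0.81, 0.796, 0.79, 0.78, 0.775, 0.772, 0.7601, 0.754,
]

MAX_SCORE = 1.0                  # from "Benchmark Details"
HIGH_PERFORMER_THRESHOLD = 80.0  # percent; assumed from the "High Performers (80%+)" label


def to_percent(raw: float, max_score: float = MAX_SCORE) -> float:
    """Convert a raw score to a percentage, rounded to one decimal place."""
    return round(raw / max_score * 100, 1)


percentages = [to_percent(r) for r in raw_scores]
high_performers = [p for p in percentages if p >= HIGH_PERFORMER_THRESHOLD]

print(percentages[0])        # 88.4
print(len(high_performers))  # 13
```

Because the list is ranked and the 20th entry is already below 80%, the count of 13 from these entries matches the "High Performers (80%+)" figure reported for all 100 models.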