GPQA-Diamond
reasoning
text
About
GPQA-Diamond is the diamond (hardest) subset of the Graduate-Level Google-Proof Q&A (GPQA) benchmark. It tests expert-level reasoning in physics, biology, and chemistry.
Evaluation Stats
Total Models: 2
Organizations: 1
Verified Results: 0
Self-Reported: 2
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution
2 models
Top Score
81.0%
Average Score
78.0%
High Performers (80%+)
1
Top Organizations
#1 DeepSeek
2 models
78.0%
Leaderboard
2 models ranked by performance on GPQA-Diamond
Date | License | Score
---|---|---
May 28, 2025 | MIT | 81.0%
Jan 10, 2025 | MIT | 74.9%
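
The summary figures in the Performance Overview can be reproduced from the two leaderboard scores. The sketch below is illustrative only. It assumes that the average is a plain arithmetic mean rounded half-up to one decimal place, and that "High Performers (80%+)" means a score of at least 80%. The page does not state either rule, and the names `summarize` and `HIGH_PERFORMER_THRESHOLD` are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores as listed in the leaderboard rows (percent).
scores = [Decimal("81.0"), Decimal("74.9")]

# Assumption: "High Performers (80%+)" means score >= 80%.
HIGH_PERFORMER_THRESHOLD = Decimal("80")


def summarize(values):
    """Return the top score, the rounded mean, and the count at or above the threshold."""
    mean = sum(values, Decimal("0")) / len(values)
    return {
        "top": max(values),
        # Assumption: the page rounds half-up to one decimal place.
        "average": mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP),
        "high_performers": sum(1 for v in values if v >= HIGH_PERFORMER_THRESHOLD),
    }


summary = summarize(scores)
print(summary)
```

Under these assumptions the exact mean is 77.95%, which rounds to the 78.0% shown as the Average Score, and one model clears the 80% threshold. `Decimal` is used instead of floats so that the half-up rounding of 77.95 is exact.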