GPQA-Diamond

reasoning
text
About

GPQA-Diamond is the diamond (hardest) subset of the Graduate-Level Google-Proof Q&A benchmark. It tests expert-level reasoning in physics, biology, and chemistry.

Evaluation Stats
Total Models: 2
Organizations: 1
Verified Results: 0
Self-Reported: 2
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

2 models
Top Score: 81.0%
Average Score: 78.0%
High Performers (80%+): 1

Top Organizations

#1 DeepSeek
2 models
78.0%
Leaderboard
2 models ranked by performance on GPQA-Diamond
Date            License    Score
May 28, 2025    MIT        81.0%
Jan 10, 2025    MIT        74.9%