TheoremQA
general
text
About
TheoremQA benchmark
Evaluation Stats
Total Models: 6
Organizations: 1
Verified Results: 0
Self-Reported: 6
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 6 models
Top Score: 44.4%
Average Score: 39.0%
High Performers (80%+): 0
Top Organizations
#1 Alibaba: 6 models, 39.0% average score
Leaderboard
Top 6 models ranked by performance
#1  44.4%  Raw: 0.444  Self-reported
#2  44.1%  Raw: 0.441  Self-reported
#3  43.1%  Raw: 0.431  Self-reported
#4  43.0%  Raw: 0.43   Self-reported
#5  34.0%  Raw: 0.34   Self-reported
#6  25.3%  Raw: 0.253  Self-reported
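
The summary figures in the Performance Overview (top score, average score, high performers) can be re-derived from the six raw leaderboard values. The following is a minimal sketch, assuming the average is an unweighted mean rounded to one decimal place and that "High Performers (80%+)" counts raw scores of at least 0.80. The function name `summarize` and the dictionary keys are hypothetical, chosen only for illustration.

```python
# Raw scores copied from the leaderboard entries (Max Score is 1).
raw_scores = [0.444, 0.441, 0.431, 0.43, 0.34, 0.253]


def summarize(scores, high_threshold=0.80):
    """Recompute the overview statistics from raw scores in [0, 1].

    Assumption: the page uses an unweighted mean and reports
    percentages rounded to one decimal place.
    """
    return {
        "total_models": len(scores),
        "top_pct": round(max(scores) * 100, 1),
        "average_pct": round(sum(scores) / len(scores) * 100, 1),
        # Count of models at or above the assumed 80% cutoff.
        "high_performers": sum(1 for s in scores if s >= high_threshold),
    }


summary = summarize(raw_scores)
print(summary)
```

Under these assumptions the recomputed values agree with the page: six models, a top score of 44.4%, an average of 39.0%, and no model at or above 80%.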