GSM8k
math
text
About
GSM8k benchmark
Evaluation Stats
Total Models: 46
Organizations: 15
Verified Results: 0
Self-Reported: 46
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution
46 models
Top Score: 97.3%
Average Score: 87.8%
High Performers (80%+): 38
Top Organizations
#1 OpenAI: 2 models, 97.0%
#2 DeepSeek: 1 model, 95.1%
#3 Moonshot AI: 2 models, 94.7%
#4 Amazon: 3 models, 93.9%
#5 Anthropic: 5 models, 93.8%
Leaderboard
Top 20 models ranked by performance
97.3% (Raw: 0.973, Self-reported)
96.8% (Raw: 0.968, Self-reported)
96.4% (Raw: 0.964, Self-reported)
96.4% (Raw: 0.964, Self-reported)
95.9% (Raw: 0.959, Self-reported)
95.9% (Raw: 0.959, Self-reported)
95.8% (Raw: 0.958, Self-reported)
95.1% (Raw: 0.951, Self-reported)
95.0% (Raw: 0.95, Self-reported)
94.8% (Raw: 0.948, Self-reported)
#15 94.4% (Raw: 0.944, Self-reported)
94.4% (Raw: 0.9439, Self-reported)
93.0% (Raw: 0.93, Self-reported)
92.3% (Raw: 0.923, Self-reported)
#19 92.3% (Raw: 0.923, Self-reported)
92.1% (Raw: 0.921, Self-reported)
Showing top 20 of 46 models
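
The leaderboard shows each result twice: as a raw value and as a percentage. A minimal sketch of how the two appear to relate, assuming the percentage is the raw score divided by the benchmark's Max Score of 1 and rounded to one decimal place, and assuming the "High Performers (80%+)" count is a simple threshold on the raw score. The function names `to_percent` and `count_high_performers` are hypothetical and not part of the leaderboard site.

```python
def to_percent(raw: float, max_score: float = 1.0) -> str:
    """Format a raw score as a percentage string with one decimal place."""
    return f"{raw / max_score * 100:.1f}%"


def count_high_performers(raw_scores, threshold: float = 0.80) -> int:
    """Count scores at or above the threshold (assumed to mean 80%+)."""
    return sum(1 for score in raw_scores if score >= threshold)


# A few raw values copied from the leaderboard rows above.
sample = [0.973, 0.968, 0.9439, 0.921]
print([to_percent(score) for score in sample])
print(count_high_performers(sample))
```

Under these assumptions, a raw value of 0.9439 is displayed as 94.4%, which is why two rows with different raw values can show the same percentage.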