MGSM

math
text
About

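MGSM (Multilingual Grade School Math) is a benchmark of grade-school math word problems translated into multiple languages.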

Evaluation Stats
Total Models: 31
Organizations: 6
Verified Results: 0
Self-Reported: 30
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

31 models
Top Score: 92.3%
Average Score: 77.9%
High Performers (80%+): 19

Top Organizations

#1 Anthropic, 6 models, 86.4%
#2 OpenAI, 8 models, 83.6%
#3 Alibaba, 1 model, 83.5%
#4 Meta, 6 models, 81.3%
#5 Google, 6 models, 67.3%
Leaderboard
Top 20 models ranked by performance
#1 92.3% (Raw: 0.923), Self-reported
#2 92.0% (Raw: 0.92), Self-reported
#3 91.6% (Raw: 0.916), Self-reported
#4 91.6% (Raw: 0.916), Self-reported
#5 91.1% (Raw: 0.911), Self-reported
#6 90.8% (Raw: 0.908), Self-reported
#7 90.7% (Raw: 0.907), Self-reported
#8 90.6% (Raw: 0.906), Self-reported
#9 90.5% (Raw: 0.905), Self-reported
#10 89.3% (Raw: 0.893), Self-reported
#11 88.5% (Raw: 0.885), Self-reported
#12 87.5% (Raw: 0.875), Self-reported
#13 87.0% (Raw: 0.87), Self-reported
#14 86.9% (Raw: 0.869), Self-reported
#15 85.6% (Raw: 0.856), Self-reported
#16 83.5% (Raw: 0.8353), Self-reported
#17 83.5% (Raw: 0.835), Self-reported
#18 82.6% (Raw: 0.826), Self-reported
#19 80.6% (Raw: 0.806), Self-reported
#20 75.1% (Raw: 0.751), Self-reported
Showing top 20 of 31 models
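
The displayed percentages appear to be the raw scores (on a 0 to 1 scale, consistent with a Max Score of 1) multiplied by 100 and shown to one decimal place. The following is a minimal sketch under that assumption, using the 20 raw values listed in the leaderboard. The function names and the reading of "80%+" as a raw score of at least 0.80 are illustrative assumptions, not something the page states.

```python
# Raw scores copied from the top-20 leaderboard entries above (0 to 1 scale).
RAW_SCORES = [
    0.923, 0.92, 0.916, 0.916, 0.911, 0.908, 0.907, 0.906, 0.905, 0.893,
    0.885, 0.875, 0.87, 0.869, 0.856, 0.8353, 0.835, 0.826, 0.806, 0.751,
]

# Assumption: "High Performers (80%+)" means raw score >= 0.80.
HIGH_PERFORMER_THRESHOLD = 0.80


def to_percent(raw: float) -> str:
    """Format a raw score in [0, 1] as a percentage with one decimal place."""
    return f"{raw * 100:.1f}%"


def count_high_performers(scores, threshold=HIGH_PERFORMER_THRESHOLD):
    """Count the scores that are at or above the threshold."""
    return sum(1 for s in scores if s >= threshold)


if __name__ == "__main__":
    # Print each listed entry with its rank and formatted percentage.
    for rank, raw in enumerate(RAW_SCORES, start=1):
        print(f"#{rank:<2} {to_percent(raw)}  (raw {raw})")
    print("High performers among the listed 20:",
          count_high_performers(RAW_SCORES))
```

Under these assumptions, 19 of the 20 listed entries clear the threshold, which agrees with the High Performers (80%+) figure in the overview. Because the leaderboard is ranked and entry #20 is already below 80%, the 11 models not shown would also fall below it.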