MMLU-Redux

general
text
About

MMLU-Redux benchmark

Evaluation Stats
Total Models: 13
Organizations: 3
Verified Results: 0
Self-Reported: 13
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 13
Top Score: 93.4%
Average Score: 83.8%
High Performers (80%+): 9

Top Organizations

#1 Moonshot AI: 1 model, 92.7%
#2 DeepSeek: 3 models, 91.8%
#3 Alibaba: 9 models, 80.2%
Leaderboard
Top 13 models ranked by performance
1. 93.4% (raw: 0.934), self-reported
2. 93.1% (raw: 0.931), self-reported
3. 92.9% (raw: 0.929), self-reported
4. 92.7% (raw: 0.927), self-reported
5. 89.1% (raw: 0.891), self-reported
6. 87.4% (raw: 0.874), self-reported
7. 86.8% (raw: 0.868), self-reported
8. 83.9% (raw: 0.839), self-reported
9. 80.0% (raw: 0.8), self-reported
10. 77.5% (raw: 0.775), self-reported
11. 75.4% (raw: 0.754), self-reported
12. 71.0% (raw: 0.71), self-reported
13. 66.6% (raw: 0.666), self-reported
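
The summary figures in the Score Distribution panel can be recomputed from the raw leaderboard values. The following is a minimal sketch, assuming the average is a plain arithmetic mean over all 13 entries and that the "80%+" threshold is inclusive. The names `summarize` and `HIGH_THRESHOLD` are hypothetical and not part of any published tooling for this leaderboard.

```python
from statistics import mean

# Raw scores copied from the leaderboard entries (fraction of the max score, 1).
raw_scores = [0.934, 0.931, 0.929, 0.927, 0.891, 0.874, 0.868,
              0.839, 0.8, 0.775, 0.754, 0.71, 0.666]

# Assumption: the "80%+" label is inclusive, so a score of exactly 0.8 counts.
HIGH_THRESHOLD = 0.80


def summarize(scores, threshold=HIGH_THRESHOLD):
    """Return model count, top score, mean score, and high-performer count."""
    return {
        "models": len(scores),
        "top_pct": round(max(scores) * 100, 1),
        "average_pct": round(mean(scores) * 100, 1),
        "high_performers": sum(s >= threshold for s in scores),
    }


summary = summarize(raw_scores)
print(summary)
```

Under these assumptions the sketch yields 13 models, a top score of 93.4%, an average of 83.8%, and 9 high performers, which agrees with the reported panel values.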