MMMLU

Categories: general, text
About

MMMLU (Multilingual Massive Multitask Language Understanding) benchmark.

Evaluation Stats
Total models: 13
Organizations: 4
Verified results: 0
Self-reported results: 13
Benchmark Details
Max score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution
Models: 13
Top score: 98.4%
Average score: 81.4%
High performers (80%+): 9
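The summary figures follow from the 13 raw scores in the leaderboard. The sketch below is minimal and rests on two assumptions. First, "High Performers (80%+)" means a raw score of at least 0.80. Second, the average is an unweighted mean over all listed models. The page states neither definition, and the function name is hypothetical.

```python
# Raw scores (0-1 scale, max score 1) from the leaderboard, in rank order.
RAW_SCORES = [0.984, 0.888, 0.877, 0.873, 0.867, 0.865, 0.861,
              0.851, 0.814, 0.785, 0.699, 0.669, 0.554]

# Assumption: "80%+" is read as "raw score >= 0.80".
HIGH_PERFORMER_THRESHOLD = 0.80


def summarize(scores, threshold=HIGH_PERFORMER_THRESHOLD):
    """Return (model count, top score %, unweighted mean score %, count >= threshold)."""
    count = len(scores)
    top = max(scores) * 100
    average = sum(scores) / count * 100  # assumption: simple mean, no weighting
    high = sum(1 for s in scores if s >= threshold)
    return count, round(top, 1), round(average, 1), high


if __name__ == "__main__":
    n, top, avg, high = summarize(RAW_SCORES)
    print(f"{n} models, top {top}%, average {avg}%, {high} at 80%+")
```

Under these assumptions the sketch reproduces the listed figures: 13 models, a 98.4% top score, an 81.4% average and 9 high performers.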

Top Organizations

Rank  Organization  Models  Score
1     Anthropic     4       90.0%
2     Alibaba       1       86.7%
3     OpenAI        6       81.2%
4     Microsoft     2       62.7%
Leaderboard
Top 13 models ranked by performance

Rank  Score  Raw    Source
1     98.4%  0.984  Self-reported
2     88.8%  0.888  Self-reported
3     87.7%  0.877  Self-reported
4     87.3%  0.873  Self-reported
5     86.7%  0.867  Self-reported
6     86.5%  0.865  Self-reported
7     86.1%  0.861  Self-reported
8     85.1%  0.851  Self-reported
9     81.4%  0.814  Self-reported
10    78.5%  0.785  Self-reported
11    69.9%  0.699  Self-reported
12    66.9%  0.669  Self-reported
13    55.4%  0.554  Self-reported
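Each leaderboard entry pairs a raw score with a percentage. This sketch assumes the percentage is raw / max score * 100, rounded to one decimal, where the max score is 1 for this benchmark. It shows one minimal way to turn raw scores into ranked display rows. The function names are hypothetical and are not taken from the site.

```python
MAX_SCORE = 1.0  # "Max score: 1" from Benchmark Details


def to_percent(raw, max_score=MAX_SCORE):
    """Format a raw score as a one-decimal percentage (assumed display rule)."""
    return f"{raw / max_score * 100:.1f}%"


def ranked_rows(raw_scores, max_score=MAX_SCORE):
    """Sort raw scores best-first and attach 1-based ranks and display percentages."""
    ordered = sorted(raw_scores, reverse=True)
    return [(rank, to_percent(raw, max_score), raw)
            for rank, raw in enumerate(ordered, start=1)]


if __name__ == "__main__":
    # A few raw scores from the leaderboard, deliberately out of order.
    for rank, pct, raw in ranked_rows([0.554, 0.984, 0.867]):
        print(f"{rank:<5} {pct:<6} {raw:.3f}")
```

Sorting on the raw value rather than the formatted string keeps the ranking numeric. The rounded percentage is used only for display.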