MMMLU

Tags: general, text

About

MMMLU (Multilingual Massive Multitask Language Understanding) benchmark

Evaluation Stats

Total models: 13
Organizations: 4
Verified results: 0
Self-reported results: 13

Benchmark Details

Max score: 1 (raw scores are fractions of 1; they are shown below as percentages)
Language: en

Performance Overview

Score distribution and top performers

Score Distribution (13 models)

Top score: 98.4%
Average score: 81.4%
High performers (80%+): 9
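As a consistency check, the summary figures above can be recomputed from the 13 scores listed in the leaderboard. This is a minimal Python sketch, assuming that "average score" is an unweighted mean over all listed models and that "high performers" means a score of at least 80%. Seven leaderboard entries carry no model name, so only the scores are used.

```python
# Scores (percent) from the MMMLU leaderboard on this page, in rank order.
# Model names are omitted because several entries on the page have none.
scores = [98.4, 88.8, 87.7, 87.3, 86.7, 86.5, 86.1, 85.1, 81.4, 78.5, 69.9, 66.9, 55.4]

top_score = max(scores)
# Assumption: the page's "Average Score" is a plain (unweighted) mean.
average = sum(scores) / len(scores)
# Assumption: "High Performers (80%+)" counts scores >= 80.0.
high_performers = sum(1 for s in scores if s >= 80.0)

print(f"Models: {len(scores)}")
print(f"Top score: {top_score:.1f}%")
print(f"Average score: {average:.1f}%")
print(f"High performers (80%+): {high_performers}")
```

The recomputed values match the overview: 13 models, a top score of 98.4%, an average of 81.4% (1058.7 / 13), and 9 models at or above 80%.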

Top Organizations (average score across each organization's models)

#1 Anthropic: 4 models, 90.0%
#2 Alibaba: 1 model, 86.7%
#3 OpenAI: 6 models, 81.2%
#4 Microsoft: 2 models, 62.7%
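The organization percentages appear to be the unweighted mean of each organization's model scores. Microsoft's two listed models, Phi-3.5-MoE-instruct (69.9%) and Phi-3.5-mini-instruct (55.4%), average 62.65%, which rounds to the 62.7% shown, whereas their best score alone would be 69.9%. The following minimal Python sketch of that aggregation is limited to the organizations whose models are all named on this page. It rests on two assumptions: that model families map to organizations (Qwen to Alibaba, Phi to Microsoft), and that the page rounds half up. `org_averages` is a hypothetical helper.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

# Named leaderboard entries for organizations whose models are all named on this page.
# Assumption: organization labels inferred from model family names.
entries = [
    ("Qwen3 235B A22B", "Alibaba", Decimal("86.7")),
    ("Phi-3.5-MoE-instruct", "Microsoft", Decimal("69.9")),
    ("Phi-3.5-mini-instruct", "Microsoft", Decimal("55.4")),
]

def org_averages(rows):
    """Hypothetical helper: return {organization: (model_count, mean score to one decimal)}."""
    grouped = defaultdict(list)
    for _model, org, score in rows:
        grouped[org].append(score)
    result = {}
    for org, org_scores in grouped.items():
        mean = sum(org_scores) / len(org_scores)
        # Decimal avoids binary float artifacts; round half up is an assumed convention.
        result[org] = (len(org_scores), mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))
    return result

for org, (count, avg) in org_averages(entries).items():
    print(f"{org}: {count} model(s), {avg}%")
```

This prints `Alibaba: 1 model(s), 86.7%` and `Microsoft: 2 model(s), 62.7%`, matching the figures above. Decimal arithmetic is used because 62.65 sits exactly on a rounding boundary, where binary floats could tip either way.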

Leaderboard

Top 13 models ranked by performance

1. Claude Opus 4.1: 98.4% (raw 0.984), self-reported
2. 88.8% (raw 0.888), self-reported
3. 87.7% (raw 0.877), self-reported
4. 87.3% (raw 0.873), self-reported
5. Qwen3 235B A22B: 86.7% (raw 0.867), self-reported
6. Claude Sonnet 4: 86.5% (raw 0.865), self-reported
7. Claude 3.7 Sonnet: 86.1% (raw 0.861), self-reported
8. 85.1% (raw 0.851), self-reported
9. 81.4% (raw 0.814), self-reported
10. 78.5% (raw 0.785), self-reported
11. Phi-3.5-MoE-instruct: 69.9% (raw 0.699), self-reported
12. 66.9% (raw 0.669), self-reported
13. Phi-3.5-mini-instruct: 55.4% (raw 0.554), self-reported