MATH

math

text

About

MATH benchmark

Evaluation Stats

Total Models: 63

Organizations: 11

Verified Results: 0

Self-Reported: 61

Benchmark Details

Max Score: 1

Language: en

Performance Overview

Score distribution and top performers

Score Distribution (63 models)

Top Score: 97.9%

Average Score: 66.7%

High Performers (80%+): 14

Top Organizations

#1 DeepSeek: 1 model, 74.7%

#2 OpenAI: 9 models, 74.3%

#3 Amazon: 3 models, 73.1%

#4 Moonshot AI: 1 model, 70.2%

#5 Alibaba: 11 models, 69.1%

Leaderboard

Top 20 models ranked by performance

Rank  Model                      Score   Raw     Source
1     (name missing)             97.9%   0.979   Self-reported
2     (name missing)             96.4%   0.964   Self-reported
3     Gemini 2.0 Flash           89.7%   0.897   Self-reported
4     (name missing)             89.0%   0.89    Self-reported
5     Gemini 2.0 Flash-Lite      86.8%   0.868   Self-reported
6     (name missing)             86.5%   0.865   Self-reported
7     (name missing)             85.5%   0.855   Self-reported
8     (name missing)             84.7%   0.847   Self-reported
9     (name missing)             83.8%   0.838   Self-reported
10    Qwen2.5 72B Instruct       83.1%   0.831   Self-reported
11    Qwen2.5 32B Instruct       83.1%   0.831   Self-reported
12    Qwen2.5 VL 32B Instruct    82.2%   0.822   Self-reported
13    (name missing)             80.4%   0.804   Self-reported
14    Qwen2.5 14B Instruct       80.0%   0.8     Self-reported
15    Claude 3.5 Sonnet          78.3%   0.783   Self-reported
16    Gemini 1.5 Flash           77.9%   0.779   Self-reported
17    Llama 3.3 70B Instruct     77.0%   0.77    Self-reported
18    (name missing)             76.6%   0.766   Self-reported
19    (name missing)             76.6%   0.766   Self-reported
20    (name missing, by xAI)     76.1%   0.761   Self-reported

Showing top 20 of 63 models
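
The percentages on this page appear to be the raw scores scaled against the listed maximum score of 1. A minimal sketch of that conversion follows, assuming this relationship holds for every entry; the function name `raw_to_percent` and the sample rows are illustrative and not part of the leaderboard itself.

```python
def raw_to_percent(raw: float, max_score: float = 1.0) -> str:
    """Format a raw score as a percentage of the maximum score.

    Assumption: the displayed percentage is raw / max_score * 100,
    rounded to one decimal place.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return f"{raw / max_score * 100:.1f}%"


# A few (model, raw score) pairs copied from the leaderboard above.
rows = [
    ("Gemini 2.0 Flash", 0.897),
    ("Qwen2.5 14B Instruct", 0.8),
    ("Llama 3.3 70B Instruct", 0.77),
]

for name, raw in rows:
    print(f"{name}: {raw_to_percent(raw)}")
```

Rounding to one decimal place is what lets a raw value such as 0.8 display as 80.0%, matching the formatting used in the score column.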