C-Eval

code

text

About

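C-Eval is a Chinese-language evaluation suite for foundation models, built from multiple-choice questions spanning 52 disciplines and four difficulty levels.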

Evaluation Stats

Total Models: 6

Organizations: 3

Verified Results: 0

Self-Reported: 6

Benchmark Details

Max Score: 1

Language

zh

Performance Overview

Score distribution and top performers

Score Distribution

6 models

Top Score

92.5%

Average Score

86.7%

High Performers (80%+)

5

Top Organizations

#1 Moonshot AI

2 models

90.4%

#2 DeepSeek

2 models

89.1%

#3 Alibaba

2 models

80.5%

Leaderboard

Top 6 models ranked by performance

Rank  Model               Score   Raw    Source
1     (not shown)         92.5%   0.925  Self-reported
2     (not shown)         91.8%   0.918  Self-reported
3     (not shown)         88.3%   0.883  Self-reported
4     (not shown)         86.5%   0.865  Self-reported
5     Qwen2 72B Instruct  83.8%   0.838  Self-reported
6     Qwen2 7B Instruct   77.2%   0.772  Self-reported
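
The summary figures under Performance Overview can be re-derived from the raw leaderboard scores. The following is a minimal sketch, assuming the top score, average, and 80%+ count are computed directly from the six raw values listed here. The variable names are illustrative and not part of the page.

```python
from statistics import mean

# Raw scores copied from the leaderboard (fractions of the max score of 1).
raw_scores = [0.925, 0.918, 0.883, 0.865, 0.838, 0.772]

# Assumed threshold for "High Performers (80%+)", expressed as a fraction.
HIGH_PERFORMER_THRESHOLD = 0.80

top_score = max(raw_scores)
average_score = mean(raw_scores)
high_performers = sum(1 for s in raw_scores if s >= HIGH_PERFORMER_THRESHOLD)

print(f"Top score: {top_score:.1%}")
print(f"Average score: {average_score:.1%}")
print(f"High performers (80%+): {high_performers}")
```

Under that assumption, the results agree with the figures reported on the page: a 92.5% top score, an 86.7% average, and 5 models at or above 80%.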