EvalPlus
code
text
About
EvalPlus benchmark
Evaluation Stats
Total Models: 4
Organizations: 2
Verified Results: 0
Self-Reported: 4
Benchmark Details
Max Score: 100
Language: en
Performance Overview
Score distribution and top performers
Score Distribution (4 models)
Top Score: 80.3%
Average Score: 76.8%
High Performers (80%+): 1
Top Organizations
#1 Moonshot AI: 1 model, 80.3%
#2 Alibaba: 3 models, 75.6%
Leaderboard
Top 4 models ranked by performance
#1 80.3% (raw: 80.3; self-reported)
#2 79.0% (raw: 79; self-reported)
#3 77.6% (raw: 77.6; self-reported)
#4 70.3% (raw: 70.3; self-reported)
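
The summary figures above (average score, high-performer count, per-organization values) can be reproduced from the four leaderboard scores. The sketch below is illustrative only: the page does not attach organizations to individual rows, so the assignment of 80.3 to Moonshot AI and the remaining three scores to Alibaba is an assumption inferred from the per-organization figures, and the `summarize` helper is a hypothetical name, not part of any EvalPlus tooling.

```python
from statistics import mean

# Scores (percent) as listed in the leaderboard. The organization labels are an
# assumption inferred from the "Top Organizations" figures, not stated per row.
scores = [
    ("Moonshot AI", 80.3),
    ("Alibaba", 79.0),
    ("Alibaba", 77.6),
    ("Alibaba", 70.3),
]


def summarize(rows, high_threshold=80.0):
    """Recompute the dashboard summary from (organization, score) pairs."""
    values = [score for _, score in rows]

    # Group scores by organization so each group can be averaged separately.
    by_org = {}
    for org, score in rows:
        by_org.setdefault(org, []).append(score)

    return {
        "total_models": len(values),
        "top_score": max(values),
        "average_score": round(mean(values), 1),
        "high_performers": sum(1 for v in values if v >= high_threshold),
        "org_averages": {org: round(mean(v), 1) for org, v in by_org.items()},
    }


summary = summarize(scores)
print(summary)
```

Under that assumption, the recomputed values match the page: an overall average of 76.8, one model at or above 80%, and organization averages of 80.3 and 75.6.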