HumanEval

code
text
About

HumanEval benchmark

Evaluation Stats
Total Models: 62
Organizations: 12
Verified Results: 0
Self-Reported: 61
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 62
Top Score: 93.7%
Average Score: 80.4%
High Performers (80%+): 41

Top Organizations

#1 Moonshot AI (1 model): 93.3%
#2 DeepSeek (1 model): 89.0%
#3 IBM (3 models): 87.3%
#4 Alibaba (10 models): 86.1%
#5 Amazon (3 models): 85.2%
Leaderboard
Top 20 models ranked by performance
93.7% (raw: 0.937), self-reported
93.4% (raw: 0.934), self-reported
93.3% (raw: 0.933), self-reported
92.7% (raw: 0.927), self-reported
92.4% (raw: 0.924), self-reported
92.0% (raw: 0.92), self-reported
92.0% (raw: 0.92), self-reported
91.5% (raw: 0.915), self-reported
90.2% (raw: 0.902), self-reported
89.7% (raw: 0.8973), self-reported
89.7% (raw: 0.8973), self-reported
89.6% (raw: 0.896), self-reported
89.0% (raw: 0.89), self-reported
89.0% (raw: 0.89), self-reported
89.0% (raw: 0.89), self-reported
88.4% (raw: 0.8841), self-reported
88.4% (raw: 0.884), self-reported
88.4% (raw: 0.884), self-reported
88.4% (raw: 0.884), self-reported
88.4% (raw: 0.884), self-reported
Showing top 20 of 62 models
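
The percentages in the leaderboard appear to be the raw scores (on a 0 to 1 scale, matching the stated max score of 1) multiplied by 100 and shown to one decimal place. The Python sketch below reproduces that conversion for the 20 listed entries. The display rule, the function names `to_percent` and `summarize`, and the 0.80 threshold (taken from the "High Performers (80%+)" label) are assumptions made for illustration, not the site's actual code.

```python
# Raw scores of the 20 leaderboard entries listed above, in the order shown.
RAW_SCORES = [
    0.937, 0.934, 0.933, 0.927, 0.924, 0.92, 0.92, 0.915, 0.902, 0.8973,
    0.8973, 0.896, 0.89, 0.89, 0.89, 0.8841, 0.884, 0.884, 0.884, 0.884,
]


def to_percent(raw: float, max_score: float = 1.0) -> str:
    """Format a raw score as a percentage with one decimal place.

    Assumed display rule: raw / max_score * 100, rounded to one decimal.
    """
    return f"{raw / max_score * 100:.1f}%"


def summarize(scores: list[float], threshold: float = 0.80) -> dict:
    """Summarize a list of raw scores (hypothetical helper).

    Covers only the scores passed in, so for the 20 visible entries the
    result will differ from the page-wide statistics over all 62 models.
    """
    return {
        "count": len(scores),
        "top": to_percent(max(scores)),
        "high_performers": sum(s >= threshold for s in scores),
    }


if __name__ == "__main__":
    for raw in RAW_SCORES:
        print(f"{to_percent(raw)} (raw: {raw})")
    print(summarize(RAW_SCORES))
```

Because only the top 20 of 62 models are listed, this summary cannot reproduce the page-wide average score or the full high-performer count.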