AlpacaEval 2.0
code
text
About
AlpacaEval 2.0 benchmark
Evaluation Stats
Total Models: 5
Organizations: 2
Verified Results: 0
Self-Reported: 5
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution
5 models
Top Score: 87.6%
Average Score: 59.7%
High Performers (80%+): 1
Top Organizations
#1 DeepSeek (2 models): 69.0%
#2 IBM (3 models): 53.5%
Leaderboard
Top 5 models ranked by performance
87.6% (Raw: 0.876, Self-reported)
62.7% (Raw: 0.6268, Self-reported)
62.7% (Raw: 0.6268, Self-reported)
50.5% (Raw: 0.505, Self-reported)
35.2% (Raw: 0.3516, Self-reported)
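The summary figures in the Performance Overview can be cross-checked against the raw leaderboard values. The following is a minimal sketch, assuming the average is a plain arithmetic mean of the five raw scores and that "High Performers (80%+)" counts scores at or above 0.80; the function name `summarize` and the `high_threshold` parameter are illustrative and do not come from the page.

```python
from statistics import mean

# Raw scores copied from the leaderboard entries (fractions of the max score 1).
raw_scores = [0.876, 0.6268, 0.6268, 0.505, 0.3516]


def summarize(scores, high_threshold=0.80):
    """Return the top score, mean score, and count of scores >= high_threshold.

    Assumption: the page's "Average Score" is an unweighted mean and its
    "High Performers" bucket uses an inclusive 80% cutoff.
    """
    return {
        "top": max(scores),
        "average": mean(scores),
        "high_performers": sum(1 for s in scores if s >= high_threshold),
    }


summary = summarize(raw_scores)
print(f"Top score: {summary['top']:.1%}")          # Top score: 87.6%
print(f"Average score: {summary['average']:.1%}")  # Average score: 59.7%
print(f"High performers (80%+): {summary['high_performers']}")  # 1
```

Under these assumptions the recomputed values match the reported top score, average score, and high-performer count.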