Arena Hard
general
text
About
Arena Hard benchmark
Evaluation Stats
Total Models: 22
Organizations: 7
Verified Results: 0
Self-Reported: 22
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution (22 models)
Top Score: 95.6%
Average Score: 66.4%
High Performers (80%+): 7
Top Organizations
#1 NVIDIA (1 model): 88.3%
#2 DeepSeek (2 models): 84.3%
#3 Alibaba (5 models): 82.7%
#4 Mistral AI (3 models): 67.2%
#5 Microsoft (6 models): 55.9%
Leaderboard
Top 20 models ranked by performance
95.6% (raw: 0.956), self-reported
92.3% (raw: 0.923), self-reported
91.0% (raw: 0.91), self-reported
88.3% (raw: 0.883), self-reported
87.6% (raw: 0.876), self-reported
81.2% (raw: 0.812), self-reported
79.0% (raw: 0.79), self-reported
76.2% (raw: 0.762), self-reported
73.3% (raw: 0.733), self-reported
70.9% (raw: 0.709), self-reported
65.4% (raw: 0.654), self-reported
57.6% (raw: 0.5756), self-reported
57.6% (raw: 0.5756), self-reported
52.0% (raw: 0.52), self-reported
46.1% (raw: 0.461), self-reported
43.1% (raw: 0.431), self-reported
37.9% (raw: 0.379), self-reported
37.0% (raw: 0.37), self-reported
Showing top 20 of 22 models
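The displayed percentages appear to be the raw scores divided by the maximum score of 1 and rounded to one decimal place. The page does not state this rule, so the following is only a minimal Python sketch under that assumption; the function name `to_percent` is hypothetical and used purely for illustration.

```python
def to_percent(raw: float, max_score: float = 1.0) -> str:
    """Format a raw benchmark score as a percentage string.

    Assumption: the leaderboard scales by the maximum score and
    rounds to one decimal place; the page does not document this.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return f"{raw / max_score * 100:.1f}%"


if __name__ == "__main__":
    # Raw values taken from the leaderboard entries above.
    for raw in (0.956, 0.91, 0.5756, 0.37):
        print(raw, "->", to_percent(raw))
```

Under this assumption, a raw value such as 0.5756 is shown as 57.6%, which would explain why some entries list more digits in the raw field than in the percentage.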