HellaSwag

reasoning
text
About

HellaSwag benchmark

Evaluation Stats
Total Models: 24
Organizations: 10
Verified Results: 0
Self-Reported: 24
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 24
Top Score: 95.4%
Average Score: 82.4%
High Performers (80%+): 16

Top Organizations

#1 OpenAI: 1 model, 95.3%
#2 Anthropic: 3 models, 90.1%
#3 Cohere: 1 model, 88.6%
#4 NVIDIA: 1 model, 85.6%
#5 Mistral AI: 1 model, 83.5%
Leaderboard
Top 20 models ranked by performance
1. 95.4% (Raw: 0.954), Self-reported
2. 95.3% (Raw: 0.953), Self-reported
3. 93.3% (Raw: 0.933), Self-reported
4. 89.0% (Raw: 0.89), Self-reported
5. 88.6% (Raw: 0.886), Self-reported
6. 87.6% (Raw: 0.876), Self-reported
7. 86.5% (Raw: 0.865), Self-reported
8. 86.4% (Raw: 0.864), Self-reported
9. 85.9% (Raw: 0.859), Self-reported
10. 85.6% (Raw: 0.8558), Self-reported
11. 85.2% (Raw: 0.852), Self-reported
12. 83.8% (Raw: 0.838), Self-reported
13. 83.5% (Raw: 0.835), Self-reported
14. 83.0% (Raw: 0.83), Self-reported
15. 81.9% (Raw: 0.819), Self-reported
16. 80.1% (Raw: 0.801), Self-reported
17. 78.6% (Raw: 0.786), Self-reported
18. 78.6% (Raw: 0.786), Self-reported
19. 76.8% (Raw: 0.768), Self-reported
20. 72.2% (Raw: 0.722), Self-reported
Showing top 20 of 24 models
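
The relationship between the raw values and the displayed percentages, and between the listed rows and the summary figures, can be checked with a short script. This is a minimal sketch, not the site's own method: the raw scores are copied from the 20 rows shown on this page, the 4 unlisted models are simply absent, and the names `raw_scores` and `to_percent` are illustrative.

```python
# Raw scores copied from the 20 leaderboard rows shown on this page (max score 1).
# The other 4 of the 24 models are not listed, so they are not included here.
raw_scores = [
    0.954, 0.953, 0.933, 0.89, 0.886, 0.876, 0.865, 0.864, 0.859, 0.8558,
    0.852, 0.838, 0.835, 0.83, 0.819, 0.801, 0.786, 0.786, 0.768, 0.722,
]


def to_percent(raw: float) -> str:
    """Format a raw score in [0, 1] as a percentage with one decimal place."""
    return f"{raw * 100:.1f}%"


top_score = max(raw_scores)
# "High performers" here means a raw score of at least 0.80 (the page's 80%+ band).
high_performers = sum(1 for s in raw_scores if s >= 0.80)
# Mean over the visible rows only; this is NOT the page's 24-model average.
mean_shown = sum(raw_scores) / len(raw_scores)

print("Top score:", to_percent(top_score))
print("High performers (80%+):", high_performers)
print("Mean of the 20 shown rows:", to_percent(mean_shown))
```

The top score and the 80%+ count computed this way agree with the summary figures (95.4% and 16). The mean over the 20 visible rows is higher than the reported 82.4% average, which is expected because that average covers all 24 models, including the 4 lower-ranked ones not shown.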