BIG-Bench Hard
general
text
About
BIG-Bench Hard benchmark
Evaluation Stats
Total Models: 21
Organizations: 4
Verified Results: 0
Self-Reported: 21
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution (21 models)
Top Score: 93.1%
Average Score: 71.2%
High Performers (80%+): 8
Top Organizations
#1 Anthropic: 5 models, 85.9%
#2 Microsoft: 3 models, 72.8%
#3 Google: 10 models, 65.4%
#4 IBM: 3 models, 64.7%
Leaderboard
Top 20 models ranked by performance
#1 93.1% (Raw: 0.931, Self-reported)
#2 93.1% (Raw: 0.931, Self-reported)
#3 89.2% (Raw: 0.892, Self-reported)
#4 87.6% (Raw: 0.876, Self-reported)
#5 86.8% (Raw: 0.868, Self-reported)
#6 85.7% (Raw: 0.857, Self-reported)
#7 85.5% (Raw: 0.855, Self-reported)
#8 82.9% (Raw: 0.829, Self-reported)
#9 79.1% (Raw: 0.791, Self-reported)
#10 73.7% (Raw: 0.737, Self-reported)
#11 72.2% (Raw: 0.722, Self-reported)
#12 70.4% (Raw: 0.704, Self-reported)
#13 69.1% (Raw: 0.6913, Self-reported)
#14 69.1% (Raw: 0.6913, Self-reported)
#15 69.0% (Raw: 0.69, Self-reported)
#16 55.7% (Raw: 0.557, Self-reported)
#17 52.9% (Raw: 0.529, Self-reported)
#18 52.9% (Raw: 0.529, Self-reported)
#19 44.3% (Raw: 0.443, Self-reported)
#20 44.3% (Raw: 0.443, Self-reported)
Showing top 20 of 21 models
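The relationship between the raw values and the displayed percentages, and the "High Performers (80%+)" count, can be checked with a few lines of Python. This is a minimal sketch, not the site's actual computation: it assumes the percentage is the raw score times 100 rounded to one decimal place, and that "80%+" means a raw score of at least 0.80. The names `raw_scores`, `to_percent`, and `HIGH_PERFORMER_THRESHOLD` are hypothetical. Only the 20 listed scores are used, so the page's 21-model average is not reproduced here.

```python
# Raw scores copied from the leaderboard above (top 20 of 21 models).
raw_scores = [
    0.931, 0.931, 0.892, 0.876, 0.868, 0.857, 0.855, 0.829, 0.791, 0.737,
    0.722, 0.704, 0.6913, 0.6913, 0.69, 0.557, 0.529, 0.529, 0.443, 0.443,
]

# Assumption: "High Performers (80%+)" means raw score >= 0.80.
HIGH_PERFORMER_THRESHOLD = 0.80


def to_percent(raw: float) -> str:
    """Format a raw score in [0, 1] as a percentage with one decimal place."""
    return f"{raw * 100:.1f}%"


top_score = max(raw_scores)
high_performers = sum(1 for s in raw_scores if s >= HIGH_PERFORMER_THRESHOLD)

print("Top score:", to_percent(top_score))
print("High performers:", high_performers)
print("0.6913 displays as:", to_percent(0.6913))
```

Under these assumptions the listed scores agree with the summary figures: the maximum formats as the stated top score, and eight listed models clear the 80% threshold.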