LiveBench

Tags: roleplay, text
About

LiveBench benchmark

Evaluation Stats
Total Models: 12
Organizations: 4
Verified Results: 0
Self-Reported: 12
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution (12 models)

Top Score: 84.6%
Average Score: 62.1%
High Performers (80%+): 1

Top Organizations

#1 Moonshot AI: 1 model, 76.4%
#2 OpenAI: 3 models, 68.0%
#3 Alibaba: 7 models, 59.6%
#4 Microsoft: 1 model, 47.6%
Leaderboard
Top 12 models ranked by performance
84.6% (Raw: 0.846, Self-reported)
77.1% (Raw: 0.771, Self-reported)
76.4% (Raw: 0.764, Self-reported)
74.9% (Raw: 0.749, Self-reported)
74.3% (Raw: 0.743, Self-reported)
73.1% (Raw: 0.731, Self-reported)
67.0% (Raw: 0.67, Self-reported)
52.3% (Raw: 0.523, Self-reported)
52.3% (Raw: 0.523, Self-reported)
47.6% (Raw: 0.476, Self-reported)
35.9% (Raw: 0.359, Self-reported)
29.6% (Raw: 0.296, Self-reported)
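
The summary figures in the Performance Overview can be re-derived from the raw leaderboard scores. The following is a minimal sketch, assuming each percentage is the raw score times 100 rounded to one decimal place, and that "High Performers (80%+)" counts raw scores of at least 0.80. The names `raw_scores`, `to_percent`, and `summary` are illustrative and do not come from the source page.

```python
from statistics import mean

# Raw scores copied from the leaderboard (fractions of the max score of 1).
raw_scores = [
    0.846, 0.771, 0.764, 0.749, 0.743, 0.731,
    0.67, 0.523, 0.523, 0.476, 0.359, 0.296,
]


def to_percent(raw: float) -> float:
    """Convert a raw score to a percentage rounded to one decimal place (assumed convention)."""
    return round(raw * 100, 1)


summary = {
    "total_models": len(raw_scores),
    "top_score": to_percent(max(raw_scores)),
    "average_score": to_percent(mean(raw_scores)),
    # Assumption: the 80% threshold is inclusive.
    "high_performers": sum(1 for s in raw_scores if s >= 0.80),
}

print(summary)
```

Under these assumptions the computed values agree with the page: 12 models, a top score of 84.6%, an average of 62.1%, and one model at or above 80%.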