MT-Bench

roleplay
text
About

MT-Bench benchmark

Evaluation Stats
Total Models: 11
Organizations: 4
Verified Results: 0
Self-Reported: 11
Benchmark Details
Max Score: 100
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

11 models
Top Score: 93.5%
Average Score: 78.8%
High Performers (80%+): 9

Top Organizations

#1 DeepSeek: 1 model, 90.2%
#2 Alibaba: 3 models, 88.4%
#3 Mistral AI: 4 models, 82.4%
#4 NVIDIA: 3 models, 60.6%
Leaderboard
Top 11 models ranked by performance
#1: 93.5% (Raw: 93.5, Self-reported)
#2: 91.7% (Raw: 91.7, Self-reported)
#3: 90.2% (Raw: 90.2, Self-reported)
#4: 87.5% (Raw: 87.5, Self-reported)
#5: 86.3% (Raw: 86.3, Self-reported)
#6: 84.1% (Raw: 84.1, Self-reported)
#7: 83.5% (Raw: 83.5, Self-reported)
#8: 83.0% (Raw: 83, Self-reported)
#9: 81.0% (Raw: 81, Self-reported)
#10: 76.8% (Raw: 76.8, Self-reported)
#11: 9.0% (Raw: 8.99, Self-reported)
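
The overview figures (top score, average score, and the count of models at 80% or higher) can be recomputed from the raw leaderboard values. The following is a minimal sketch in Python, assuming the average is a plain arithmetic mean of the raw scores and that a "high performer" is any model with a raw score of at least 80. The names `RAW_SCORES` and `summarize` are illustrative and do not come from the page.

```python
from statistics import mean

# Raw scores copied from the leaderboard entries, in rank order.
RAW_SCORES = [93.5, 91.7, 90.2, 87.5, 86.3, 84.1, 83.5, 83.0, 81.0, 76.8, 8.99]


def summarize(scores, threshold=80.0):
    """Return summary statistics for a list of benchmark scores.

    Assumption: the page's "Average Score" is an unweighted mean and
    "High Performers (80%+)" counts scores >= threshold.
    """
    return {
        "models": len(scores),
        "top": max(scores),
        "average": round(mean(scores), 1),
        "high_performers": sum(1 for s in scores if s >= threshold),
    }


summary = summarize(RAW_SCORES)
print(summary)
```

Under these assumptions the result agrees with the overview: 11 models, a top score of 93.5, an average of 78.8, and 9 high performers. The lowest entry (8.99) is included in the mean exactly as listed.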