Terminal-bench

general
text
About

Terminal-bench benchmark

Evaluation Stats
Total Models: 5
Organizations: 2
Verified Results: 0
Self-Reported: 5
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

5 models
Top Score: 39.2%
Average Score: 30.9%
High Performers (80%+): 0

Top Organizations

#1 Anthropic: 4 models, 31.1%
#2 Moonshot AI: 1 model, 30.0%
Leaderboard
Top 5 models ranked by performance
#1 39.2% (Raw: 0.392), Self-reported
#2 35.5% (Raw: 0.355), Self-reported
#3 35.2% (Raw: 0.352), Self-reported
#4 30.0% (Raw: 0.3), Self-reported
#5 14.7% (Raw: 0.147), Self-reported
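
The summary figures under Score Distribution can be reproduced from the five raw leaderboard scores. The sketch below is illustrative only. It assumes that Average Score is the plain arithmetic mean of the raw scores, that percentages are the raw score divided by Max Score (1), and that High Performers counts scores at or above 0.80. The page does not state how these values are computed, and the helper name to_percent is hypothetical.

```python
from statistics import mean

# Raw leaderboard scores as listed on this page (fractions of Max Score 1).
raw_scores = [0.392, 0.355, 0.352, 0.3, 0.147]


def to_percent(raw: float, max_score: float = 1.0) -> str:
    """Format a raw score as a percentage of the maximum score (assumed convention)."""
    return f"{raw / max_score * 100:.1f}%"


# Assumption: the summary stats are simple aggregates over the raw scores.
top_score = to_percent(max(raw_scores))
average_score = to_percent(mean(raw_scores))
high_performers = sum(1 for s in raw_scores if s >= 0.80)

print(top_score, average_score, high_performers)  # 39.2% 30.9% 0
```

Under these assumptions the results match the Top Score, Average Score, and High Performers values shown above.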