Terminal-bench
general
text
About: Terminal-bench benchmark
Evaluation Stats
Total Models: 5
Organizations: 2
Verified Results: 0
Self-Reported: 5
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 5 models
Top Score: 39.2%
Average Score: 30.9%
High Performers (80%+): 0
Top Organizations
#1 Anthropic: 4 models, 31.1%
#2 Moonshot AI: 1 model, 30.0%
Leaderboard
Top 5 models ranked by performance
#1 39.2% (Raw: 0.392, Self-reported)
#2 35.5% (Raw: 0.355, Self-reported)
#3 35.2% (Raw: 0.352, Self-reported)
#4 30.0% (Raw: 0.3, Self-reported)
#5 14.7% (Raw: 0.147, Self-reported)
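
The overview figures appear to follow directly from the five raw leaderboard scores. A minimal sketch of that arithmetic, assuming the average is a plain unweighted mean and that "High Performers" counts scores at or above 0.80 of the max score; the function name `summarize` and its `high_threshold` parameter are illustrative, not part of any published tooling:

```python
# Raw scores copied from the leaderboard (fractions of the max score, 1).
raw_scores = [0.392, 0.355, 0.352, 0.3, 0.147]


def summarize(scores, high_threshold=0.80):
    """Return (top score, mean score, count of scores >= high_threshold).

    Assumption: the page uses an unweighted mean and an inclusive 80% cutoff.
    """
    top = max(scores)
    mean = sum(scores) / len(scores)
    high = sum(1 for s in scores if s >= high_threshold)
    return top, mean, high


top, mean, high = summarize(raw_scores)
print(f"Top Score: {top:.1%}")
print(f"Average Score: {mean:.1%}")
print(f"High Performers (80%+): {high}")
```

Under those assumptions the computed values agree with the Top Score (39.2%), Average Score (30.9%), and High Performers (0) shown in the overview.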