SWE-Bench Verified
general · text

About
SWE-Bench Verified benchmark

Evaluation Stats
Total Models: 28
Organizations: 5
Verified Results: 0
Self-Reported: 28

Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution: 28 models
Top Score: 74.9%
Average Score: 48.5%
High Performers (80%+): 0

Top Organizations
#1 Mistral AI (2 models): 57.6%
#2 Anthropic (6 models): 54.9%
#3 Google (5 models): 49.1%
#4 OpenAI (11 models): 45.6%
#5 DeepSeek (4 models): 41.4%
Leaderboard
Top 20 models ranked by performance
72.7% (raw 0.727), self-reported
72.5% (raw 0.725), self-reported
70.3% (raw 0.703), self-reported
67.2% (raw 0.672), self-reported
63.2% (raw 0.632), self-reported
61.6% (raw 0.616), self-reported
60.4% (raw 0.604), self-reported
57.6% (raw 0.576), self-reported
53.6% (raw 0.536), self-reported
#15: 49.2% (raw 0.492), self-reported
49.0% (raw 0.49), self-reported
#17: 42.0% (raw 0.42), self-reported
#18: 41.3% (raw 0.413), self-reported
40.6% (raw 0.406), self-reported
Showing top 20 of 28 models
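Each leaderboard entry pairs a raw resolved-rate in [0, 1] (the benchmark's Max Score is 1) with its percentage display. A minimal sketch of that conversion, assuming the page simply scales by 100 and rounds to one decimal place (the function name is illustrative, not from the source):

```python
def display_score(raw: float) -> str:
    """Format a raw benchmark score in [0, 1] as a one-decimal percentage."""
    return f"{raw * 100:.1f}%"

# Matches the leaderboard rows above:
print(display_score(0.727))  # 72.7%
print(display_score(0.49))   # 49.0%
```

Under this assumption, a raw score of 0.49 renders as "49.0%" rather than "49%", which is consistent with the trailing-zero percentages shown in the table.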