SWE-Bench Verified

About

SWE-Bench Verified is a human-validated, 500-problem subset of SWE-Bench that measures how well language models resolve real-world GitHub issues. Scores are the fraction of issues resolved, reported on a 0–1 scale.

Evaluation Stats
Total Models: 28
Organizations: 5
Verified Results: 0
Self-Reported: 28

Benchmark Details
Max Score: 1 (raw scores on a 0–1 scale)
Language: en (English)
Performance Overview
Score distribution and top performers

Score Distribution

[Score distribution chart over 28 models]

Top Score: 74.9%
Average Score: 48.5%
High Performers (80%+): 0
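
These overview figures follow directly from the raw 0–1 scores. A minimal sketch of how they can be recomputed, using the 20 leaderboard values shown below as input (the page's own figures cover all 28 models, so the average here comes out higher than the site's 48.5%):

```python
# Summary stats over raw 0-1 scores, as in the Performance Overview.
# Only the top-20 leaderboard scores are visible on this page; the
# site's 48.5% average includes 8 lower-scoring models not shown here.
raw_scores = [
    0.749, 0.727, 0.725, 0.703, 0.691, 0.681, 0.672, 0.632, 0.616, 0.604,
    0.576, 0.546, 0.536, 0.493, 0.492, 0.490, 0.420, 0.413, 0.410, 0.406,
]

top_score = max(raw_scores)
average = sum(raw_scores) / len(raw_scores)
high_performers = sum(1 for s in raw_scores if s >= 0.80)

print(f"Top Score: {top_score:.1%}")                 # 74.9%
print(f"Average Score: {average:.1%}")               # 57.9% over the top 20
print(f"High Performers (80%+): {high_performers}")  # 0
```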

Top Organizations

Rank  Organization  Models  Avg. Score
#1    Mistral AI    2       57.6%
#2    Anthropic     6       54.9%
#3    Google        5       49.1%
#4    OpenAI        11      45.6%
#5    DeepSeek      4       41.4%
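
The organization ranking is a group-by-and-average over the same raw scores. A minimal sketch, assuming per-model (organization, raw score) pairs are available; the pairs below are illustrative placeholders, since the page shows only each organization's model count and average:

```python
from collections import defaultdict

# Illustrative (org, raw score) pairs -- not the page's actual per-model data.
results = [
    ("Mistral AI", 0.576), ("Mistral AI", 0.576),
    ("Anthropic", 0.749), ("Anthropic", 0.349),
    ("Google", 0.491),
]

by_org: dict[str, list[float]] = defaultdict(list)
for org, score in results:
    by_org[org].append(score)

# Rank organizations by mean raw score, descending.
ranking = sorted(
    ((org, sum(s) / len(s), len(s)) for org, s in by_org.items()),
    key=lambda row: row[1],
    reverse=True,
)
for rank, (org, avg, n) in enumerate(ranking, start=1):
    print(f"#{rank} {org}: {n} models, avg {avg:.1%}")
```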
Leaderboard
Top 20 models ranked by performance
Rank  Score   Raw    Source
1     74.9%   0.749  Self-reported
2     72.7%   0.727  Self-reported
3     72.5%   0.725  Self-reported
4     70.3%   0.703  Self-reported
5     69.1%   0.691  Self-reported
6     68.1%   0.681  Self-reported
7     67.2%   0.672  Self-reported
8     63.2%   0.632  Self-reported
9     61.6%   0.616  Self-reported
10    60.4%   0.604  Self-reported
11    57.6%   0.576  Self-reported
12    54.6%   0.546  Self-reported
13    53.6%   0.536  Self-reported
14    49.3%   0.493  Self-reported
15    49.2%   0.492  Self-reported
16    49.0%   0.490  Self-reported
17    42.0%   0.420  Self-reported
18    41.3%   0.413  Self-reported
19    41.0%   0.410  Self-reported
20    40.6%   0.406  Self-reported

Showing top 20 of 28 models
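
Rendering this view is a sort-and-slice over the raw scores. A minimal sketch, assuming the full 28-score list is available (the five scores passed below are placeholders):

```python
def leaderboard(raw_scores: list[float], top_n: int = 20) -> list[str]:
    """Rank raw 0-1 scores descending and render the top_n rows."""
    ranked = sorted(raw_scores, reverse=True)[:top_n]
    return [
        f"{rank:<5} {score:.1%}  {score:.3f}  Self-reported"
        for rank, score in enumerate(ranked, start=1)
    ]

# Usage with placeholder scores; a real call would pass all 28 values.
for row in leaderboard([0.749, 0.406, 0.727, 0.410, 0.725]):
    print(row)
```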