SWE-bench Multilingual

general

text

About

SWE-bench Multilingual benchmark

Evaluation Stats

Total Models: 1

Organizations: 1

Verified Results: 0

Self-Reported: 1

Benchmark Details

Max Score: 1

Language

en

Performance Overview

Score distribution and top performers

Score Distribution

1 model

Top Score

47.3%

Average Score

47.3%

High Performers (80%+)

0

Top Organizations

#1 Moonshot AI

1 model

47.3%

Leaderboard

Top 1 model ranked by performance

1

Kimi K2 Instruct

47.3%

Raw: 0.473

Self-reported