SWE-Dev

code

text

About

SWE-Dev is a benchmark that overwrites the original repository files and deletes the test files that exercise the functions the agent is expected to generate, which eliminates indirect hints about the desired implementation.
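
The masking step described above can be illustrated with a minimal sketch. This is not the benchmark's actual tooling: the function `mask_repository`, its parameters, and the file names in the demo are all hypothetical, chosen only to show the idea of overwriting source files with stubs and deleting the tests that would reveal the expected behavior.

```python
from pathlib import Path
import tempfile


def mask_repository(repo_root, masked_sources, leaking_tests):
    """Overwrite target source files and delete tests that exercise them.

    Hypothetical helper, not part of SWE-Dev itself.
    masked_sources: dict mapping a relative path to its replacement text (a stub).
    leaking_tests: iterable of relative test paths to delete.
    Returns the list of test paths that were actually removed.
    """
    root = Path(repo_root)

    # Overwrite each target file so the original implementation is gone.
    for rel_path, stub_text in masked_sources.items():
        (root / rel_path).write_text(stub_text)

    # Delete tests that call the masked functions; skip paths that do not exist.
    removed = []
    for rel_path in leaking_tests:
        test_file = root / rel_path
        if test_file.exists():
            test_file.unlink()
            removed.append(rel_path)
    return removed


def demo():
    """Build a tiny throwaway repository and mask it (illustrative data only)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "calc.py").write_text("def add(a, b):\n    return a + b\n")
        (root / "test_calc.py").write_text(
            "from calc import add\n\ndef test_add():\n    assert add(1, 2) == 3\n"
        )
        stub = "def add(a, b):\n    raise NotImplementedError\n"
        removed = mask_repository(root, {"calc.py": stub}, ["test_calc.py"])
        remaining = sorted(p.name for p in root.iterdir())
        return removed, remaining


if __name__ == "__main__":
    print(demo())
```

In this sketch the agent would see only the stubbed `calc.py`; the deleted test file can no longer hint at the expected signature or return values.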

Evaluation Stats

Total Models: 0

Organizations: 0

Verified Results: 0

Self-Reported: 0

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

No evaluation results are available for this benchmark.