SWE-Dev

code
text
About

SWE-Dev is a benchmark that overwrites original repository files and deletes the test files that exercise the functions the agent is expected to generate, eliminating indirect hints about the desired implementation.
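
The preparation step described above could look roughly like the following. This is only a minimal sketch, not SWE-Dev's actual tooling: the function name `prepare_task_repo`, its parameters, and the example paths are all hypothetical, and it assumes the replacement file contents and the list of test files to delete are already known.

```python
from pathlib import Path
from typing import Iterable, Mapping


def prepare_task_repo(
    repo_root: Path,
    overwrites: Mapping[str, str],
    leaking_tests: Iterable[str],
) -> None:
    """Hypothetical helper: overwrite source files, then delete hinting tests.

    overwrites maps a repo-relative path to the text that should replace
    the original file (for example, a version without the target functions).
    leaking_tests lists repo-relative test files that exercise those functions.
    """
    for rel_path, new_text in overwrites.items():
        target = repo_root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(new_text, encoding="utf-8")

    for rel_path in leaking_tests:
        # missing_ok=True (Python 3.8+) tolerates a file that is already gone.
        (repo_root / rel_path).unlink(missing_ok=True)
```

A call such as `prepare_task_repo(Path("repo"), {"pkg/core.py": stub_text}, ["tests/test_core.py"])` would leave the agent with the stubbed module and no test file that reveals the expected behavior.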

Evaluation Stats
Total Models: 0
Organizations: 0
Verified Results: 0
Self-Reported: 0
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

No evaluation results available for this benchmark