SWE-Dev
code
text
About
SWE-Dev is a benchmark that overwrites the original repository files and deletes the test files that exercise the functions the agent is expected to generate. This removes indirect hints about the desired implementation.
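The test-deletion step described above can be illustrated with a short sketch. It is a hypothetical helper, not SWE-Dev's actual tooling. It assumes pytest-style file names (`test_*.py` or `*_test.py`) and treats any test file that mentions a target function name as leaking a hint. The function name `strip_hinting_tests` and the whole-word matching heuristic are illustrative assumptions. The sketch does not cover how the benchmark overwrites the original source files.

```python
import re
from pathlib import Path


def strip_hinting_tests(repo: Path, target_functions: set[str]) -> list[Path]:
    """Delete test files that mention any of the target function names.

    Hypothetical sketch: matching on identifier names is an assumption,
    not the procedure SWE-Dev actually uses.
    """
    if not target_functions:
        # An empty alternation would match everywhere, so do nothing instead.
        return []
    # Whole-identifier match, so "parse" does not also match "parser".
    names = "|".join(map(re.escape, sorted(target_functions)))
    pattern = re.compile(r"\b(?:" + names + r")\b")
    removed: list[Path] = []
    for path in sorted(repo.rglob("*.py")):
        # Assumed pytest naming conventions for test modules.
        if not (path.name.startswith("test_") or path.name.endswith("_test.py")):
            continue
        if pattern.search(path.read_text(encoding="utf-8", errors="ignore")):
            path.unlink()
            removed.append(path)
    return removed
```

Name-based matching is deliberately coarse. A real pipeline would more likely rely on coverage or call-graph data to decide which tests exercise a function.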
Evaluation Stats
Total Models: 0
Organizations: 0
Verified Results: 0
Self-Reported: 0
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
No evaluation results available for this benchmark