LegacySWE

LegacySWE is the long-horizon coding benchmark for legacy software maintenance and modernization in enterprise systems.

#  Model            Harness      Score
1  DeepSeek V4 Pro  Terminus-2   11.0% [5.0, 17.0]
2  GPT-5.5          Codex CLI     9.0% [4.0, 15.0]
2  Claude Opus 4.7  Claude Code   9.0% [4.0, 15.0]
2  Gemini 3.1 Pro   Terminus-2    9.0% [4.0, 15.0]
2  GPT-5.5          Terminus-2    9.0% [4.0, 14.0]
6  GPT-5.4 Mini     Codex CLI     6.0% [2.0, 11.0]
6  Kimi K2.6        Terminus-2    6.0% [2.0, 11.0]
8  Claude Opus 4.7  Terminus-2    5.0% [1.0, 10.0]
8  Kimi K2.6        Kimi CLI      5.0% [1.0, 9.0]