CodeEditorBench: Evaluating Code Editing Capability of Large Language Models Paper • 2404.03543 • Published Apr 4
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence Paper • 2406.11931 • Published Jun 17
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents Paper • 2407.18901 • Published Jul 26
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents Paper • 2408.07060 • Published Aug 13
SWE-bench-java: A GitHub Issue Resolving Benchmark for Java Paper • 2408.14354 • Published Aug 26
FuzzCoder: Byte-level Fuzzing Test via Large Language Model Paper • 2409.01944 • Published Sep 3
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale Paper • 2409.16299 • Published Sep 9