[Paper Review] How to Design a Program Repair Bot? Insights from the Repairnator Project
This paper presents Repairnator, a CI-based program repair bot that monitors build failures, reproduces bugs, runs repair tools, and reports patches. It reports empirical results on 11,523 failures across 1,609 GitHub projects, with patches generated for 15 bugs (all overfitting).
Program repair research has made tremendous progress over the last few years, and software development bots are now being invented to help developers gain productivity. In this paper, we investigate the concept of a "program repair bot" and present Repairnator. The Repairnator bot is an autonomous agent that constantly monitors test failures, reproduces bugs, and runs program repair tools against each reproduced bug. If a patch is found, Repairnator bot reports it to the developers. At the time of writing, Repairnator uses three different program repair systems and has been operating since February 2017. In total, it has studied 11 523 test failures over 1 609 open-source software projects hosted on GitHub, and has generated patches for 15 different bugs. Over months, we hit a number of hard technical challenges and had to make various design and engineering decisions. This gives us a unique experience in this area. In this paper, we reflect upon Repairnator in order to share this knowledge with the automatic program repair community.
Motivation & Objective
- Explore the end-to-end design of an autonomous program repair bot integrated with CI workflows.
- Empirically evaluate Repairnator's ability to reproduce bugs and generate patches across a large set of open-source Java projects.
- Identify practical engineering decisions and challenges encountered in building and operating Repairnator and extract actionable recommendations for future repair bots.
- Provide data and insights to bridge the gap between program repair research and industrial practice.
Proposed method
- Describe the three-stage Repairnator pipeline: CI Build Analysis, Bug Reproduction, and Patch Synthesis.
- Use three program repair systems (Nopol, NPEFix, Astor) to generate patches for reproduced failures.
- Reproduce failures locally using Maven-based builds and isolated dependency management.
- Have a human-in-the-loop patch analyst sanity-check patches before reporting to developers.
- Archive patches and reproduction/repair logs to support open science and future research.
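The three stages listed above can be sketched as a simple filter-then-repair loop. This is a minimal illustration and not Repairnator's actual implementation: the `Build` record, the `TestRunner` and `RepairTool` interfaces, and the stub project names and tools are all hypothetical and exist only to show how each stage narrows the set of builds passed to the next one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Build:
    """A CI build record (hypothetical shape, for illustration only)."""
    project: str
    status: str  # "passed" or "failed"
    failing_tests: List[str] = field(default_factory=list)


# Hypothetical interfaces: a test runner returns the tests that fail locally,
# and a repair tool returns a patch (as diff text) or None if it finds nothing.
TestRunner = Callable[[Build], List[str]]
RepairTool = Callable[[Build], Optional[str]]


def is_repair_candidate(build: Build) -> bool:
    """Stage 1, CI build analysis: keep builds that failed because of tests."""
    return build.status == "failed" and len(build.failing_tests) > 0


def reproduces_locally(build: Build, run_tests: TestRunner) -> bool:
    """Stage 2, bug reproduction: the test failure must show up again locally."""
    return len(run_tests(build)) > 0


def run_pipeline(builds: List[Build], run_tests: TestRunner,
                 tools: Dict[str, RepairTool]) -> dict:
    """Stage 3, patch synthesis: run every tool on every reproduced failure."""
    report = {"candidates": 0, "reproduced": 0, "patches": []}
    for build in builds:
        if not is_repair_candidate(build):
            continue
        report["candidates"] += 1
        if not reproduces_locally(build, run_tests):
            continue
        report["reproduced"] += 1
        for name, tool in tools.items():
            patch = tool(build)
            if patch is not None:
                report["patches"].append((build.project, name, patch))
    return report


# Usage with stubs: only org/b fails in CI *and* reproduces locally.
builds = [
    Build("org/a", "passed"),
    Build("org/b", "failed", ["FooTest.testBar"]),
    Build("org/c", "failed", ["BazTest.testQux"]),
]


def stub_runner(build: Build) -> List[str]:
    return build.failing_tests if build.project == "org/b" else []


stub_tools: Dict[str, RepairTool] = {
    "finds-nothing": lambda build: None,
    "toy-tool": lambda build: "--- a/Foo.java\n+++ b/Foo.java\n",
}

report = run_pipeline(builds, stub_runner, stub_tools)
print(report["candidates"], report["reproduced"], len(report["patches"]))
```

In this sketch, any patch collected in `report["patches"]` would still go to the human patch analyst before being reported, which corresponds to the sanity-check step in the list above.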
Experimental results
Research questions
- RQ1: What is the feasibility and effectiveness of an autonomous program repair bot in a real-world CI environment?
- RQ2: What are the empirical characteristics (volume, success rate, types of failures) of bug reproduction and patch synthesis in large-scale practice?
- RQ3: What major challenges (e.g., overfitting, reproducibility) limit end-to-end repair, and how can design choices address them?
- RQ4: What actionable recommendations can be derived to guide future program repair bot development?
Key findings
- From February 2017 to January 2018, Repairnator analyzed 11,523 CI build failures across 1,609 open-source Java projects.
- Patches were generated for 15 different bugs, but none were proposed to developers due to overfitting (patches fixed the failing build but not the real bug).
- Repairnator successfully reproduced 3,551 of 11,523 failing builds (30.82%).
- Among reproduced bugs, 1,307 patches were produced across the three repair tools, indicating a large search space with many test-suite adequate patches, most of which were overfitting.
- The most common failure types observed were AssertionError and NullPointerException, together constituting a substantial portion of failures (over 70% across the top 10 types).
- Reproducibility of builds is highly project dependent (e.g., druid-io/druid: 62% reproduced; prestodb/presto: 19.40%), highlighting project-level variability in repair potential.
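The overfitting problem noted in the findings above (a patch that makes the failing build pass without fixing the real bug) can be illustrated with a toy example. This is not taken from the paper: the functions, the test suite, and the held-out cases are all invented for illustration. It shows why a test-suite adequate patch is not necessarily a correct one.

```python
def buggy_abs(x):
    # Bug: negative inputs are returned unchanged.
    return x


def overfitting_patch(x):
    # Special-cases the single failing test input instead of fixing the logic.
    if x == -3:
        return 3
    return x


def correct_patch(x):
    # Fixes the underlying bug for all negative inputs.
    return -x if x < 0 else x


# (input, expected output) pairs. The test suite is what a repair tool sees;
# the held-out cases stand in for the behavior the developer actually intended.
TEST_SUITE = [(5, 5), (0, 0), (-3, 3)]
HELD_OUT = [(-7, 7), (-1, 1)]


def passes(fn, cases):
    return all(fn(value) == expected for value, expected in cases)


for fn in (buggy_abs, overfitting_patch, correct_patch):
    print(fn.__name__, passes(fn, TEST_SUITE), passes(fn, HELD_OUT))
```

Both patches turn the test suite green, so a tool that only checks the suite cannot tell them apart. Only the held-out cases expose the overfitting patch, which is roughly the role the human patch analyst plays in the pipeline.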
This review was created by AI and reviewed by human editors.