QUICK REVIEW

[Paper Review] TraceCoder: A Trace-Driven Multi-Agent Framework for Automated Debugging of LLM-Generated Code

Jiangping Huang, Wenguang Ye|arXiv (Cornell University)|Feb 6, 2026

Software Engineering Research | Citations: 0

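One-Sentence Summary

TraceCoder debugs and repairs LLM-generated code with a three-agent loop (Instrumentation, Analysis, Repair) supported by a Historical Lesson Learning Mechanism and a Rollback Mechanism, improving Pass@1 by up to 34.43% relative to existing advanced baselines.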

ABSTRACT

Large Language Models (LLMs) often generate code with subtle but critical bugs, especially for complex tasks. Existing automated repair methods typically rely on superficial pass/fail signals, offering limited visibility into program behavior and hindering precise error localization. In addition, without a way to learn from prior failures, repair processes often fall into repetitive and inefficient cycles. To overcome these challenges, we present TraceCoder, a collaborative multi-agent framework that emulates the observe-analyze-repair process of human experts. The framework first instruments the code with diagnostic probes to capture fine-grained runtime traces, enabling deep insight into its internal execution. It then conducts causal analysis on these traces to accurately identify the root cause of the failure. This process is further enhanced by a novel Historical Lesson Learning Mechanism (HLLM), which distills insights from prior failed repair attempts to inform subsequent correction strategies and prevent recurrence of similar mistakes. To ensure stable convergence, a Rollback Mechanism enforces that each repair iteration constitutes a strict improvement toward the correct solution. Comprehensive experiments across multiple benchmarks show that TraceCoder achieves up to a 34.43% relative improvement in Pass@1 accuracy over existing advanced baselines. Ablation studies verify the significance of each system component, with the iterative repair process alone contributing a 65.61% relative gain in accuracy. Furthermore, TraceCoder significantly outperforms leading iterative methods in terms of both accuracy and cost-efficiency.

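Motivation and Objectives

Advance automated debugging of LLM-generated code by addressing subtle bugs that binary pass/fail signals cannot expose.
Introduce a trace-driven multi-agent architecture that emulates expert debugging (observe-analyze-repair).
Improve fault localization and repair efficiency by learning from runtime traces and past failures.
Strengthen reliability and convergence through the Rollback Mechanism and the Historical Lesson Learning Mechanism.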

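Proposed Method

The Instrumentation Agent inserts diagnostic probes into the faulty code without changing its semantics and collects fine-grained runtime traces.
The Analysis Agent performs causal reasoning over the runtime traces and past failures, producing a repair plan and suggestions for further instrumentation.
The Repair Agent applies the proposed repair plan to modify the code and takes part in iterative testing.
The Historical Lesson Learning Mechanism (HLLM) distills lessons from failed repairs for use in later cycles.
The Rollback Mechanism retains and restores the best known state so that each accepted iteration is an improvement.
Shared artifact-based communication mediates iterative, structured feedback between the agents.
The evaluation compares TraceCoder against baselines on HumanEval, HumanEval+, BigCodeBench, and ClassEval, using Pass@1 as the metric.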
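
The interplay between iterative repair, rollback, and lesson recording can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: every name (count_passed, RepairState, repair_loop, toy_fixer) is hypothetical, the "agents" are replaced by a hard-coded stand-in function, and the number of passing tests is assumed as the measure of improvement.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Every name here is hypothetical; the review does not give TraceCoder's interfaces.
Tests = List[Tuple[tuple, object]]  # (arguments, expected result) pairs


def count_passed(func: Callable, tests: Tests) -> int:
    """Count passing tests; an exception counts as a failure."""
    passed = 0
    for args, expected in tests:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass
    return passed


@dataclass
class RepairState:
    best: Callable                 # best known version of the code
    best_score: int                # number of tests it passes
    lessons: List[str] = field(default_factory=list)  # notes from failed attempts


def repair_loop(initial: Callable, propose_fix: Callable, tests: Tests,
                max_iters: int = 5) -> RepairState:
    state = RepairState(initial, count_passed(initial, tests))
    for _ in range(max_iters):
        if state.best_score == len(tests):
            break  # all tests pass, nothing left to repair
        # propose_fix stands in for the Analysis and Repair Agents.
        candidate, description = propose_fix(state.best, state.lessons)
        score = count_passed(candidate, tests)
        if score > state.best_score:
            # Accept only a strict improvement over the best known state.
            state.best, state.best_score = candidate, score
        else:
            # Rollback: discard the candidate, keep the best state, record a lesson.
            state.lessons.append(f"rejected: {description}")
    return state


def buggy_add(a, b):
    return a - b  # seeded bug for the demo


def toy_fixer(current, lessons):
    """Stand-in for an LLM: a poor patch first, a correct one once a lesson exists."""
    if not lessons:
        return (lambda a, b: a - b - 1), "subtract one more"
    return (lambda a, b: a + b), "replace subtraction with addition"


tests = [((1, 2), 3), ((2, 2), 4), ((0, 5), 5)]
final = repair_loop(buggy_add, toy_fixer, tests)
print(final.best_score, final.lessons)
```

In this toy run the first patch does not pass more tests than the original, so it is rolled back and logged; the second patch is accepted because it strictly improves the score. A real system would pass the recorded lessons into the prompt of the next repair attempt rather than into a hard-coded function.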

Figure 1. Limitations of simple execution feedback. Without runtime insights, the model repeatedly applies local patches that degrade the code’s correctness, causing it to loop between incorrect versions rather than converging to a correct global solution.
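
To make the "runtime insights" that Figure 1 refers to concrete, the sketch below captures a line-by-line snapshot of local variables using Python's standard sys.settrace hook. This is a swapped-in technique chosen for brevity: according to the summary above, TraceCoder's Instrumentation Agent inserts diagnostic probes into the code itself, so the helper run_with_trace and the example function buggy_max are hypothetical illustrations only.

```python
import sys
from typing import Any, Callable, Dict, List, Tuple

# (line offset inside the function, copy of its local variables at that point)
TraceEvent = Tuple[int, Dict[str, Any]]


def run_with_trace(func: Callable, *args) -> Tuple[Any, List[TraceEvent]]:
    """Run func(*args) and record locals before each executed line of func."""
    events: List[TraceEvent] = []
    target = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not target:
            return None  # ignore frames of other functions
        if event == "line":
            offset = frame.f_lineno - target.co_firstlineno
            events.append((offset, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return result, events


def buggy_max(values):
    best = 0  # bug: wrong starting value when every input is negative
    for v in values:
        if v > best:
            best = v
    return best


result, events = run_with_trace(buggy_max, [-3, -1])
for offset, local_vars in events:
    print(offset, local_vars)
```

A pass/fail signal only reports that buggy_max([-3, -1]) returned the wrong value. The trace additionally shows that best never changes while v takes the values -3 and -1, which points at the initialization rather than at the comparison.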

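Experimental Results

Research Questions

RQ1: How effective is TraceCoder at repairing LLM-generated code compared with advanced automated repair methods?
RQ2: How do TraceCoder’s key hyperparameters affect repair performance and stability?
RQ3: How much does each core component of TraceCoder contribute to its overall effectiveness?
RQ4: Compared with sampling-based strategies, how does TraceCoder perform in practice in terms of reliability, cost efficiency, and failure modes?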

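Key Findings

TraceCoder achieves up to a 34.43% relative improvement in Pass@1 on challenging class-level benchmarks.
Ablation studies show that the iterative repair process alone contributes a 65.61% relative gain in accuracy.
TraceCoder outperforms leading iterative methods in both accuracy and cost efficiency.
The framework uses fine-grained runtime traces, historical lesson learning, and rollback to stabilize convergence.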
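
Because the headline figures are relative improvements, it may help to see how a relative gain differs from an absolute one. The sketch below uses made-up task outcomes, not results from the paper, and assumes one generated solution per task so that Pass@1 is simply the fraction of tasks solved; the function names are hypothetical.

```python
from typing import Sequence


def pass_at_1(first_attempt_passed: Sequence[bool]) -> float:
    """Fraction of tasks whose single generated solution passes all tests."""
    return sum(first_attempt_passed) / len(first_attempt_passed)


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain in percent: (improved - baseline) / baseline * 100."""
    return (improved - baseline) / baseline * 100.0


# Made-up outcomes for 10 tasks, not results from the paper.
baseline_runs = [True] * 5 + [False] * 5   # 5 of 10 tasks solved
repaired_runs = [True] * 6 + [False] * 4   # 6 of 10 tasks solved

base = pass_at_1(baseline_runs)
new = pass_at_1(repaired_runs)
print(f"absolute gain: {(new - base) * 100:.2f} points")
print(f"relative gain: {relative_improvement(base, new):.2f}%")
```

With these illustrative numbers, a gain of 10 percentage points corresponds to a 20% relative improvement, so a relative figure should always be read together with the baseline it is measured against.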

Figure 2. Overview of TraceCoder’s workflow. ① An LLM generates an initial code solution. ② The code is executed and tested. A multi-agent debugging loop—comprising the Instrumentation, Analysis, and Repair Agents—emulates expert debugging behaviors by leveraging runtime tracing, HLLM, and RM to ena

This review was generated by AI and checked by a human editor.