QUICK REVIEW

[Paper Review] Architecture-Aware Multi-Design Generation for Repository-Level Feature Addition

Mingwei Liu, Zhenxi Chen|arXiv (Cornell University)|Mar 2, 2026

Software Engineering Research | Citations: 0

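One-Sentence Summary

RAIM introduces a repository-level, architecture-aware framework that generates patches from multiple implementation designs and selects among them via impact analysis. It achieves state-of-the-art results on NoCode-bench Verified.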

ABSTRACT

Implementing new features across an entire codebase presents a formidable challenge for Large Language Models (LLMs). This proactive task requires a deep understanding of the global system architecture to prevent unintended disruptions to legacy functionalities. Conventional pipeline and agentic frameworks often fall short in this area because they suffer from architectural blindness and rely on greedy single-path code generation. To overcome these limitations, we propose RAIM, a multi-design and architecture-aware framework for repository-level feature addition. This framework introduces a localization mechanism that conducts multi-round explorations over a repository-scale code graph to accurately pinpoint dispersed cross-file modification targets. Crucially, RAIM shifts away from linear patching by generating multiple diverse implementation designs. The system then employs a rigorous impact-aware selection process based on static and dynamic analysis to choose the most architecturally sound patch and avoid system regressions. Comprehensive experiments on the NoCode-bench Verified dataset demonstrate that RAIM establishes a new state-of-the-art performance with a 39.47% success rate, achieving a 36.34% relative improvement over the strongest baseline. Furthermore, the approach exhibits robust generalization across various foundation models and empowers open-weight models like DeepSeek-v3.2 to surpass baseline systems powered by leading proprietary models. Detailed ablation studies confirm that the multi-design generation and impact validation modules are critical to effectively managing complex dependencies and reducing code errors. These findings highlight the vital role of structural awareness in automated software evolution.

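Motivation and Objectives

Frame automated repository-level feature addition as a proactive software evolution task that requires architectural awareness.
Propose RAIM to address the architectural blindness and linear, single-path generation of existing approaches.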
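Develop a four-stage framework: code graph construction, architecture-aware localization, multi-design patch generation, and impact-aware patch selection.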
Demonstrate RAIM's effectiveness and generalization on NoCode-bench Verified across multiple LLMs, including open-weight models.

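Proposed Method

Build a repository-level code graph that captures semantic and structural relationships.
Perform architecture-aware file- and function-level localization through multi-round exploration of the code graph.
Generate multiple diverse implementation designs and their corresponding patches.
Evaluate candidate patches with static change-impact analysis and dynamic test execution, then select the best one.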
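The multi-round exploration over a code graph can be pictured with a small sketch. This is not RAIM's implementation: the `CodeGraph` class, the relation names, the file paths, and the fixed hop-by-hop expansion are all assumptions made for illustration. In the paper the exploration is guided by the model, whereas here each round simply expands to every unvisited neighbor.

```python
from collections import defaultdict


class CodeGraph:
    """Hypothetical repository graph: nodes are files or functions,
    edges carry a relation label such as "contains" or "calls"."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_edge(self, src, relation, dst):
        # Store both directions so exploration can walk up to the
        # enclosing file as well as down to callees.
        self.edges[src].add((relation, dst))
        self.edges[dst].add((f"rev_{relation}", src))

    def neighbors(self, node):
        return {dst for _, dst in self.edges[node]}


def localize(graph, seeds, rounds=2):
    """Expand from seed nodes for a fixed number of rounds and return
    every node reached (a stand-in for model-guided exploration)."""
    visited = set(seeds)
    frontier = set(seeds)
    for _ in range(rounds):
        next_frontier = set()
        for node in frontier:
            next_frontier |= graph.neighbors(node) - visited
        if not next_frontier:
            break
        visited |= next_frontier
        frontier = next_frontier
    return visited


# Toy repository (all names are made up for this example).
graph = CodeGraph()
graph.add_edge("cli.py", "contains", "cli.py::main")
graph.add_edge("cli.py::main", "calls", "core/export.py::export")
graph.add_edge("core/export.py", "contains", "core/export.py::export")
graph.add_edge("core/export.py::export", "calls", "core/io.py::write_file")

one_round = localize(graph, {"cli.py::main"}, rounds=1)
two_rounds = localize(graph, {"cli.py::main"}, rounds=2)
```

With one round the candidate set stays inside `cli.py` and its direct callee. The second round reaches `core/io.py::write_file` in another file, which is the kind of dispersed, cross-file target the localization step is meant to surface.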

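Experimental Results

Research Questions

RQ1: How does RAIM perform against state-of-the-art baselines on repository-level feature addition?
RQ2: Does RAIM generalize across different LLMs and handle cross-file feature additions effectively?
RQ3: How much does each RAIM component (localization, multi-design generation, impact analysis) contribute to overall performance?
RQ4: How effectively does the patch selection strategy balance functional correctness and architectural consistency?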

Key Findings

Method	Model	RT (%)	FV-Micro (%)	FV-Macro (%)	Success (%)
OpenHands	Qwen3-235B	47.37	1.96	14.03	7.89
OpenHands	DeepSeek-R1	46.49	0.47	10.86	7.02
OpenHands	DeepSeek-v3	49.12	1.68	18.29	11.40
OpenHands	Gemini-2.5-Pro	61.40	0.01	0.29	0.00
OpenHands	Claude-4-Sonnet	69.30	11.25	36.48	25.44
Agentless	Qwen3-235B	76.32	8.75	22.39	13.16
Agentless	GPT-5-Chat	82.46	8.50	33.01	18.42
Agentless	DeepSeek-R1	73.68	10.87	35.52	25.44
Agentless	DeepSeek-v3	78.95	7.96	32.80	21.05
Agentless	DeepSeek-v3.2	28.95	9.46	37.42	28.95
Agentless	DeepSeek-v3.2-thinking	79.82	8.41	37.02	27.19
Agentless	Gemini-2.5-Pro	74.56	6.22	20.55	12.28
Agentless	Claude-4-Sonnet	79.82	8.47	38.48	28.07
RAIM	Qwen3-235B	79.82	9.76	27.45	16.67
RAIM	GPT-5-Chat	89.47	13.43	32.33	21.93
RAIM	DeepSeek-v3	81.58	15.14	35.64	25.44
RAIM	DeepSeek-R1	77.19	12.47	41.79	29.82
RAIM	DeepSeek-v3.2	85.96	16.01	45.58	34.21
RAIM	DeepSeek-v3.2-thinking	78.07	11.93	41.74	29.82
RAIM	Gemini-2.5-Pro	82.46	17.16	52.09	39.47
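The reported 36.34% relative improvement can be checked from the table. The short sketch below assumes that the "strongest baseline" is the highest baseline Success value in the table (28.95%, Agentless with DeepSeek-v3.2). The function name `relative_improvement` is introduced here for illustration only.

```python
def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


raim_best = 39.47           # RAIM with Gemini-2.5-Pro, Success (%)
strongest_baseline = 28.95  # assumed: Agentless with DeepSeek-v3.2, Success (%)

gain = relative_improvement(raim_best, strongest_baseline)
print(f"{gain:.2f}%")  # prints 36.34%
```

The absolute gap is 10.52 percentage points. Dividing by the baseline rate gives the relative figure quoted in the abstract.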

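Using Gemini-2.5-Pro, RAIM achieves a new state-of-the-art success rate of 39.47% on NoCode-bench Verified, a 36.34% relative improvement over the previous best.
With open-weight models such as DeepSeek-v3.2, RAIM still reaches a 34.21% success rate, surpassing several baselines that use strong proprietary models.
Ablation studies show that both multi-design generation and impact validation are important for managing complex dependencies and reducing code errors.
RAIM generalizes robustly across seven LLMs, with especially notable gains on complex cross-file modification tasks.
By emphasizing architectural awareness and change-impact analysis, the approach aims to prevent regressions in production software.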
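To make the idea of impact-aware selection concrete, here is a minimal sketch of ranking candidate patches by combining a static signal with dynamic ones. The review does not describe RAIM's actual selection criteria, so the `Candidate` fields and the ordering (fewest regression failures first, then most feature tests passed, then smallest static impact) are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of one generated patch."""
    name: str
    impacted_symbols: int      # static: symbols touched or transitively affected
    regression_failures: int   # dynamic: existing tests that now fail
    feature_tests_passed: int  # dynamic: new-feature tests that pass


def select_patch(candidates):
    """Pick one patch using an assumed lexicographic ordering:
    avoid regressions, then maximize feature coverage, then prefer
    the smallest static footprint."""
    if not candidates:
        raise ValueError("no candidate patches to select from")
    return min(
        candidates,
        key=lambda c: (
            c.regression_failures,
            -c.feature_tests_passed,
            c.impacted_symbols,
        ),
    )


# Three made-up designs for the same feature request.
designs = [
    Candidate("design_a", impacted_symbols=12, regression_failures=0, feature_tests_passed=3),
    Candidate("design_b", impacted_symbols=4, regression_failures=0, feature_tests_passed=3),
    Candidate("design_c", impacted_symbols=2, regression_failures=1, feature_tests_passed=4),
]

chosen = select_patch(designs)
print(chosen.name)  # prints design_b
```

In this toy case `design_c` passes the most feature tests but breaks an existing test, so it loses to the two regression-free designs. Between those, the one with the smaller static footprint is preferred. A greedy single-path generator would have produced only one of these options and had nothing to compare it against.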

This review was generated by AI and checked by a human editor.