QUICK REVIEW

[Paper Review] RAGShaper: Eliciting Sophisticated Agentic RAG Skills via Automated Data Synthesis

Zhengwei Tao, Bo Li|arXiv (Cornell University)|2026. 01. 13.

Multimodal Machine Learning Applications · Citations: 0

One-Line Summary

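RAGShaper is an automated data synthesis framework. It uses InfoCurator to build dense, distractor-augmented retrieval environments, has a teacher agent generate robust agent trajectories in them, and fine-tunes models on those trajectories so that they excel at noisy, multi-hop RAG tasks.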

ABSTRACT

Agentic Retrieval-Augmented Generation (RAG) empowers large language models to autonomously plan and retrieve information for complex problem-solving. However, the development of robust agents is hindered by the scarcity of high-quality training data that reflects the noise and complexity of real-world retrieval environments. Conventional manual annotation is unscalable and often fails to capture the dynamic reasoning strategies required to handle retrieval failures. To bridge this gap, we introduce RAGShaper, a novel data synthesis framework designed to automate the construction of RAG tasks and robust agent trajectories. RAGShaper incorporates an InfoCurator to build dense information trees enriched with adversarial distractors spanning Perception and Cognition levels. Furthermore, we propose a constrained navigation strategy that forces a teacher agent to confront these distractors, thereby eliciting trajectories that explicitly demonstrate error correction and noise rejection. Comprehensive experiments confirm that models trained on our synthesized corpus significantly outperform existing baselines, exhibiting superior robustness in noise-intensive and complex retrieval tasks.

Motivation and Objectives

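Argues that agentic RAG models need scalable, high-quality training data beyond what manual annotation can provide.
Proposes an automated data synthesis pipeline, comprising InfoCurator, distractor generation, and constrained teacher navigation, that produces rich task trajectories.
Shows that models trained on the synthesized data outperform baselines on noisy and multi-hop retrieval benchmarks.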

Proposed Method

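InfoCurator constructs dense information trees from seed entities and Wikipedia-based knowledge, generating Positive Facts and Adversarial Distractors that span the Perception and Cognition levels.
The Distractor Curation Tool creates four distractor types (Doppelgänger, False Shortcut, Fragmented Puzzle, Subjective Fallacy) to inject noise and challenge the model's reasoning.
Question-Answer Synthesis uses an LLM to reverse-engineer, from the curated information, queries that strictly require evidence gathered in path order.
Behavior Elicitation uses a Teacher agent with a constrained navigation strategy that forces it to retrieve the Distractors, capturing adaptive error-correction and noise-rejection strategies.
Training fine-tunes a base LLM on (Q, T, A) samples (question, trajectory, answer), keeping only those where the Teacher's answer matches the ground truth, using standard supervised fine-tuning with a negative log-likelihood loss.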
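The pipeline above can be pictured with a toy, deterministic simulation. This is a minimal sketch under stated assumptions, not the authors' code: the names `Doc`, `HopNode`, `constrained_navigation`, `keep_trajectory`, and `sequence_nll` are hypothetical, the "teacher" is a scripted loop rather than an LLM, and the navigation constraint is assumed to work by surfacing one distractor before the positive fact at each hop. The answer filter is a simple case-insensitive match, and the loss is the standard token-level negative log-likelihood; whether the paper masks tool outputs out of the loss is not stated in this review.

```python
import math
import random
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class DistractorType(Enum):
    # The four distractor types named in the review.
    DOPPELGANGER = "Doppelgänger"
    FALSE_SHORTCUT = "False Shortcut"
    FRAGMENTED_PUZZLE = "Fragmented Puzzle"
    SUBJECTIVE_FALLACY = "Subjective Fallacy"


@dataclass
class Doc:
    text: str
    is_positive: bool
    distractor: Optional[DistractorType] = None  # set only for distractor docs


@dataclass
class HopNode:
    # Hypothetical layout: one hop on the reasoning path holds one positive fact
    # plus any distractors curated around it.
    positive: Doc
    distractors: List[Doc] = field(default_factory=list)


def constrained_navigation(path: List[HopNode], rng: random.Random) -> List[str]:
    """Scripted stand-in for the teacher. Assumed constraint: at every hop that
    has distractors, one is retrieved and rejected before the positive fact."""
    steps: List[str] = []
    for i, hop in enumerate(path):
        if hop.distractors:
            d = rng.choice(hop.distractors)
            kind = d.distractor.value if d.distractor else "unknown"
            steps.append(f"search(hop={i}) -> {d.text}")
            steps.append(f"reject({kind})")  # explicit noise-rejection step
        steps.append(f"search(hop={i}) -> {hop.positive.text}")
    return steps


def keep_trajectory(predicted: str, ground_truth: str) -> bool:
    # Only trajectories whose final answer matches the ground truth are kept for SFT.
    return predicted.strip().lower() == ground_truth.strip().lower()


def sequence_nll(target_token_probs: List[float]) -> float:
    # Standard SFT loss: -sum_t log p(y_t | context, y_<t) over the target tokens.
    return -sum(math.log(p) for p in target_token_probs)
```

In this sketch the forced `reject(...)` step stands in for what, in the real framework, would be the teacher LLM's own reasoning about why a retrieved passage is misleading; the scripted order only illustrates where such steps land in a trajectory.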

Experimental Results

Research Questions

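RQ1: Can automated data synthesis produce higher-quality and more robust agentic RAG training data than manual annotation?
RQ2: Does introducing Perception/Cognition distractors and constrained teacher navigation yield more robust, noise-resilient agent behavior?
RQ3: Do models trained on RAGShaper data generalize across different backbone architectures?
RQ4: How do the quality and trajectory complexity of the synthesized data compare with human-labeled data in terms of retrieval depth and tool use?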

Key Findings

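Data synthesized by RAGShaper achieves higher average EM and average F1 on open-domain benchmarks than multiple baselines (e.g., 4.5k samples: 48.8 EM, 59.8 F1; 6.5k samples: 50.3 EM, 62.0 F1).
RAGShaper-Dis, trained without distractors, performs substantially worse (Avg EM falls from 48.8 to 33.8), underscoring the importance of distractor-based training.
RAGShaper trajectories are deeper and more tool-intensive than human-annotated data, with a long-tailed length distribution extending past 40 steps, which indicates rich agentic reasoning.
Most trajectories rely on retrieval rather than internal knowledge (Direct Answer rate 0%, Fallback 4.2%), highlighting robust use of external evidence.
RAGShaper generalizes across backbones (e.g., it outperforms HL-Data on both the Qwen3-30B-A3B-Think and Qwen3-4B-Think backbones).
Distractor-based training substantially improves performance on noise-sensitive datasets such as AmbigQA and Bamboogle, validating adversarial data synthesis.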
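EM and F1 scores like those above are commonly computed with SQuAD-style answer normalization. The sketch below assumes that convention (lowercasing, stripping punctuation and the articles a/an/the, collapsing whitespace) and assumes "Avg" means an unweighted mean over benchmarks; the review does not say which evaluation script the paper uses, and `best_over_golds` and `macro_average` are hypothetical helper names. Scores here are fractions in [0, 1], whereas the reported numbers are scaled by 100.

```python
import re
import string
from collections import Counter
from typing import Callable, Dict, List

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    # Assumed SQuAD-style normalization.
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(pred: str, gold: str) -> float:
    return float(normalize_answer(pred) == normalize_answer(gold))


def f1_score(pred: str, gold: str) -> float:
    # Token-level F1 over the bag-of-words overlap.
    p = normalize_answer(pred).split()
    g = normalize_answer(gold).split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)


def best_over_golds(metric: Callable[[str, str], float], pred: str, golds: List[str]) -> float:
    # Datasets with several acceptable answers: score against the best-matching gold.
    return max(metric(pred, g) for g in golds)


def macro_average(per_benchmark: Dict[str, float]) -> float:
    # Assumed meaning of "Avg EM" / "Avg F1": unweighted mean over benchmarks.
    return sum(per_benchmark.values()) / len(per_benchmark)
```

On the same scale, the RAGShaper-Dis ablation's fall from 48.8 to 33.8 Avg EM is a 15.0-point absolute drop, roughly 31% relative to the full model.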

This review was generated by AI and reviewed by a human editor.