QUICK REVIEW

[Paper Review] Learning to Rewrite Tool Descriptions for Reliable LLM-Agent Tool Use

Ruocheng Guo, Kaiwen Dong|arXiv (Cornell University)|2026. 02. 23.

Scientific Computing and Data Management | Citations: 0

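One-Line Summary

This paper introduces Trace-Free+, a curriculum learning framework that improves the tool interfaces consumed by LLM-based agents by transferring knowledge from trace-rich training to trace-free deployment, enabling better tool selection and usage on unseen tools.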

ABSTRACT

The performance of LLM-based agents depends not only on the agent itself but also on the quality of the tool interfaces it consumes. While prior work has focused heavily on agent fine-tuning, tool interfaces, including natural language descriptions and parameter schemas, remain largely human-oriented and often become a bottleneck, especially when agents must select from large candidate tool sets. Existing approaches to improving tool interfaces rely on execution traces, which are frequently unavailable in cold-start or privacy-constrained settings, and typically optimize each tool independently, limiting scalability and generalization to unseen tools. We propose Trace-Free+, a curriculum learning framework that progressively transfers supervision from trace-rich settings to trace-free deployment, encouraging the model to abstract reusable interface-usage patterns and tool usage outcomes. To support this approach, we construct a large-scale dataset of high-quality tool interfaces using a structured workflow over a diverse collection of tools. Experiments on StableToolBench and RestBench show consistent gains on unseen tools, strong cross-domain generalization, and robustness as the number of candidate tools scales to over 100, demonstrating that tool interface optimization is a practical and deployable complement to agent fine-tuning.

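Motivation and Objectives

Improve the quality and generalizability of tool interfaces (descriptions and parameter schemas) for LLM-based tool-using agents.
Enable robust tool selection and parameter generation in cold-start and privacy-constrained settings.
Develop a scalable data synthesis workflow for generating high-quality tool interfaces across many tools.
Demonstrate cross-domain generalization and scalability as the candidate tool set grows beyond 100 tools.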

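Proposed Method

Build a large-scale dataset of high-quality tool interfaces using a structured agentic workflow over real-world tools (ToolBench seeds are curated for health and completeness).
Synthesize multi-step, dependency-aware user queries that expose cross-tool usage patterns and failures.
Train an open-weight LLM as a tool description generator using two-stage description refinement (D0 -> D1: general improvement; D1 -> D2: trace-based refinement via RIMRULE), enabling both trace-based and trace-free inference.
Apply curriculum learning through Trace-Free+: train the model on both trace-rich and trace-free data while progressively increasing its reliance on trace-free supervision.
Evaluate on RestBench and StableToolBench in trace-free and trace-based settings under a teacher-forcing protocol, measuring subtask-, query-, and tool-level metrics.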
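The curriculum step above can be illustrated with a small sketch. This is not the authors' implementation: the `Example` record, the linear schedule in `trace_free_fraction`, and the batch builder are all hypothetical names and choices, assumed here only to show what "progressively increasing reliance on trace-free supervision" could look like in a data loader.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Example:
    """Hypothetical SFT record; field names are assumptions, not from the paper."""
    tool_name: str
    original_description: str      # D0
    target_description: str        # refined description used as the training target
    trace: Optional[str] = None    # execution trace; None means trace-free input


def trace_free_fraction(stage: int, num_stages: int) -> float:
    """Assumed linear schedule: 0.0 at the first stage, 1.0 at the last."""
    if num_stages < 2:
        return 1.0
    return stage / (num_stages - 1)


def build_stage_batch(
    examples: List[Example],
    stage: int,
    num_stages: int,
    batch_size: int,
    rng: random.Random,
) -> List[Example]:
    """Sample a batch, hiding the trace with probability set by the schedule."""
    p = trace_free_fraction(stage, num_stages)
    batch = []
    for _ in range(batch_size):
        ex = rng.choice(examples)
        if rng.random() < p:
            ex = replace(ex, trace=None)  # present this example without its trace
        batch.append(ex)
    return batch


if __name__ == "__main__":
    pool = [Example("search_movie", "Search.", "Search movies by title.", "GET /search 200")]
    rng = random.Random(0)
    for stage in range(3):
        batch = build_stage_batch(pool, stage, 3, 8, rng)
        hidden = sum(ex.trace is None for ex in batch)
        print(f"stage {stage}: {hidden}/8 examples are trace-free")
```

Under this assumed schedule the first stage sees only trace-rich inputs and the last stage only trace-free ones, so the model ends training in the same condition it faces at deployment. The actual mixing schedule used by Trace-Free+ is not described in this review.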

Figure 1: An illustration of the proposed tool interface improvement pipeline. Compared to the original description ($D0$), the learned description generator produces more effective tool descriptions that lead to better tool usage.

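Experimental Results

Research Questions

RQ1: Can trace-free training carry the benefits of trace-based supervision over to unseen tools at inference time?
RQ2: Does the curriculum learning strategy improve generalization and robustness as the candidate tool set grows?
RQ3: How well does the learned tool description generator perform under trace-free conditions compared with trace-based baselines and prompt-based methods?
RQ4: Are the improvements consistent across in-domain and cross-domain tool sets?
RQ5: In multi-hop tasks, how does tool description quality affect tool selection and API execution success?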

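Key Results

On unseen tools, Trace-Free+ consistently improves subtask- and query-level success rates relative to the trace-free baseline and some other baselines.
Trace-Free+ outperforms D1 on the harder multi-hop subset, suggesting that a curriculum that draws on trace information helps the model learn cross-tool dependencies.
Trace-Free+ achieves strong cross-domain generalization: when trained on StableToolBench Split B, it improves performance on RestBench (TMDB/Spotify).
Trace-Free+ remains robust when the number of candidate tools exceeds 100, degrading less than the baselines.
Trace-based models benefit more from the tool-usage patterns contained in traces, but the trace-free curriculum still delivers competitive results under cold-start constraints.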
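The subtask- and query-level success rates mentioned above can be made concrete with a minimal sketch. The definitions below are assumptions for illustration, since this review does not give the paper's exact formulas: a subtask counts as successful when its tool call succeeded, and a query counts as successful only when every one of its subtasks did. The function names and the input layout are hypothetical.

```python
from typing import Dict, List


def subtask_success_rate(results: Dict[str, List[bool]]) -> float:
    """Fraction of all subtasks, pooled across queries, that succeeded."""
    flat = [ok for outcomes in results.values() for ok in outcomes]
    return sum(flat) / len(flat) if flat else 0.0


def query_success_rate(results: Dict[str, List[bool]]) -> float:
    """Fraction of queries whose subtasks all succeeded (assumed definition)."""
    if not results:
        return 0.0
    # A query with no recorded subtasks is treated as a failure.
    solved = sum(1 for outcomes in results.values() if outcomes and all(outcomes))
    return solved / len(results)


if __name__ == "__main__":
    # Hypothetical outcomes: query id -> per-subtask success flags.
    outcomes = {
        "q1": [True, True],
        "q2": [True, False],
        "q3": [False],
    }
    print("subtask-level:", subtask_success_rate(outcomes))
    print("query-level:", query_success_rate(outcomes))
```

Under these assumed definitions the query-level rate can never exceed the share of fully solved queries, which is why multi-hop queries tend to score lower at the query level than at the subtask level: one failed hop fails the whole query.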

Figure 2: The SFT data synthesis pipeline.


This review was generated by AI and reviewed by a human editor.