QUICK REVIEW

[Paper Review] RIVA: Leveraging LLM Agents for Reliable Configuration Drift Detection

Sami Abuzakuk, Lucas Crijns|arXiv (Cornell University)|2026. 03. 02.

Software System Performance and Reliability | Citations: 0

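One-Line Summary

tldr: RIVA is a two-agent system (a verifier agent and a tool generation agent) that cross-validates multiple independent tool calls to robustly verify drift in IaC-defined infrastructure, improving reliability when some tools return misleading output.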

ABSTRACT

Infrastructure as code (IaC) tools automate cloud provisioning, but verifying that deployed systems remain consistent with the IaC specifications remains challenging. Such configuration drift occurs because of bugs in the IaC specification, manual changes, or system updates. Large language model (LLM)-based agentic AI systems can automate the analysis of large volumes of telemetry data, making them suitable for the detection of configuration drift. However, existing agentic systems implicitly assume that the tools they invoke always return correct outputs, making them vulnerable to erroneous tool responses. Since agents cannot distinguish whether an anomalous tool output reflects a real infrastructure problem or a broken tool, such errors may cause missed drift or false alarms, reducing reliability precisely when it is most needed. We introduce RIVA (Robust Infrastructure by Verification Agents), a novel multi-agent system that performs robust IaC verification even when tools produce incorrect or misleading outputs. RIVA employs two specialized agents, a verifier agent and a tool generation agent, that collaborate through iterative cross-validation, multi-perspective verification, and tool call history tracking. Evaluation on the AIOpsLab benchmark demonstrates that RIVA, in the presence of erroneous tool responses, recovers task accuracy from 27.3% when using a baseline ReAct agent to 50.0% on average. RIVA also improves task accuracy from 28% to 43.8% without erroneous tool responses. Our results show that cross-validation of diverse tool calls enables more reliable autonomous infrastructure verification in production cloud environments.

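Motivation and Objectives

Address configuration drift in IaC by enabling robust verification even when tools are unreliable.
Use multi-agent collaboration to cross-validate tool outputs and reduce false alarms.
Evaluate RIVA against a ReAct baseline on the AIOpsLab benchmark under imperfect tool conditions.
Quantify how the tool call history and the hyperparameter K affect verification reliability.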

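Proposed Method

Introduce a two-agent architecture in which a verifier agent and a tool generation agent share a Tool Call History.
Cross-validate K independent tool calls per property to decide whether a drift verdict can be trusted.
The tool generation agent proposes diverse, distinct tool calls for the same property and records the results in the tool call history.
Require K verified tool paths before concluding that a property is satisfied or violated.
Evaluate on a modified AIOpsLab benchmark that uses unreliable tools to simulate silent tool failures.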
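The K-path cross-validation rule described in this section can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `ToolCall`, `ToolCallHistory`, `verify_property`, and the tool labels are hypothetical names, tools are reduced to plain callables returning a boolean, and "K verified tool paths" is interpreted here as K distinct tool calls that agree on the same verdict.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class ToolCall:
    name: str     # hypothetical label for one distinct diagnostic path
    result: bool  # True = property looks satisfied, False = looks violated


@dataclass
class ToolCallHistory:
    """Shared record of the tool calls made for each property (assumed shape)."""
    calls: Dict[str, List[ToolCall]] = field(default_factory=dict)

    def record(self, prop: str, call: ToolCall) -> None:
        self.calls.setdefault(prop, []).append(call)

    def used(self, prop: str) -> Set[str]:
        return {c.name for c in self.calls.get(prop, [])}


def verify_property(
    prop: str,
    tools: Dict[str, Callable[[], bool]],
    k: int,
    history: ToolCallHistory,
) -> Optional[bool]:
    """Return a verdict once k distinct tools agree, or None if inconclusive."""
    for name, tool in tools.items():
        if name in history.used(prop):
            continue  # never count the same diagnostic path twice
        history.record(prop, ToolCall(name, tool()))
        results = [c.result for c in history.calls[prop]]
        for verdict in (True, False):
            if results.count(verdict) >= k:
                return verdict
    return None  # not enough agreeing paths to conclude either way


# Toy setup: three hypothetical tools, one of which silently returns a wrong answer.
tools = {
    "query_api": lambda: True,
    "read_metrics": lambda: False,  # simulated faulty tool
    "inspect_logs": lambda: True,
}
history = ToolCallHistory()
verdict = verify_property("replicas==3", tools, k=2, history=history)
strict = verify_property("replicas==3", tools, k=3, history=ToolCallHistory())
```

In this toy setup a single faulty tool cannot flip the verdict when `k=2`, because two other paths agree. With `k=3` the function returns `None`, since only two of the three available paths agree; this only illustrates that a larger K needs more agreeing paths than a small environment may offer.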

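Experimental Results

Research Questions

RQ1: How can agentic AI reliably verify IaC conformance when tools produce incorrect outputs?
RQ2: Does cross-validation across multiple tool calls improve drift detection accuracy over a single-agent baseline?
RQ3: How does the diagnostic path parameter K affect verification success and efficiency?
RQ4: How does RIVA behave on localization, detection, and analysis tasks under erroneous tool responses?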

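Key Results

Under erroneous tool responses, RIVA raises average task accuracy from 27.3% (baseline ReAct agent with faulty tools) to 50.0%.
Without faulty tools, RIVA still raises average accuracy from ReAct's 28% to 43.8%.
RIVA with K=2 outperforms ReAct across tasks and achieves a higher success rate (e.g., 43.75% versus 28.00% in some settings).
RIVA generally needs fewer steps and tokens than ReAct, indicating better efficiency (e.g., most tasks finish within 15 steps; with correct tools, tokens are 38,000 versus 78,000).
With faulty tools, RIVA needs at most 17 steps, whereas ReAct runs exceed 45 steps in more than 37% of cases.
Because of environment constraints, raising K to 3 on AIOpsLab drops the reported success to zero, highlighting both the importance of K and its dependence on the environment.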
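To put the two headline accuracy figures on a common footing, the short calculation below derives the absolute gain (percentage points) and the relative gain from the reported averages. The helper names `point_gain` and `relative_gain` are hypothetical; only the four input percentages come from the results above.

```python
def point_gain(before: float, after: float) -> float:
    """Absolute improvement in percentage points, rounded to one decimal."""
    return round(after - before, 1)


def relative_gain(before: float, after: float) -> float:
    """Improvement relative to the baseline, in percent, rounded to one decimal."""
    return round((after - before) / before * 100, 1)


# Reported average task accuracy (ReAct baseline, RIVA), in percent.
with_faulty_tools = (27.3, 50.0)
without_faulty_tools = (28.0, 43.8)

for label, (react, riva) in [
    ("faulty tools", with_faulty_tools),
    ("correct tools", without_faulty_tools),
]:
    print(f"{label}: +{point_gain(react, riva)} points, "
          f"+{relative_gain(react, riva)}% relative")
```

Under these figures the gain is 22.7 points (about 83.2% relative) with faulty tools and 15.8 points (about 56.4% relative) with correct tools.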


This review was generated by AI and checked by a human editor.