QUICK REVIEW

[Paper Digest] RIVA: Leveraging LLM Agents for Reliable Configuration Drift Detection

Sami Abuzakuk, Lucas Crijns|arXiv (Cornell University)|Mar 2, 2026

Software System Performance and Reliability | Cited by 0

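One-Sentence Summary

RIVA is a two-agent system (a Verifier Agent and a Tool Generation Agent) that cross-validates multiple independent tool calls to robustly verify whether IaC-defined infrastructure has drifted, improving reliability even when tool outputs may be misleading.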

ABSTRACT

Infrastructure as code (IaC) tools automate cloud provisioning, but verifying that deployed systems remain consistent with the IaC specifications remains challenging. Such configuration drift occurs because of bugs in the IaC specification, manual changes, or system updates. Large language model (LLM)-based agentic AI systems can automate the analysis of large volumes of telemetry data, making them suitable for the detection of configuration drift. However, existing agentic systems implicitly assume that the tools they invoke always return correct outputs, making them vulnerable to erroneous tool responses. Since agents cannot distinguish whether an anomalous tool output reflects a real infrastructure problem or a broken tool, such errors may cause missed drift or false alarms, reducing reliability precisely when it is most needed. We introduce RIVA (Robust Infrastructure by Verification Agents), a novel multi-agent system that performs robust IaC verification even when tools produce incorrect or misleading outputs. RIVA employs two specialized agents, a verifier agent and a tool generation agent, that collaborate through iterative cross-validation, multi-perspective verification, and tool call history tracking. Evaluation on the AIOpsLab benchmark demonstrates that RIVA, in the presence of erroneous tool responses, recovers task accuracy from 27.3% when using a baseline ReAct agent to 50.0% on average. RIVA also improves task accuracy from 28% to 43.8% without erroneous tool responses. Our results show that cross-validation of diverse tool calls enables more reliable autonomous infrastructure verification in production cloud environments.

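Research Motivation and Goals

Address configuration drift in IaC by enabling robust verification even when tools are unreliable.
Use multi-agent collaboration to cross-validate tool outputs and reduce false alarms.
Evaluate RIVA against a ReAct baseline on the AIOpsLab benchmark under erroneous tool conditions.
Quantify how the tool call history and the hyperparameter K affect verification reliability.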

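Proposed Method

Proposes a two-agent architecture: a Verifier Agent and a Tool Generation Agent that share a Tool Call History.
Cross-validates each property with K independent tool calls to reach a reliable drift verdict.
The Tool Generation Agent proposes diverse, mutually distinct tool calls for the same property and records the results in the Tool Call History.
Requires K verified tool paths before a property is judged satisfied or violated.
Evaluates on a modified AIOpsLab benchmark in which unreliable tools simulate silent errors.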
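
The K-path rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the agents are replaced by plain functions, each tool is reduced to a callable returning a boolean (True meaning the property appears to hold), and every name here (`ToolCallHistory`, `propose_tool`, `verify_property`, the tool names, the `inconclusive` verdict) is a hypothetical stand-in introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class ToolCallHistory:
    """Shared record of which tools were tried per property and what they returned."""
    records: Dict[str, List[Tuple[str, bool]]] = field(default_factory=dict)

    def add(self, prop: str, tool_name: str, result: bool) -> None:
        self.records.setdefault(prop, []).append((tool_name, result))

    def used_tools(self, prop: str) -> Set[str]:
        return {name for name, _ in self.records.get(prop, [])}


def propose_tool(prop: str, tools: Dict[str, Callable[[], bool]],
                 history: ToolCallHistory) -> Optional[str]:
    """Stand-in for the Tool Generation Agent: pick a tool not yet used for prop."""
    used = history.used_tools(prop)
    for name in tools:
        if name not in used:
            return name
    return None  # no distinct tool left to try


def verify_property(prop: str, tools: Dict[str, Callable[[], bool]],
                    history: ToolCallHistory, k: int = 2) -> str:
    """Stand-in for the Verifier Agent: decide only after k distinct tools agree."""
    votes = {True: 0, False: 0}
    while True:
        name = propose_tool(prop, tools, history)
        if name is None:
            return "inconclusive"  # assumption: the sketch gives up here
        result = tools[name]()
        history.add(prop, name, result)
        votes[result] += 1
        if votes[True] >= k:
            return "satisfied"
        if votes[False] >= k:
            return "violated"


# Hypothetical tools for one property; the second one silently returns a wrong answer.
tools = {
    "read_deployed_config": lambda: True,
    "query_telemetry": lambda: False,  # simulated erroneous tool
    "probe_endpoint": lambda: True,
}

history = ToolCallHistory()
print(verify_property("replicas == 3", tools, history, k=2))
print(history.records["replicas == 3"])
```

In this toy setup a single faulty tool cannot flip the verdict with K=2, because a third, distinct tool call breaks the tie. With K=3 the same three tools cannot produce three agreeing paths, so the sketch returns its own `inconclusive` verdict.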

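Experimental Results

Research Questions

RQ1: How can agentic AI reliably verify IaC compliance when tool outputs are erroneous?
RQ2: Does cross-validating multiple tool calls improve drift detection accuracy over a single-agent baseline?
RQ3: How does the diagnostic path parameter K affect verification success rate and efficiency?
RQ4: How does RIVA perform on localization, detection, and analysis tasks under erroneous tool outputs?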

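Key Findings

With erroneous tool outputs, RIVA raises average task accuracy from 27.3% for the baseline ReAct agent to 50.0%.
Without erroneous tools, RIVA raises average accuracy from 28% for ReAct to 43.8%.
RIVA with K=2 outperforms ReAct on several tasks, reaching higher success rates (e.g., 43.75% versus 28.00% in some settings).
RIVA generally needs fewer steps and tokens than ReAct, indicating higher efficiency (e.g., most tasks finish within 15 steps; with correct tools, 38,000 tokens versus 78,000 for ReAct).
With erroneous tools, RIVA needs at most 17 steps, whereas ReAct runs reach 45 steps more than 37% of the time.
Increasing K to 3 yields a reported success rate of zero because of AIOpsLab's environment constraints, highlighting both the critical role of K and its dependence on the environment.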
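
The headline figures can be restated as absolute and relative changes with simple arithmetic. The sketch below uses only the numbers quoted in the findings; the helper names `gain` and `reduction` are illustrative.

```python
from typing import Tuple


def gain(baseline: float, improved: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = improved - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative


def reduction(baseline: float, improved: float) -> float:
    """Return the percentage reduction from baseline to improved."""
    return 100.0 * (baseline - improved) / baseline


# Accuracy with erroneous tools: ReAct 27.3% vs. RIVA 50.0%.
abs_err, rel_err = gain(27.3, 50.0)
# Accuracy without erroneous tools: ReAct 28% vs. RIVA 43.8%.
abs_ok, rel_ok = gain(28.0, 43.8)
# Token usage with correct tools: ReAct 78,000 vs. RIVA 38,000.
token_cut = reduction(78_000, 38_000)

print(f"erroneous tools: +{abs_err:.1f} points ({rel_err:.1f}% relative)")
print(f"correct tools:   +{abs_ok:.1f} points ({rel_ok:.1f}% relative)")
print(f"token reduction: {token_cut:.1f}%")
```

Read this way, the gain under erroneous tools is about 22.7 percentage points (roughly 83% relative), the gain with correct tools is about 15.8 points (roughly 56% relative), and token usage with correct tools drops by about 51%.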


This digest was generated by AI and reviewed by human editors.