QUICK REVIEW

[Paper Review] A Trace-Based Assurance Framework for Agentic AI Orchestration: Contracts, Testing, and Governance

Ciprian Păduraru, Petru-Liviu Bouruc|arXiv (Cornell University)|Mar 18, 2026

Multi-Agent Systems and Negotiation | Citations: 0

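One-line summary

This paper proposes a trace-based assurance framework for agentic AI systems. It introduces Message-Action Traces (MAT), machine-checkable contracts, stress testing via bounded perturbations, fault injection, and runtime governance at the language-to-action boundary.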

ABSTRACT

In Agentic AI, Large Language Models (LLMs) are increasingly used in the orchestration layer to coordinate multiple agents and to interact with external services, retrieval components, and shared memory. In this setting, failures are not limited to incorrect final outputs. They also arise from long-horizon interaction, stochastic decisions, and external side effects (such as API calls, database writes, and message sends). Common failures include non-termination, role drift, propagation of unsupported claims, and attacks via untrusted context or external channels. This paper presents an assurance framework for such Agentic AI systems. Executions are instrumented as Message-Action Traces (MAT) with explicit step and trace contracts. Contracts provide machine-checkable verdicts, localize the first violating step, and support deterministic replay. The framework includes stress testing, formulated as a budgeted counterexample search over bounded perturbations. It also supports structured fault injection at service, retrieval, and memory boundaries to assess containment under realistic operational faults and degraded conditions. Finally, governance is treated as a runtime component, enforcing per-agent capability limits and action mediation (allow, rewrite, block) at the language-to-action boundary. To support comparative evaluations across stochastic seeds, models, and orchestration configurations, the paper defines trace-based metrics for task success, termination reliability, contract compliance, factuality indicators, containment rate, and governance outcome distributions. More broadly, the framework is intended as a common abstraction to support testing and evaluation of multi-agent LLM systems, and to facilitate reproducible comparison across orchestration designs and configurations.

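Motivation and objectives

Define a system model and failure taxonomy for multi-agent AI orchestration.
Introduce Message-Action Traces (MAT) with step and trace contracts for runtime verification.
Develop stress testing as a budgeted counterexample search that exposes contract violations.
Incorporate structured fault injection at service, retrieval, and memory interfaces to test containment.
Present a governance mechanism that constrains per-agent capabilities and mediates actions through intervention at interface boundaries.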

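Proposed method

Represent executions as Message-Action Traces (MAT) that carry evidence and contract verdicts.
Define step contracts over MAT records and trace contracts over trace prefixes to detect violations.
Formulate adversarial stress testing as a budgeted search for perturbations that trigger contract violations (minimizing perturbation cost).
Inject structured faults at external interfaces (services, retrieval, memory) to evaluate containment under realistic faults.
Impose runtime governance that mediates tool calls using per-agent capability sets and a policy shield (allow, rewrite, block).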
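The first two items can be illustrated with a minimal sketch. This is not the paper's implementation: the `MATRecord` fields, the `ALLOWED` role table, and both example contracts are hypothetical, chosen only to show how step contracts (over single records) and trace contracts (over prefixes) can localize the first violating step.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class MATRecord:
    """One step of a trace (hypothetical fields, not the paper's schema)."""
    step: int
    agent: str
    message: str
    action: Optional[str] = None  # tool name, or None for a message-only step


# A step contract checks one record; a trace contract checks a prefix.
StepContract = Callable[[MATRecord], bool]
TraceContract = Callable[[Sequence[MATRecord]], bool]


def first_violation(
    trace: Sequence[MATRecord],
    step_contracts: Sequence[StepContract],
    trace_contracts: Sequence[TraceContract],
) -> Optional[int]:
    """Return the index of the first violating step, or None if all pass."""
    for i, record in enumerate(trace):
        if not all(check(record) for check in step_contracts):
            return i
        prefix = trace[: i + 1]
        if not all(check(prefix) for check in trace_contracts):
            return i
    return None


# Assumed role table: which actions each agent may take.
ALLOWED = {"planner": {"search"}, "writer": set()}


def action_within_role(record: MATRecord) -> bool:
    """Step contract: an agent only uses actions assigned to its role."""
    return record.action is None or record.action in ALLOWED.get(record.agent, set())


def bounded_length(prefix: Sequence[MATRecord], max_steps: int = 4) -> bool:
    """Trace contract: a crude termination bound on the number of steps."""
    return len(prefix) <= max_steps


trace = [
    MATRecord(0, "planner", "look up the docs", "search"),
    MATRecord(1, "writer", "drafting the answer"),
    MATRecord(2, "writer", "sending it now", "send_email"),
]
violation = first_violation(trace, [action_within_role], [bounded_length])
```

Here `violation` is the index of the step where the writer attempts an action outside its role. Because the records are immutable and the contracts are pure functions, re-running the check on a stored trace gives the same verdict, which is the property that replay depends on.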

Figure 1: Pipeline overview of the assurance framework. Colors indicate the four layers and their roles. The system under test (SUT, green) is the deployed multi-agent LLM system: an orchestrator coordinating an agent pool, together with the runtime governance boundary L4 (blue). The diagram uses a

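Experimental results

Research questions

RQ1: How can multi-agent LLM executions be modeled as a trace-based system with contracts for runtime verification?
RQ2: What contracts and provenance are needed to detect violations of termination, safety, and factuality?
RQ3: Can perturbation-based stress testing expose contract violations within a cost budget?
RQ4: How can governance at the language-to-action boundary be quantified and applied to constrain agent autonomy?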

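Key findings

The paper proposes and formalizes an integrated framework for agentic AI systems that combines contracts, stress testing, fault injection, and governance.
MAT records make it possible to localize the first violating step and to replay executions for debugging and regression testing.
Budgeted perturbation search identifies the minimum-cost schedule that violates a trace contract, which aids fault exposure and debugging.
Structured fault injection across interfaces evaluates containment under realistic operational faults.
The governance mechanism yields measurable outcomes, such as contract compliance and containment rate across runs (defined as trace-based metrics).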
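To make the governance finding concrete, the sketch below mediates tool calls with allow, rewrite, and block verdicts and then computes the distribution of those verdicts. It is an illustration under stated assumptions, not the paper's mechanism: `ToolCall`, the `CAPABILITIES` table, the `MAX_MESSAGE_CHARS` limit, and the truncation rewrite are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ToolCall:
    """A proposed action at the language-to-action boundary (hypothetical)."""
    agent: str
    tool: str
    argument: str


# Assumed per-agent capability sets and an illustrative size limit.
CAPABILITIES = {
    "researcher": {"search", "send_message"},
    "writer": {"send_message"},
}
MAX_MESSAGE_CHARS = 20


def mediate(call: ToolCall) -> Tuple[str, Optional[ToolCall]]:
    """Return (verdict, call to execute); the call is None when blocked."""
    if call.tool not in CAPABILITIES.get(call.agent, set()):
        return "block", None
    if call.tool == "send_message" and len(call.argument) > MAX_MESSAGE_CHARS:
        # Rewrite: truncate the payload instead of rejecting the call.
        trimmed = ToolCall(call.agent, call.tool, call.argument[:MAX_MESSAGE_CHARS])
        return "rewrite", trimmed
    return "allow", call


def outcome_distribution(calls: Sequence[ToolCall]) -> Dict[str, float]:
    """Fraction of calls that received each verdict."""
    counts = Counter(mediate(call)[0] for call in calls)
    total = sum(counts.values())
    if total == 0:
        return {verdict: 0.0 for verdict in ("allow", "rewrite", "block")}
    return {verdict: counts[verdict] / total for verdict in ("allow", "rewrite", "block")}


calls = [
    ToolCall("researcher", "search", "trace contracts"),
    ToolCall("writer", "search", "anything"),        # outside writer's capabilities
    ToolCall("writer", "send_message", "x" * 50),    # too long, gets truncated
    ToolCall("writer", "send_message", "short note"),
]
distribution = outcome_distribution(calls)
```

A distribution of this kind, aggregated over seeds or configurations, is one plausible way to compare how strictly different governance settings constrain the same agents.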

Figure 2: Adversarial counterexample search as an inner–outer assurance loop. (1) Setup: fix a system configuration $\kappa$ (roles, tools, contracts, governance) and sample tasks $x\sim\mathcal{D}$ with stochastic seed $z$ . (2) Inner loop (search): an adversary selects bounded perturbations $\delt
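The budgeted search that Figure 2 outlines can be sketched as follows. This is a toy illustration that substitutes exhaustive enumeration for whatever search strategy the paper actually uses. The perturbation names, their costs, and the `run_system` stand-in for the system under test are all assumptions.

```python
from itertools import combinations
from typing import Callable, Dict, Optional, Tuple

# Hypothetical perturbations and their costs.
PERTURBATIONS = {
    "delay_message": 1,
    "reorder_turns": 1,
    "drop_tool_result": 2,
    "inject_context": 3,
}


def run_system(schedule: Tuple[str, ...]) -> bool:
    """Toy stand-in for the system under test: True if the trace contract holds.

    Assumed behavior: the contract breaks when a delayed message is combined
    with reordered turns, or when untrusted context is injected.
    """
    applied = set(schedule)
    violated = {"delay_message", "reorder_turns"} <= applied or "inject_context" in applied
    return not violated


def cheapest_counterexample(
    perturbations: Dict[str, int],
    budget: int,
    contract_holds: Callable[[Tuple[str, ...]], bool],
) -> Optional[Tuple[int, Tuple[str, ...]]]:
    """Return (cost, schedule) for the cheapest violating schedule within budget."""
    best = None
    names = sorted(perturbations)
    for size in range(1, len(names) + 1):
        for schedule in combinations(names, size):
            cost = sum(perturbations[name] for name in schedule)
            if cost > budget or contract_holds(schedule):
                continue  # over budget, or no violation found
            if best is None or cost < best[0]:
                best = (cost, schedule)
    return best


result = cheapest_counterexample(PERTURBATIONS, budget=3, contract_holds=run_system)
```

With a budget of 3, two schedules violate the contract, and the search returns the cheaper one. Exhaustive enumeration grows exponentially with the number of perturbations, so a real search would need sampling or a guided strategy.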


This review was generated by AI and checked by a human editor.