QUICK REVIEW

[Paper Review] Talk Freely, Execute Strictly: Schema-Gated Agentic AI for Flexible and Reproducible Scientific Workflows

Joel Strickland, Arjun Vijeta|arXiv (Cornell University)|Mar 6, 2026

Scientific Computing and Data Management|Citations: 0

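One-Line Summary

This paper proposes schema-gated orchestration, which separates conversational intent from execution, and analyzes agentic AI in scientific workflows. It scores 20 systems along the ED/CF axes and outlines a reference architecture intended to combine flexibility with determinism.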

ABSTRACT

Large language models (LLMs) can now translate a researcher's plain-language goal into executable computation, yet scientific workflows demand determinism, provenance, and governance that are difficult to guarantee when an LLM decides what runs. Semi-structured interviews with 18 experts across 10 industrial R&D stakeholders surface 2 competing requirements--deterministic, constrained execution and conversational flexibility without workflow rigidity--together with boundary properties (human-in-the-loop control and transparency) that any resolution must satisfy. We propose schema-gated orchestration as the resolving principle: the schema becomes a mandatory execution boundary at the composed-workflow level, so that nothing runs unless the complete action--including cross-step dependencies--validates against a machine-checkable specification. We operationalize the 2 requirements as execution determinism (ED) and conversational flexibility (CF), and use these axes to review 20 systems spanning 5 architectural groups along a validation-scope spectrum. Scores are assigned via a multi-model protocol--15 independent sessions across 3 LLM families--yielding substantial-to-near-perfect inter-model agreement (Krippendorff α=0.80 for ED and α=0.98 for CF), demonstrating that multi-model LLM scoring can serve as a reusable alternative to human expert panels for architectural assessment. The resulting landscape reveals an empirical Pareto front--no reviewed system achieves both high flexibility and high determinism--but a convergence zone emerges between the generative and workflow-centric extremes. We argue that a schema-gated architecture, separating conversational from execution authority, is positioned to decouple this trade-off, and distill 3 operational principles--clarification-before-execution, constrained plan-act orchestration, and tool-to-workflow-level gating--to guide adoption.

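Motivation and Objectives

Identify the practitioner requirements for balancing execution determinism and conversational flexibility in AI-driven scientific workflows.
Map existing systems onto the design space spanned by execution determinism (ED) and conversational flexibility (CF).
Demonstrate that architectural assessment is reproducible across three LLM families (inter-model scoring reliability).
Propose schema-gated orchestration as a principled resolution of the ED/CF trade-off.
Present a reference architecture and three operational principles to guide real-world adoption.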

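Proposed Method

Conduct semi-structured interviews with 18 experts across 10 industrial R&D stakeholders to extract requirements and boundary properties.
Review 20 representative systems spanning 5 architectural groups and score each on the ED and CF axes using a 5-point ordinal rubric.
Run 15 independent scoring sessions across three LLM families (ChatGPT, Claude, Gemini) and measure inter-model agreement with Krippendorff’s α.
Analyze the design space to expose the empirical Pareto front and identify the convergence zone between paradigms.
Position schema-gated orchestration as a design principle by extending schema validation from individual tool calls to composed workflow plans, and propose three operational principles and a reference architecture.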
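The last step above, gating at the level of the composed workflow rather than the single tool call, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the paper: the `REGISTRY` contents, the `Step` structure, the `{"from": "step.output"}` reference convention, and the function names are all hypothetical. The sketch only shows the general idea that a plan is checked as a whole, including cross-step dependencies, and that no step runs if any check fails.

```python
from dataclasses import dataclass

# Hypothetical tool registry (not from the paper): each tool declares
# the types of its inputs and the types of the outputs it produces.
REGISTRY = {
    "load_csv": {"inputs": {"path": str}, "outputs": {"table": "Table"}},
    "fit_model": {
        "inputs": {"data": "Table", "degree": int},
        "outputs": {"model": "Model"},
    },
}


@dataclass
class Step:
    id: str
    tool: str
    # Literal values, or {"from": "<step_id>.<output>"} references
    # to an output of an earlier step (assumed convention).
    args: dict


def validate_plan(plan):
    """Return a list of error strings; an empty list means the plan validates."""
    errors = []
    produced = {}  # "step_id.output" -> declared output type
    for step in plan:
        spec = REGISTRY.get(step.tool)
        if spec is None:
            errors.append(f"{step.id}: unknown tool {step.tool!r}")
            continue
        for name, expected in spec["inputs"].items():
            if name not in step.args:
                errors.append(f"{step.id}: missing argument {name!r}")
                continue
            value = step.args[name]
            if isinstance(value, dict) and "from" in value:
                # Cross-step dependency: must point at an earlier output
                # whose declared type matches what this input expects.
                actual = produced.get(value["from"])
                if actual is None:
                    errors.append(f"{step.id}: unresolved reference {value['from']!r}")
                elif actual != expected:
                    errors.append(f"{step.id}: {name!r} expects {expected}, got {actual}")
            elif isinstance(expected, type):
                if not isinstance(value, expected):
                    errors.append(f"{step.id}: {name!r} must be {expected.__name__}")
            else:
                errors.append(f"{step.id}: {name!r} must reference an upstream {expected}")
        for name in step.args:
            if name not in spec["inputs"]:
                errors.append(f"{step.id}: unexpected argument {name!r}")
        for out, typ in spec["outputs"].items():
            produced[f"{step.id}.{out}"] = typ
    return errors


def execute_if_valid(plan, runner):
    """Run every step through `runner` only if the complete plan validates."""
    errors = validate_plan(plan)
    if errors:
        return {"executed": False, "errors": errors}
    return {"executed": True, "results": [runner(step) for step in plan]}
```

In this sketch a conversational layer could propose any plan it likes, but `execute_if_valid` is the only path to execution, so an invalid reference in step two also prevents step one from running. A real system would presumably use a richer schema language and record provenance; those parts are left out here.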

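Experimental Results

Research Questions

RQ1: What architectural requirements must be met to achieve both execution determinism and conversational flexibility in AI-driven scientific workflows?
RQ2: How do current systems position themselves on the ED/CF axes, and what trade-offs exist among paradigms such as generative, tool-assisted, schema-gated, and workflow-based?
RQ3: Can schema-gated orchestration improve reproducibility and governance by separating conversational authority from execution authority?
RQ4: What are the practical implications and architectural patterns of schema-gated execution when implementing composed workflows?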

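Key Findings

An empirical trade-off is observed: none of the reviewed systems achieves both high flexibility and high determinism (an empirical Pareto front).
Inter-model agreement across the three LLM families is high over the 15 scoring sessions: Krippendorff’s α = 0.80 (ED) and 0.98 (CF).
Extending schema validation from individual tool calls to composed workflow plans better supports deterministic execution while preserving conversational flexibility.
Two operating zones emerge: a near-ideal schema-gated group (IDs 8–9), and the workflow-centric and workflow+NL groups, which converge toward higher ED and lower CF.
Three operational principles are articulated: clarification-before-execution, constrained plan-act orchestration, and tool-to-workflow-level gating.
A reference architecture is proposed that separates the schema validation registry from the conversational layer and enables end-to-end provenance guarantees through an orchestration controller.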
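The first finding above, the empirical Pareto front in the ED/CF plane, can be made concrete with a small Python sketch. The system names and scores below are hypothetical placeholders on a 1 to 5 scale and are not the paper's data; the function simply applies the standard definition that a point is on the front if no other point is at least as good on both axes and strictly better on one.

```python
def pareto_front(points):
    """Return the names of systems that no other system dominates.

    `points` maps a system name to an (ED, CF) score pair.
    """
    front = []
    for name, (ed, cf) in points.items():
        dominated = any(
            (e >= ed and c >= cf) and (e > ed or c > cf)
            for other, (e, c) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Hypothetical scores for illustration only; NOT taken from the paper.
scores = {
    "generative_chat": (1, 5),
    "tool_calling": (2, 4),
    "schema_gated": (4, 3),
    "workflow_engine": (5, 1),
    "weak_hybrid": (2, 2),
}

front = pareto_front(scores)

# True if some system scores high (>= 4) on both axes at once.
ideal_corner_reached = any(ed >= 4 and cf >= 4 for ed, cf in scores.values())
```

With these placeholder scores every system except `weak_hybrid` lies on the front, and `ideal_corner_reached` is false, which mirrors the shape of the reported result: the front runs from the flexible extreme to the deterministic extreme without any system reaching the corner where both are high.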


This review was generated by AI and checked by a human editor.