QUICK REVIEW

[Paper Digest] Talk Freely, Execute Strictly: Schema-Gated Agentic AI for Flexible and Reproducible Scientific Workflows

Joel Strickland, Arjun Vijeta|arXiv (Cornell University)|Mar 6, 2026

Scientific Computing and Data Management | Cited by 0

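One-Sentence Summary

This paper proposes schema-gated orchestration to separate conversational intent from execution in agentic AI for scientific workflows, analyzes 20 systems along the execution determinism (ED) and conversational flexibility (CF) axes, and presents a reference architecture that aims to deliver both flexibility and determinism.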

ABSTRACT

Large language models (LLMs) can now translate a researcher's plain-language goal into executable computation, yet scientific workflows demand determinism, provenance, and governance that are difficult to guarantee when an LLM decides what runs. Semi-structured interviews with 18 experts across 10 industrial R&D stakeholders surface 2 competing requirements--deterministic, constrained execution and conversational flexibility without workflow rigidity--together with boundary properties (human-in-the-loop control and transparency) that any resolution must satisfy. We propose schema-gated orchestration as the resolving principle: the schema becomes a mandatory execution boundary at the composed-workflow level, so that nothing runs unless the complete action--including cross-step dependencies--validates against a machine-checkable specification. We operationalize the 2 requirements as execution determinism (ED) and conversational flexibility (CF), and use these axes to review 20 systems spanning 5 architectural groups along a validation-scope spectrum. Scores are assigned via a multi-model protocol--15 independent sessions across 3 LLM families--yielding substantial-to-near-perfect inter-model agreement (Krippendorff's α = 0.80 for ED and α = 0.98 for CF), demonstrating that multi-model LLM scoring can serve as a reusable alternative to human expert panels for architectural assessment. The resulting landscape reveals an empirical Pareto front--no reviewed system achieves both high flexibility and high determinism--but a convergence zone emerges between the generative and workflow-centric extremes. We argue that a schema-gated architecture, separating conversational from execution authority, is positioned to decouple this trade-off, and distill 3 operational principles--clarification-before-execution, constrained plan-act orchestration, and tool-to-workflow-level gating--to guide adoption.
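
The idea of a schema acting as an execution boundary at the composed-workflow level can be illustrated with a small Python sketch. This is not the paper's implementation: the registry, the tool names (`load_csv`, `fit_model`), and the `Step`/`Ref` types are all hypothetical, and type checks stand in for a full machine-checkable specification. The point it shows is that the whole plan, including references between steps, is validated before any step runs.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSchema:
    params: dict   # parameter name -> required Python type
    output: type   # type of the value the tool returns


@dataclass(frozen=True)
class Ref:
    step: str      # id of an earlier step whose output is consumed


@dataclass
class Step:
    id: str
    tool: str
    args: dict = field(default_factory=dict)


# Hypothetical registry: tool names and signatures are illustrative only.
REGISTRY = {
    "load_csv": ToolSchema(params={"path": str}, output=list),
    "fit_model": ToolSchema(params={"data": list, "degree": int}, output=dict),
}


def validate_plan(plan, registry=REGISTRY):
    """Return all violations in a composed plan; an empty list means it may run."""
    errors = []
    produced = {}  # step id -> output type, for steps already validated
    for step in plan:
        schema = registry.get(step.tool)
        if schema is None:
            errors.append(f"{step.id}: unknown tool '{step.tool}'")
            continue
        for name in sorted(set(schema.params) - set(step.args)):
            errors.append(f"{step.id}: missing argument '{name}'")
        for name, value in step.args.items():
            expected = schema.params.get(name)
            if expected is None:
                errors.append(f"{step.id}: unexpected argument '{name}'")
            elif isinstance(value, Ref):
                # Cross-step dependency: the referenced step must come
                # earlier in the plan and produce a compatible type.
                actual = produced.get(value.step)
                if actual is None:
                    errors.append(f"{step.id}: '{name}' references unavailable step '{value.step}'")
                elif not issubclass(actual, expected):
                    errors.append(f"{step.id}: '{name}' expects {expected.__name__}, got {actual.__name__}")
            elif not isinstance(value, expected):
                errors.append(f"{step.id}: '{name}' expects {expected.__name__}")
        produced[step.id] = schema.output
    return errors


def run_if_valid(plan, execute):
    """Gate: execute nothing unless the complete plan validates."""
    errors = validate_plan(plan)
    if errors:
        raise ValueError("plan rejected: " + "; ".join(errors))
    return [execute(step) for step in plan]
```

In this sketch the conversational layer would only ever propose a list of `Step` objects; authority to execute rests with `run_if_valid`, which rejects the entire plan if any single step or dependency fails validation.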

Research Motivation and Objectives

Identify practitioner requirements balancing execution determinism and conversational flexibility in AI-driven scientific workflows.
Map existing systems onto an execution determinism (ED) and conversational flexibility (CF) design space.
Demonstrate inter-model scoring reliability for architectural assessment across LLM families.
Propose schema-gated orchestration as a principled resolution to the ED/CF trade-off.
Present a reference architecture and three operational principles to guide adoption in real-world workflows.

Proposed Method

Perform semi-structured interviews with 18 experts across 10 industrial R&D stakeholders to elicit requirements and boundary properties.
Review 20 representative systems across five architectural groups, scoring them on ED and CF axes using a five-point ordinal rubric.
Conduct 15 independent scoring sessions across three LLM families (ChatGPT, Claude, Gemini) to assess inter-model agreement (Krippendorff’s α).
Analyze the design space to reveal an empirical Pareto front and identify convergence zones among paradigms.
Formulate schema-gated orchestration as a design principle with three operational tenets and outline a reference architecture with provenance guarantees.
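
For readers unfamiliar with the agreement statistic used in the scoring step, the following is a minimal Python sketch of Krippendorff's α, defined as 1 minus the ratio of observed to expected disagreement. It assumes the interval (squared-difference) distance and drops items with fewer than two scores; the digest does not say which distance metric the authors used, and an ordinal metric would be a natural choice for a five-point rubric, so this is an illustration of the statistic rather than a reproduction of the paper's calculation.

```python
from itertools import permutations


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval (squared-difference) metric.

    units: list of lists; each inner list holds the scores that the
    different raters (here, scoring sessions) gave to one item.
    """
    # Only items scored at least twice contribute pairable values.
    units = [u for u in units if len(u) >= 2]
    values = [v for u in units for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pairable scores")

    # Observed disagreement: ordered pairs within each item,
    # weighted by 1 / (m_u - 1), averaged over all n values.
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n

    # Expected disagreement: ordered pairs across all pooled values.
    d_e = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when all scores are identical")
    return 1.0 - d_o / d_e
```

With this definition, perfect agreement gives α = 1, agreement no better than chance gives α near 0, and systematic disagreement gives negative values.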

Experimental Results

Research Questions

RQ1: What architectural requirements are needed to achieve both execution determinism and conversational flexibility in AI-driven scientific workflows?
RQ2: How do current systems align on ED/CF, and what trade-offs exist across paradigms (generative, tool-augmented, schema-gated, workflow-based)?
RQ3: Can schema-gated orchestration decouple conversational authority from execution authority to improve reproducibility and governance?
RQ4: What are the practical implications and architectural patterns for implementing schema-gated execution across composed workflows?

Key Findings

There is an empirical trade-off: no reviewed system achieves both high flexibility and high determinism (Pareto front).
There is substantial-to-near-perfect inter-model agreement (Krippendorff’s α = 0.80 for ED and 0.98 for CF) across 15 scoring runs over three LLM families.
Schema-gated orchestration, extending schema validation from individual tool calls to composed-workflow plans, can better support deterministic execution with conversational flexibility.
Two operational zones emerge: the schema-gated group (IDs 8–9) lies closest to the ideal, while the workflow-centric and workflow+NL groups converge toward higher ED but lower CF.
Three operational principles are articulated: clarification-before-execution, constrained plan–act orchestration, and tool-to-workflow-level gating.
A reference architecture is proposed that separates a schema-validated registry from a conversational layer via an orchestration controller to enable end-to-end provenance.
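
The notion of an empirical Pareto front over (ED, CF) scores can be made concrete with a short Python sketch. The system names and scores in the usage note are hypothetical placeholders, not the paper's data; the function simply keeps every system that no other system matches or beats on both axes.

```python
def pareto_front(scores):
    """Return the names of systems not dominated on (ED, CF).

    scores: dict mapping a system name to an (ed, cf) tuple, where
    higher is better on both axes.
    """
    def dominates(a, b):
        # a dominates b if it is at least as good on both axes
        # and strictly better on at least one.
        return a[0] >= b[0] and a[1] >= b[1] and a != b

    return sorted(
        name
        for name, point in scores.items()
        if not any(dominates(other, point) for other in scores.values())
    )
```

For example, with placeholder scores `{"A": (5, 1), "B": (1, 5), "C": (3, 3), "D": (2, 2)}`, system D is dominated by C, so the front consists of A, B, and C. A system scoring high on both axes would dominate the rest and collapse the front to a single point, which is the outcome the paper reports that no reviewed system achieves.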

This digest was generated by AI and reviewed by human editors.