QUICK REVIEW

[Paper Review] Talk Freely, Execute Strictly: Schema-Gated Agentic AI for Flexible and Reproducible Scientific Workflows

Joel Strickland, Arjun Vijeta | arXiv (Cornell University) | Mar 6, 2026
Scientific Computing and Data Management | Cited by 0
One-Sentence Summary

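This paper proposes schema-gated orchestration to separate conversational intent from execution in agentic AI for scientific workflows, analyzes 20 systems along the ED/CF axes, and outlines a reference architecture intended to combine flexibility with determinism.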

ABSTRACT

Large language models (LLMs) can now translate a researcher's plain-language goal into executable computation, yet scientific workflows demand determinism, provenance, and governance that are difficult to guarantee when an LLM decides what runs. Semi-structured interviews with 18 experts across 10 industrial R&D stakeholders surface 2 competing requirements--deterministic, constrained execution and conversational flexibility without workflow rigidity--together with boundary properties (human-in-the-loop control and transparency) that any resolution must satisfy. We propose schema-gated orchestration as the resolving principle: the schema becomes a mandatory execution boundary at the composed-workflow level, so that nothing runs unless the complete action--including cross-step dependencies--validates against a machine-checkable specification. We operationalize the 2 requirements as execution determinism (ED) and conversational flexibility (CF), and use these axes to review 20 systems spanning 5 architectural groups along a validation-scope spectrum. Scores are assigned via a multi-model protocol--15 independent sessions across 3 LLM families--yielding substantial-to-near-perfect inter-model agreement (Krippendorff's α = 0.80 for ED and α = 0.98 for CF), demonstrating that multi-model LLM scoring can serve as a reusable alternative to human expert panels for architectural assessment. The resulting landscape reveals an empirical Pareto front--no reviewed system achieves both high flexibility and high determinism--but a convergence zone emerges between the generative and workflow-centric extremes. We argue that a schema-gated architecture, separating conversational from execution authority, is positioned to decouple this trade-off, and distill 3 operational principles--clarification-before-execution, constrained plan-act orchestration, and tool-to-workflow-level gating--to guide adoption.
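The core rule of schema gating, that nothing runs unless the complete composed action, including cross-step dependencies, validates against a machine-checkable specification, can be illustrated with a minimal Python sketch. The tool registry, the string type names, the `$<step id>` reference syntax, and the `validate_plan`/`run_if_valid` helpers are all hypothetical; the summary does not describe the paper's actual schema language, and a real system might use JSON Schema or a workflow language instead.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical tool registry (names and types are illustrative, not from the paper).
# Each tool declares typed inputs and a single typed output.
REGISTRY: dict[str, dict[str, Any]] = {
    "load_dataset": {"inputs": {"path": "str"}, "output": "table"},
    "fit_model": {"inputs": {"data": "table", "alpha": "float"}, "output": "model"},
    "predict": {"inputs": {"model": "model", "data": "table"}, "output": "table"},
}
# Python types accepted for literal (non-reference) arguments.
LITERAL_TYPES = {"str": str, "float": float}


@dataclass
class Step:
    id: str
    tool: str
    # Each value is either a literal or "$<step id>", a reference to an earlier step's output.
    args: dict[str, Any] = field(default_factory=dict)


def validate_plan(plan: list[Step]) -> list[str]:
    """Check the whole composed plan; an empty result means it may run."""
    errors: list[str] = []
    produced: dict[str, str] = {}  # step id -> output type, for steps seen so far
    for step in plan:
        if step.id in produced:
            errors.append(f"{step.id}: duplicate step id")
        spec = REGISTRY.get(step.tool)
        if spec is None:
            errors.append(f"{step.id}: unknown tool {step.tool!r}")
            continue
        expected: dict[str, str] = spec["inputs"]
        for name in sorted(expected.keys() - step.args.keys()):
            errors.append(f"{step.id}: missing input {name!r}")
        for name in sorted(step.args.keys() - expected.keys()):
            errors.append(f"{step.id}: unexpected input {name!r}")
        for name, value in step.args.items():
            want = expected.get(name)
            if want is None:
                continue  # already reported as unexpected
            if isinstance(value, str) and value.startswith("$"):
                # Cross-step dependency: must name an earlier step whose output type matches.
                src = value[1:]
                if src not in produced:
                    errors.append(f"{step.id}: {name!r} refers to unknown or later step {src!r}")
                elif produced[src] != want:
                    errors.append(f"{step.id}: {name!r} needs {want}, but {src!r} yields {produced[src]}")
            elif want not in LITERAL_TYPES or not isinstance(value, LITERAL_TYPES[want]):
                errors.append(f"{step.id}: {name!r} needs {want}, got literal {value!r}")
        produced[step.id] = spec["output"]
    return errors


def run_if_valid(plan: list[Step]) -> None:
    errors = validate_plan(plan)
    if errors:
        # Nothing runs: the gate rejects the plan as a whole, not step by step.
        raise ValueError("plan rejected:\n" + "\n".join(errors))
    # A real controller would now hand the validated plan to an executor (omitted here).


good = [
    Step("s1", "load_dataset", {"path": "runs.csv"}),
    Step("s2", "fit_model", {"data": "$s1", "alpha": 0.1}),
    Step("s3", "predict", {"model": "$s2", "data": "$s1"}),
]
bad = [
    Step("s1", "load_dataset", {"path": "runs.csv"}),
    Step("s2", "predict", {"model": "$s1", "data": "$s1"}),  # a table where a model is required
]
print(validate_plan(good))  # → []
print(validate_plan(bad))
```

Because validation covers the whole plan before any step executes, the type mismatch in the second step of `bad` blocks the first step too; per-tool-call validation alone would have let `s1` run.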

Motivation and Objectives

  • Identify practitioner requirements balancing execution determinism and conversational flexibility in AI-driven scientific workflows.
  • Map existing systems onto an execution determinism (ED) and conversational flexibility (CF) design space.
  • Demonstrate inter-model scoring reliability for architectural assessment across LLM families.
  • Propose schema-gated orchestration as a principled resolution to the ED/CF trade-off.
  • Present a reference architecture and three operational principles to guide adoption in real-world workflows.
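The reference architecture named in the last objective routes all execution through an orchestration controller that provides end-to-end provenance (see the Key Findings). A minimal sketch of that idea, assuming a hypothetical `Controller` class, JSON-serializable tool outputs, and a SHA-256 hash chain over step records; the hash chain is an illustrative choice, not something the summary attributes to the paper, and cross-step type checks are omitted here for brevity.

```python
import hashlib
import json
from typing import Any, Callable

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def digest(obj: Any) -> str:
    # Canonical JSON (sorted keys) so equal content always yields the same SHA-256.
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


class Controller:
    """Sole holder of execution authority; the conversational layer only submits plans."""

    def __init__(self, tools: dict[str, Callable[..., Any]]):
        self.tools = tools
        self.log: list[dict[str, Any]] = []

    def run(self, plan: list[tuple[str, str, dict[str, Any]]]) -> dict[str, Any]:
        # Reject the whole plan up front if it names a tool outside the registry.
        unknown = [tool for _, tool, _ in plan if tool not in self.tools]
        if unknown:
            raise ValueError(f"plan rejected, unregistered tools: {unknown}")
        results: dict[str, Any] = {}
        for step_id, tool, args in plan:
            # "$<step id>" arguments are replaced by that earlier step's result.
            resolved = {k: results[v[1:]] if isinstance(v, str) and v.startswith("$") else v
                        for k, v in args.items()}
            output = self.tools[tool](**resolved)
            results[step_id] = output
            self._record(step_id, tool, args, output)
        return results

    def _record(self, step_id: str, tool: str, args: dict[str, Any], output: Any) -> None:
        # Each record embeds the previous record's hash, so any later edit breaks the chain.
        body = {
            "step": step_id,
            "tool": tool,
            "args": json.loads(json.dumps(args)),  # deep copy of the unresolved arguments
            "output_sha256": digest(output),
            "prev": self.log[-1]["hash"] if self.log else GENESIS,
        }
        self.log.append({**body, "hash": digest(body)})

    def verify(self) -> bool:
        """Recompute every hash and check each record points at its predecessor."""
        prev = GENESIS
        for rec in self.log:
            body = {k: v for k, v in rec.items() if k != "hash"}
            if rec["prev"] != prev or digest(body) != rec["hash"]:
                return False
            prev = rec["hash"]
        return True


ctl = Controller({"double": lambda x: 2 * x, "add": lambda a, b: a + b})
print(ctl.run([("s1", "double", {"x": 3}), ("s2", "add", {"a": "$s1", "b": 1})]))  # → {'s1': 6, 's2': 7}
print(ctl.verify())  # → True
```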

Proposed Method

  • Perform semi-structured interviews with 18 experts across 10 industrial R&D stakeholders to elicit requirements and boundary properties.
  • Review 20 representative systems across five architectural groups, scoring them on ED and CF axes using a five-point ordinal rubric.
  • Conduct 15 independent scoring sessions across three LLM families (ChatGPT, Claude, Gemini) to assess inter-model agreement (Krippendorff’s α).
  • Analyze the design space to reveal an empirical Pareto front and identify convergence zones among paradigms.
  • Formulate schema-gated orchestration as a design principle with three operational tenets and outline a reference architecture with provenance guarantees.
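To make the agreement statistic concrete, here is a sketch of Krippendorff's α for a systems × sessions score matrix. It assumes the interval difference function δ(a, b) = (a − b)²; the summary only mentions a five-point ordinal rubric and does not say which metric the authors used, so an ordinal δ would give somewhat different numbers. The function name and toy data are illustrative.

```python
from typing import Optional, Sequence


def krippendorff_alpha_interval(ratings: Sequence[Sequence[Optional[float]]]) -> float:
    """Krippendorff's alpha with the interval metric delta(a, b) = (a - b) ** 2.

    ratings[u][r] is the score that rater r (here: one LLM scoring session)
    gave to unit u (here: one reviewed system); None marks a missing score.
    """
    # Keep only units with at least two scores; a single score cannot be paired.
    units = [[v for v in row if v is not None] for row in ratings]
    units = [u for u in units if len(u) >= 2]
    values = [v for u in units for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pairable scores")

    # Observed disagreement: ordered pairs within each unit, weighted by 1 / (m_u - 1).
    d_o = 0.0
    for u in units:
        pair_sum = sum((a - b) ** 2 for i, a in enumerate(u) for j, b in enumerate(u) if i != j)
        d_o += pair_sum / (len(u) - 1)
    d_o /= n

    # Expected disagreement: ordered pairs drawn from all pairable scores, ignoring units.
    d_e = sum((a - b) ** 2 for i, a in enumerate(values) for j, b in enumerate(values) if i != j)
    d_e /= n * (n - 1)
    if d_e == 0:
        raise ValueError("all scores identical; alpha is undefined")
    return 1.0 - d_o / d_e


# Toy data (not the paper's scores): 3 systems x 3 sessions on a 1-5 scale.
toy = [[4, 4, 5], [2, 2, 2], [1, 1, 1]]
print(round(krippendorff_alpha_interval(toy), 3))  # → 0.951
```

With 20 systems and 15 sessions the quadratic pair loops handle about 300 scores, so the direct definition is fast enough and easier to audit than a coincidence-matrix implementation.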

Experimental Results

Research Questions

  • RQ1: What architectural requirements are needed to achieve both execution determinism and conversational flexibility in AI-driven scientific workflows?
  • RQ2: How do current systems align on ED/CF, and what trade-offs exist across paradigms (generative, tool-augmented, schema-gated, workflow-based)?
  • RQ3: Can schema-gated orchestration decouple conversational authority from execution authority to improve reproducibility and governance?
  • RQ4: What are the practical implications and architectural patterns for implementing schema-gated execution across composed workflows?
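RQ3's separation of conversational from execution authority pairs naturally with the clarification-before-execution principle: the conversational layer may propose anything, but the gate returns either questions for the user or an approved, fully specified request. The sketch assumes a hypothetical `dose_response` workflow and `gate_request` helper; the LLM's parameter extraction is stubbed out as a plain dictionary.

```python
from dataclasses import dataclass
from typing import Any, Union

# Hypothetical workflow specification: required parameters and their expected types.
WORKFLOW_SPECS: dict[str, dict[str, type]] = {
    "dose_response": {"compound_id": str, "concentrations_um": list, "replicates": int},
}


@dataclass
class Clarify:
    """Returned to the conversational layer: questions for the user, no execution."""
    questions: list[str]


@dataclass
class Approved:
    """Handed to the execution layer: a complete, type-checked request."""
    workflow: str
    params: dict[str, Any]


def gate_request(workflow: str, draft: dict[str, Any]) -> Union[Clarify, Approved]:
    # `draft` stands in for parameters an LLM extracted from the user's message.
    spec = WORKFLOW_SPECS.get(workflow)
    if spec is None:
        return Clarify([f"Unknown workflow {workflow!r}; options are {sorted(WORKFLOW_SPECS)}."])
    questions = []
    for name, typ in spec.items():
        if draft.get(name) is None:
            questions.append(f"Please provide {name} ({typ.__name__}).")
        elif not isinstance(draft[name], typ):
            questions.append(f"{name} should be {typ.__name__}, got {draft[name]!r}. Please confirm.")
    extra = sorted(draft.keys() - spec.keys())
    if extra:
        questions.append(f"These parameters are not part of {workflow}: {extra}. Drop them?")
    # Execution is only reachable through Approved; any open question blocks it.
    return Clarify(questions) if questions else Approved(workflow, dict(draft))


print(gate_request("dose_response", {"compound_id": "C-17"}))
print(gate_request("dose_response",
                   {"compound_id": "C-17", "concentrations_um": [0.1, 1, 10], "replicates": 3}))
```

Returning a typed result rather than executing directly keeps the LLM free to converse while the decision to run stays with deterministic code.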

Key Findings

  • There is an empirical trade-off: no reviewed system achieves both high flexibility and high determinism, so the reviewed systems trace out an empirical Pareto front.
  • There is substantial-to-near-perfect inter-model agreement (Krippendorff’s α = 0.80 for ED and 0.98 for CF) across 15 scoring runs over three LLM families.
  • Schema-gated orchestration, which extends schema validation from individual tool calls to composed-workflow plans, is positioned to support deterministic execution while preserving conversational flexibility.
  • Two operational zones emerge: the schema-gated group (IDs 8–9) lies closest to the ideal, while the workflow-centric and workflow+NL groups converge toward higher ED but lower CF.
  • Three operational principles are articulated: clarification-before-execution, constrained plan–act orchestration, and tool-to-workflow-level gating.
  • A reference architecture is proposed that separates a schema-validated registry from a conversational layer via an orchestration controller to enable end-to-end provenance.
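The first finding, the empirical Pareto front, can be made concrete by computing the non-dominated set over (ED, CF) scores. This is a minimal sketch; the `Scored` type, the system names, and the toy scores are made up for illustration and are not the paper's data.

```python
from typing import NamedTuple


class Scored(NamedTuple):
    name: str
    ed: float  # execution determinism score
    cf: float  # conversational flexibility score


def dominates(a: Scored, b: Scored) -> bool:
    """a dominates b if it is at least as good on both axes and strictly better on one."""
    return a.ed >= b.ed and a.cf >= b.cf and (a.ed > b.ed or a.cf > b.cf)


def pareto_front(systems: list[Scored]) -> list[Scored]:
    """Systems not dominated by any other system (O(n^2), fine for ~20 systems)."""
    return [s for s in systems if not any(dominates(o, s) for o in systems)]


# Toy scores on a 1-5 scale; these are NOT the paper's values.
toy = [
    Scored("sys_a", 1, 5),
    Scored("sys_b", 3, 3),
    Scored("sys_c", 5, 1),
    Scored("sys_d", 2, 2),
    Scored("sys_e", 3, 2),
]
print([s.name for s in pareto_front(toy)])  # → ['sys_a', 'sys_b', 'sys_c']
print(any(s.ed >= 4 and s.cf >= 4 for s in toy))  # → False: no system is high on both axes
```

A system that moves the front outward, scoring high on both axes, would dominate points that are currently on it; that is the sense in which a schema-gated design is argued to decouple the trade-off.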


This summary was generated by AI and reviewed by human editors.