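[Paper Summary] Talk Freely, Execute Strictly: Schema-Gated Agentic AI for Flexible and Reproducible Scientific Workflows
This paper proposes schema-gated orchestration to separate conversational intent from execution in agentic AI for scientific workflows, analyzes 20 systems along the ED/CF axes, and presents a reference architecture that combines flexibility with determinism.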
Large language models (LLMs) can now translate a researcher's plain-language goal into executable computation, yet scientific workflows demand determinism, provenance, and governance that are difficult to guarantee when an LLM decides what runs. Semi-structured interviews with 18 experts across 10 industrial R&D stakeholders surface two competing requirements (deterministic, constrained execution and conversational flexibility without workflow rigidity) together with boundary properties (human-in-the-loop control and transparency) that any resolution must satisfy. We propose schema-gated orchestration as the resolving principle: the schema becomes a mandatory execution boundary at the composed-workflow level, so that nothing runs unless the complete action, including cross-step dependencies, validates against a machine-checkable specification. We operationalize the two requirements as execution determinism (ED) and conversational flexibility (CF), and use these axes to review 20 systems spanning five architectural groups along a validation-scope spectrum. Scores are assigned via a multi-model protocol (15 independent sessions across three LLM families), yielding substantial-to-near-perfect inter-model agreement (Krippendorff's α = 0.80 for ED and α = 0.98 for CF), demonstrating that multi-model LLM scoring can serve as a reusable alternative to human expert panels for architectural assessment. The resulting landscape reveals an empirical Pareto front: no reviewed system achieves both high flexibility and high determinism, but a convergence zone emerges between the generative and workflow-centric extremes. We argue that a schema-gated architecture, separating conversational from execution authority, is positioned to decouple this trade-off, and distill three operational principles (clarification-before-execution, constrained plan-act orchestration, and tool-to-workflow-level gating) to guide adoption.
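To make the gating idea concrete, here is a minimal sketch of a workflow-level schema gate, not the paper's implementation. It assumes a hypothetical tool registry (`REGISTRY`), a hypothetical plan format in which an argument is either a literal or a `("ref", step_index, output_name)` tuple, and two invented tools, `load_csv` and `fit_model`. The point it illustrates is that the whole plan, including cross-step references, is validated before any step runs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolSchema:
    """Hypothetical registry entry: typed inputs and outputs of one tool."""
    inputs: dict   # parameter name -> type name
    outputs: dict  # output name -> type name


@dataclass
class Step:
    tool: str
    # Each argument is a literal or a ("ref", step_index, output_name) tuple.
    args: dict = field(default_factory=dict)


# Illustrative registry; tool names and types are assumptions for this sketch.
REGISTRY = {
    "load_csv": ToolSchema(inputs={"path": "str"}, outputs={"table": "table"}),
    "fit_model": ToolSchema(inputs={"data": "table", "degree": "int"},
                            outputs={"model": "model"}),
}
LITERAL_TYPES = {"str": str, "int": int}


def validate_plan(plan):
    """Return a list of violations for the whole plan; empty means valid."""
    errors = []
    for i, step in enumerate(plan):
        schema = REGISTRY.get(step.tool)
        if schema is None:
            errors.append(f"step {i}: unknown tool {step.tool!r}")
            continue
        for name in sorted(set(schema.inputs) - set(step.args)):
            errors.append(f"step {i}: missing argument {name!r}")
        for name in sorted(set(step.args) - set(schema.inputs)):
            errors.append(f"step {i}: unexpected argument {name!r}")
        for name, value in step.args.items():
            if name not in schema.inputs:
                continue
            expected = schema.inputs[name]
            if isinstance(value, tuple) and len(value) == 3 and value[0] == "ref":
                # Cross-step dependency: must point at an earlier step
                # whose declared output type matches the expected input type.
                _, src, out = value
                if not (isinstance(src, int) and 0 <= src < i):
                    errors.append(f"step {i}: {name!r} must reference an earlier step")
                    continue
                src_schema = REGISTRY.get(plan[src].tool)
                produced = src_schema.outputs.get(out) if src_schema else None
                if produced != expected:
                    errors.append(f"step {i}: {name!r} expects {expected}, got {produced}")
            else:
                py_type = LITERAL_TYPES.get(expected)
                if py_type is None or not isinstance(value, py_type):
                    errors.append(f"step {i}: {name!r} must be a literal of type {expected}")
    return errors


def execute_if_valid(plan, runner):
    """The gate: run nothing unless the complete plan validates."""
    errors = validate_plan(plan)
    if errors:
        return {"executed": False, "errors": errors}
    return {"executed": True, "results": [runner(step) for step in plan]}
```

In this sketch a conversational layer could propose any plan it likes, but `execute_if_valid` is the only path to `runner`, so a single invalid step or dangling reference blocks the entire workflow rather than just the offending call.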
Research Motivation and Objectives
- Identify practitioner requirements balancing execution determinism and conversational flexibility in AI-driven scientific workflows.
- Map existing systems onto an execution determinism (ED) and conversational flexibility (CF) design space.
- Demonstrate inter-model scoring reliability for architectural assessment across LLM families.
- Propose schema-gated orchestration as a principled resolution to the ED/CF trade-off.
- Present a reference architecture and three operational principles to guide adoption in real-world workflows.
Proposed Method
- Perform semi-structured interviews with 18 experts across 10 industrial R&D stakeholders to elicit requirements and boundary properties.
- Review 20 representative systems across five architectural groups, scoring them on ED and CF axes using a five-point ordinal rubric.
- Conduct 15 independent scoring sessions across three LLM families (ChatGPT, Claude, Gemini) to assess inter-model agreement (Krippendorff’s α).
- Analyze the design space to reveal an empirical Pareto front and identify convergence zones among paradigms.
- Formulate schema-gated orchestration as a design principle with three operational tenets and outline a reference architecture with provenance guarantees.
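The inter-model agreement statistic used in the scoring protocol can be sketched as follows. This is a simplified illustration that assumes the interval distance (squared difference between scores); the paper scores on an ordinal rubric and may use the ordinal distance instead, which would give somewhat different values. The function name and input layout are assumptions for this sketch: each inner list holds the scores that one system received from the different sessions.

```python
from itertools import permutations


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval distance (a - b) ** 2.

    units: list of lists; each inner list holds the scores one item received
    from different raters (missing ratings are simply left out).
    """
    # Only items rated at least twice contribute pairable values.
    pairable = [u for u in units if len(u) >= 2]
    values = [v for u in pairable for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least one item with two or more ratings")

    # Observed disagreement: within-item ordered pairs, weighted by 1 / (m_u - 1).
    observed = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in pairable
    ) / n

    # Expected disagreement: all ordered pairs of pairable values, ignoring items.
    expected = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    if expected == 0:
        raise ValueError("alpha is undefined when all ratings are identical")

    return 1.0 - observed / expected
```

Alpha is 1 when raters agree perfectly on items that differ from one another, near 0 when agreement is at chance level, and negative when raters disagree systematically.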
Experimental Results
Research Questions
- RQ1: What architectural requirements are needed to achieve both execution determinism and conversational flexibility in AI-driven scientific workflows?
- RQ2: How do current systems align on ED/CF, and what trade-offs exist across paradigms (generative, tool-augmented, schema-gated, workflow-based)?
- RQ3: Can schema-gated orchestration decouple conversational authority from execution authority to improve reproducibility and governance?
- RQ4: What are the practical implications and architectural patterns for implementing schema-gated execution across composed workflows?
Key Findings
- There is an empirical trade-off: no reviewed system achieves both high flexibility and high determinism (Pareto front).
- There is substantial-to-near-perfect inter-model agreement (Krippendorff’s α = 0.80 for ED and 0.98 for CF) across 15 scoring runs over three LLM families.
- Schema-gated orchestration, extending schema validation from individual tool calls to composed-workflow plans, can better support deterministic execution with conversational flexibility.
- Two operational zones emerge: the schema-gated group (IDs 8–9) sits closest to the ideal, while the workflow-centric and workflow+NL groups converge toward higher ED but lower CF.
- Three operational principles are articulated: clarification-before-execution, constrained plan–act orchestration, and tool-to-workflow-level gating.
- A reference architecture is proposed that separates a schema-validated registry from a conversational layer via an orchestration controller to enable end-to-end provenance.
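The Pareto-front finding in the first bullet can be illustrated with a short sketch. The scores below are made-up placeholders on a five-point scale, not the paper's values, and the group labels are only loosely modeled on the paradigms named above. A system is on the front when no other system is at least as good on both ED and CF and strictly better on one.

```python
def pareto_front(points):
    """Return the names of systems not dominated on (ED, CF); higher is better.

    points: dict mapping a system name to an (ed, cf) score tuple.
    """
    front = []
    for name, (ed, cf) in points.items():
        dominated = any(
            o_ed >= ed and o_cf >= cf and (o_ed > ed or o_cf > cf)
            for other, (o_ed, o_cf) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Placeholder scores for illustration only; NOT taken from the paper.
example_scores = {
    "generative": (1, 5),
    "tool_augmented": (2, 4),
    "schema_gated": (4, 4),
    "workflow_nl": (4, 2),
    "workflow": (5, 1),
}
front = pareto_front(example_scores)
```

With these placeholder values, the front contains the two extremes plus the middle system that trades a little of each axis, and no point reaches the (5, 5) corner, which is the shape of trade-off the finding describes.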
This summary was generated by AI and reviewed by human editors.