QUICK REVIEW

[Paper Review] SkillOrchestra: Learning to Route Agents via Skill Transfer

Jiayu Wang, Yifei Ming|arXiv (Cornell University)|Feb 23, 2026
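Software-Defined Networks and 5G | Cited by 0
One-Sentence Summary

SkillOrchestra learns a reusable skill handbook that guides mode-level orchestration and skill-driven agent routing, achieving higher accuracy at lower cost, outperforming RL-based baselines, and transferring across orchestrator backbones.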

ABSTRACT

Compound AI systems promise capabilities beyond those of individual models, yet their success depends critically on effective orchestration. Existing routing approaches face two limitations: (1) input-level routers make coarse query-level decisions that ignore evolving task requirements; (2) RL-trained orchestrators are expensive to adapt and often suffer from routing collapse, repeatedly invoking one strong but costly option in multi-turn scenarios. We introduce SkillOrchestra, a framework for skill-aware orchestration. Instead of directly learning a routing policy end-to-end, SkillOrchestra learns fine-grained skills from execution experience and models agent-specific competence and cost under those skills. At deployment, the orchestrator infers the skill demands of the current interaction and selects agents that best satisfy them under an explicit performance-cost trade-off. Extensive experiments across ten benchmarks demonstrate that SkillOrchestra outperforms SoTA RL-based orchestrators by up to 22.5% with 700x and 300x learning cost reduction compared to Router-R1 and ToolOrchestra, respectively. These results show that explicit skill modeling enables scalable, interpretable, and sample-efficient orchestration, offering a principled alternative to data-intensive RL-based approaches. The code is available at: https://github.com/jiayuww/SkillOrchestra.

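Motivation and Objectives

  • Motivate the need for fine-grained, skill-aware orchestration in multi-turn agent systems.
  • Introduce a skill handbook that captures mode-level insights, fine-grained skills, and agent profiles.
  • Show how skill-based routing enables state-conditioned, cost-aware decisions and reduces routing collapse.
  • Demonstrate data-efficient learning and the transferability of orchestration knowledge across backbones and tool pools.
  • Provide guidance for choosing an orchestrator-specific handbook granularity through Pareto validation.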

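Proposed Method

  • Define a skill as a reusable capability abstraction tied to an operating mode and scenario.
  • Build the skill handbook as a graph containing mode-level routing insights, a skill registry, and agent profiles.
  • Learn the handbook from execution trajectories by contrasting successes with failures to infer which capabilities were missing.
  • Estimate each agent's competence (a per-skill success probability) and cost under each mode to enable cost-aware routing.
  • Select an orchestrator-specific subset of the handbook through Pareto-optimal validation, balancing expressiveness against decision reliability.
  • At run time, use the handbook for mode selection and skill-driven agent routing, trading off performance against cost.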
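The competence-estimation and routing steps above can be illustrated with a minimal sketch. It is not the paper's implementation: the smoothed success rate, the linear utility (mean competence minus weighted cost), the `cost_weight` parameter, and the agent and skill names are all assumptions chosen for illustration.

```python
from collections import defaultdict


def estimate_competence(records, prior_success=1.0, prior_failure=1.0):
    """Estimate P(success | agent, skill) from (agent, skill, succeeded) records.

    Uses a smoothed success rate (pseudo-counts act as a Beta prior).
    This estimator is an assumption, not the method described in the paper.
    """
    counts = defaultdict(lambda: [0, 0])  # (agent, skill) -> [successes, failures]
    for agent, skill, succeeded in records:
        counts[(agent, skill)][0 if succeeded else 1] += 1
    return {
        key: (s + prior_success) / (s + f + prior_success + prior_failure)
        for key, (s, f) in counts.items()
    }


def route(required_skills, agents, competence, cost, cost_weight=0.5, default=0.5):
    """Pick the agent with the highest mean competence minus weighted cost.

    `default` is the competence assumed for unseen (agent, skill) pairs.
    """
    if not required_skills:
        raise ValueError("required_skills must not be empty")

    def utility(agent):
        scores = [competence.get((agent, s), default) for s in required_skills]
        return sum(scores) / len(scores) - cost_weight * cost[agent]

    return max(agents, key=utility)


# Hypothetical execution records and per-call costs (illustrative values only).
records = [
    ("small", "arithmetic", True),
    ("small", "arithmetic", True),
    ("small", "multi_hop", False),
    ("large", "arithmetic", True),
    ("large", "multi_hop", True),
    ("large", "multi_hop", True),
]
cost = {"small": 0.1, "large": 0.8}
competence = estimate_competence(records)

print(route(["arithmetic"], ["small", "large"], competence, cost))
print(route(["multi_hop"], ["small", "large"], competence, cost))
```

With these made-up numbers the cheap agent is chosen for the skill it handles well, and the expensive agent is chosen only when its competence advantage outweighs its cost, which is the kind of state-conditioned decision the handbook is meant to support.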
Figure 1: Performance-cost trade-offs in multi-turn model routing (left) and agent orchestration (right). SkillOrchestra and SkillOrchestra+ lie on the Pareto frontier, with higher accuracy at lower cost than all baselines.

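Experimental Results

Research Questions

  • RQ1: Does the learned skill handbook outperform heuristic, discriminative, and RL-based methods in end-to-end accuracy?
  • RQ2: Does skill-based orchestration achieve a better performance-cost trade-off?
  • RQ3: Does skill-based routing reduce routing collapse and promote balanced model utilization?
  • RQ4: Can the skill handbook transfer across orchestrator backbones without retraining?
  • RQ5: How do the different components of the Skill Handbook affect performance and cost efficiency?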
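RQ3 concerns how evenly an orchestrator spreads its calls across agents. One possible diagnostic, sketched below, is the normalized Shannon entropy of the invocation distribution. This metric and the function name `utilization_entropy` are assumptions for illustration; the review does not state how the paper measures routing collapse.

```python
import math
from collections import Counter


def utilization_entropy(invocations, num_agents):
    """Normalized Shannon entropy of agent usage.

    Returns a value near 0 when one agent receives every call (collapse)
    and near 1 when calls are spread uniformly over `num_agents` agents.
    """
    if num_agents < 2 or not invocations:
        return 0.0
    counts = Counter(invocations)
    total = len(invocations)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(num_agents)


# Hypothetical routing logs: one collapsed, one balanced.
collapsed = ["large"] * 10
balanced = ["small", "medium", "large"] * 4

print(utilization_entropy(collapsed, num_agents=3))
print(utilization_entropy(balanced, num_agents=3))
```

A score close to 0 on a multi-turn log would flag the "repeatedly invoking one strong but costly option" behavior described in the abstract, while a higher score indicates more balanced utilization.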

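Key Findings

  • SkillOrchestra outperforms the strongest RL-based orchestrators on QA benchmarks, with accuracy gains of up to 22.5%, as reported in the abstract.
  • Compared with Router-R1 and ToolOrchestra, cost drops by roughly 2x while accuracy is maintained or improved; the abstract separately reports 700x and 300x reductions in learning cost against those two baselines, respectively.
  • Skill-based routing yields more balanced model utilization, mitigating the routing collapse common in end-to-end RL methods.
  • The Skill Handbook transfers across orchestrator backbones without retraining and supports Pareto-efficient performance-cost trade-offs.
  • Pareto-validated, orchestrator-specific handbook selection ensures an appropriate granularity and preserves decision reliability.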
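The Pareto validation mentioned above can be pictured as filtering candidate handbook configurations down to those that no other candidate beats on both cost and accuracy. The sketch below assumes each candidate is summarized by a single (cost, accuracy) pair measured on a validation set; the candidate names and numbers are hypothetical, and the paper's actual selection procedure may differ.

```python
def pareto_frontier(candidates):
    """Return the names of candidates not dominated by any other candidate.

    `candidates` maps a name to a (cost, accuracy) pair. Candidate A dominates
    B when A costs no more, is at least as accurate, and is strictly better
    on at least one of the two.
    """
    def dominates(a, b):
        return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])

    return [
        name
        for name, point in candidates.items()
        if not any(
            dominates(other, point)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical handbook granularities with made-up validation results.
candidates = {
    "coarse": (1.0, 0.60),
    "medium": (1.5, 0.72),
    "fine": (2.5, 0.70),
    "full": (3.0, 0.75),
}

print(pareto_frontier(candidates))
```

In this made-up example the "fine" handbook is dropped because "medium" is both cheaper and more accurate, which mirrors the idea that a more detailed handbook is not automatically the more reliable one for a given orchestrator.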
Figure 2: Comparison of model routing and agent orchestration approaches. (Left) Model routing performs static, query-level model selection without dynamic mode or tool reasoning. (Middle) Direct agent orchestration learns routing end-to-end with implicit capability modeling and is prone to routing

This review was generated by AI and reviewed by human editors.