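[Paper Explained] Steerable Vision-Language-Action Policies for Embodied Reasoning and Hierarchical Control
This paper introduces Steerable Policies, a class of low-level VLAs that accept diverse steering commands across levels of abstraction (tasks, subtasks, motions, gripper traces, points, and so on). It also demonstrates how high-level embodied reasoning models and in-context-learning VLMs can control them to improve generalization and long-horizon robot task performance.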
Pretrained vision-language models (VLMs) can make semantic and visual inferences across diverse settings, providing valuable common-sense priors for robotic control. However, effectively grounding this knowledge in robot behaviors remains an open challenge. Prior methods often employ a hierarchical approach where VLMs reason over high-level commands to be executed by separate low-level policies, e.g., vision-language-action models (VLAs). The interface between VLMs and VLAs is usually natural language task instructions, which fundamentally limits how much VLM reasoning can steer low-level behavior. We thus introduce Steerable Policies: VLAs trained on rich synthetic commands at various levels of abstraction, like subtasks, motions, and grounded pixel coordinates. By improving low-level controllability, Steerable Policies can unlock pretrained knowledge in VLMs, enabling improved task generalization. We demonstrate this benefit by controlling our Steerable Policies with both a learned high-level embodied reasoner and an off-the-shelf VLM prompted to reason over command abstractions via in-context learning. Across extensive real-world manipulation experiments, these two novel methods outperform prior embodied reasoning VLAs and VLM-based hierarchical baselines, including on challenging generalization and long-horizon tasks. Website: steerable-policies.github.io
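Motivation and Objectives
- Motivate and define steerability as the key bottleneck in grounding VLM knowledge in robot policies.
- Develop Steerable Policies (VLAs) that accept multiple levels of abstraction for steering robot behavior.
- Show how high-level embodied reasoning and in-context-learning VLMs can control Steerable Policies to improve generalization and long-horizon task performance.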
- Demonstrate scalable generation of synthetic steering commands to train versatile policies.
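Proposed Method
- Train Steerable Policies to follow a broad spectrum of steering commands, including task-level, subtask-level, atomic motion, gripper trace, point, and combined commands.
- Automatically generate steering commands at scale through a pipeline that extracts grounded features, subtasks, and prompts from robot trajectories.
- Integrate Steerable Policies with two high-level VLM control methods: (i) a fine-tuned embodied reasoning module that produces reasoning and steering commands; (ii) a VLM that uses in-context learning to choose command abstractions for steering the policy.
- Evaluate on real-world Bridge WidowX manipulation tasks along in-distribution, motion, spatial, and semantic generalization axes, and explore long-horizon tasks.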
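
To make the idea of a command spectrum more concrete, here is a minimal sketch of how steering commands at different abstraction levels might be represented, and how a motion-level command could be derived automatically from a recorded trajectory segment. This is not the paper's pipeline: the names `Abstraction`, `SteeringCommand`, `motion_command`, and `point_command`, the axis convention (+x forward, +y left, +z up), the `min_move` threshold, and the command wording are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Abstraction(Enum):
    """Abstraction levels named in the summary (hypothetical encoding)."""
    TASK = "task"
    SUBTASK = "subtask"
    MOTION = "motion"
    GRIPPER_TRACE = "gripper_trace"
    POINT = "point"


@dataclass
class SteeringCommand:
    level: Abstraction
    text: str


def motion_command(
    positions: List[Tuple[float, float, float]], min_move: float = 0.01
) -> SteeringCommand:
    """Label a trajectory segment with its dominant end-effector direction.

    Assumes positions are (x, y, z) in metres with +x forward, +y left, +z up.
    """
    (x0, y0, z0), (x1, y1, z1) = positions[0], positions[-1]
    deltas = {"forward": x1 - x0, "left": y1 - y0, "up": z1 - z0}
    opposites = {"forward": "backward", "left": "right", "up": "down"}
    # Pick the axis with the largest absolute displacement.
    axis = max(deltas, key=lambda name: abs(deltas[name]))
    if abs(deltas[axis]) < min_move:
        return SteeringCommand(Abstraction.MOTION, "stay in place")
    direction = axis if deltas[axis] > 0 else opposites[axis]
    return SteeringCommand(Abstraction.MOTION, f"move the arm {direction}")


def point_command(object_name: str, pixel: Tuple[int, int]) -> SteeringCommand:
    """Build a point-level command grounded in image pixel coordinates."""
    u, v = pixel
    return SteeringCommand(
        Abstraction.POINT, f"move to the {object_name} at ({u}, {v})"
    )


if __name__ == "__main__":
    segment = [(0.20, 0.00, 0.10), (0.21, 0.00, 0.18)]
    print(motion_command(segment))
    print(point_command("carrot", (120, 85)))
```

Labels like these, produced for many segments of many trajectories, would give a policy training pairs of (command, action chunk) at more than one level of abstraction, which is the property the method relies on.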

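Experimental Results
Research Questions
- RQ1: Do steering commands spanning multiple abstractions induce compositional and generalizing behavior in Steerable Policies?
- RQ2: How can a high-level embodied reasoning model leverage its training data to generalize when controlling Steerable Policies?
- RQ3: Can an off-the-shelf VLM use in-context learning to choose command abstractions that improve long-horizon robot tasks?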
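Key Findings
- A human oracle with unrestricted steering commands can accomplish nearly all tasks (roughly 100% success on the Bridge tasks).
- No single steering style is universally best; a spectrum of abstractions offers complementary strengths and improves performance.
- Fine-tuned embodied reasoning combined with Steerable Policies outperforms baselines including OpenVLA and ECoT variants, especially on motion and semantic generalization.
- An off-the-shelf VLM using in-context reasoning can effectively choose command abstractions, outperforming baselines such as SayCan as well as standard OpenVLA.
- In-context learning enables corrective steering and dynamic abstraction selection based on scene understanding and task progress.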
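
The in-context abstraction selection described in these findings can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the authors' prompt or code: the prompt wording, the `'<level>: <command>'` reply format, the `LEVELS` tuple, and the functions `build_prompt` and `select_command` are all hypothetical, and `vlm` stands in for any text-in, text-out model call. A real system would also pass the current camera image to the VLM, which is omitted here.

```python
from typing import Callable, List, Tuple

# Hypothetical names for the abstraction levels listed in the summary.
LEVELS = ("task", "subtask", "motion", "gripper_trace", "point")


def build_prompt(
    task: str, history: List[str], examples: List[Tuple[str, str]]
) -> str:
    """Assemble an in-context prompt from worked examples and task progress."""
    lines = [
        "You steer a robot policy. Reply as '<level>: <command>'.",
        "Allowed levels: " + ", ".join(LEVELS),
        "",
    ]
    for situation, reply in examples:
        lines += [f"Situation: {situation}", f"Reply: {reply}", ""]
    lines.append(f"Task: {task}")
    lines.append("Commands so far: " + ("; ".join(history) if history else "none"))
    lines.append("Reply:")
    return "\n".join(lines)


def select_command(
    task: str,
    history: List[str],
    examples: List[Tuple[str, str]],
    vlm: Callable[[str], str],
) -> Tuple[str, str]:
    """Ask the VLM for the next steering command and parse its reply.

    Falls back to the plain task instruction if the reply is malformed.
    """
    reply = vlm(build_prompt(task, history, examples))
    level, sep, command = reply.partition(":")
    level, command = level.strip().lower(), command.strip()
    if not sep or level not in LEVELS or not command:
        return "task", task
    return level, command


if __name__ == "__main__":
    examples = [
        ("gripper is hovering left of the target", "motion: move the arm right"),
        ("target object not yet approached", "subtask: reach for the carrot"),
    ]
    # Stand-in for a real VLM call; always answers with a motion command.
    fake_vlm = lambda prompt: "Motion: move the arm down"
    print(select_command("put the carrot in the pot", [], examples, fake_vlm))
```

Calling `select_command` once per control step, with `history` updated after each command, gives a loop in which the high-level model can switch abstraction levels or issue a corrective command as the task progresses. The fallback to the task-level instruction is a design choice made for this sketch so that a malformed reply never leaves the low-level policy without a command.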

This summary was generated by AI and reviewed by a human editor.