QUICK REVIEW

[Paper Review] DySL-VLA: Efficient Vision-Language-Action Model Inference via Dynamic-Static Layer-Skipping for Robot Manipulation

Zebin Yang, Yijiahao Qi | arXiv (Cornell University) | Feb 26, 2026
Multimodal Machine Learning Applications | Cited by 0
One-Sentence Summary

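DySL-VLA speeds up Vision-Language-Action model inference for robot manipulation by dynamically skipping non-critical layers while preserving important actions, using prior-post skipping guidance and skip-aware distillation.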

ABSTRACT

Vision-Language-Action (VLA) models have shown remarkable success in robotic tasks such as manipulation by fusing a language model's reasoning with a vision model's 3D understanding. However, their high computational cost remains a major obstacle for real-world applications that require real-time performance. We observe that the actions within a task have varying levels of importance: critical steps demand high precision, while less important ones can tolerate more variance. Leveraging this insight, we propose DySL-VLA, a novel framework that reduces computational cost by dynamically skipping VLA layers based on each action's importance. DySL-VLA categorizes its layers into two types: informative layers, which are always executed, and incremental layers, which can be selectively skipped. To skip layers without sacrificing accuracy, we introduce a prior-post skipping guidance mechanism that determines when to initiate layer skipping. We also propose a skip-aware two-stage knowledge distillation algorithm to efficiently train a standard VLA into a DySL-VLA. Our experiments show that DySL-VLA achieves a 2.1% improvement in success length over DeeR-VLA on the Calvin dataset, while reducing trainable parameters by a factor of 85.7 and providing a 3.75x speedup relative to the RoboFlamingo baseline at iso-accuracy. Our code is available at https://github.com/PKU-SEC-Lab/DYSL_VLA.
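
To make the dynamic-static split concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation (their code is in the repository linked above). The class name `DySLSketch`, the single `adapter` call that stands in for a contiguous run of skipped dynamic layers, and the continuity thresholds `prior_tol` and `post_tol` are hypothetical stand-ins for the paper's skip controller and prior-post guidance. Static (informative) layers always run; dynamic (incremental) layers are replaced by the adapter only when the recent trajectory looks smooth, and the cheap action is kept only if it passes a post-skip check.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length action vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


@dataclass
class DySLSketch:
    layers: List[Layer]       # full layer stack of the (frozen) backbone
    static_ids: Set[int]      # hypothetical: indices of "informative" layers, always executed
    adapter: Layer            # hypothetical: cheap stand-in for one run of skipped dynamic layers
    head: Layer               # maps the final hidden state to an action vector
    prior_tol: float = 0.05   # hypothetical: max recent action change that allows skipping
    post_tol: float = 0.10    # hypothetical: max deviation accepted after skipping

    def forward(self, h: Vector, skip_dynamic: bool) -> Vector:
        in_skipped_run = False
        for i, layer in enumerate(self.layers):
            if i in self.static_ids or not skip_dynamic:
                h = layer(h)              # static layers (or every layer in a full pass) run
                in_skipped_run = False
            elif not in_skipped_run:
                h = self.adapter(h)       # one adapter call replaces a whole run of dynamic layers
                in_skipped_run = True
            # otherwise: dynamic layer inside an already-summarized run, so skip it
        return self.head(h)

    def step(self, h: Vector, prev_actions: List[Vector]) -> Tuple[Vector, bool]:
        """Return (action, skipped) for one control step."""
        # Prior guidance (assumed form): a smooth recent trajectory suggests a
        # less critical action, so try the cheap path first.
        smooth = (len(prev_actions) >= 2 and
                  l2_distance(prev_actions[-1], prev_actions[-2]) < self.prior_tol)
        if smooth:
            action = self.forward(h, skip_dynamic=True)
            # Post verification (assumed form): keep the cheap action only if it
            # stays continuous with the previous action; otherwise fall back.
            if l2_distance(action, prev_actions[-1]) < self.post_tol:
                return action, True
        return self.forward(h, skip_dynamic=False), False


# Example usage with toy layers that each add 1.0 to the hidden state.
add_one = lambda h: [x + 1.0 for x in h]
model = DySLSketch(layers=[add_one] * 4, static_ids={0, 3},
                   adapter=lambda h: [x + 2.0 for x in h], head=lambda h: h)
model.step([0.0], [[4.0], [4.02]])  # smooth trajectory -> ([4.0], True)
model.step([0.0], [[0.0], [1.0]])   # abrupt change -> ([4.0], False)
```

Falling back to a full pass when the post-skip check fails bounds the damage of a bad skip at the price of extra compute on that step; whether the paper handles a failed check this way is an assumption of the sketch.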

Research Motivation and Goals

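  • Reduce computation in VLA models without sacrificing the accuracy of critical actions
  • Identify how much each layer matters to VLA predictions and how action importance varies within a task
  • Propose dynamic-static layer skipping, which retains informative layers and skips the others
  • Introduce prior-post skipping guidance to determine when to skip layers
  • Develop skip-aware two-stage knowledge distillation to train lightweight skipping components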

Proposed Method

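  • Classify VLA layers as static (informative) or dynamic (skippable) to maximize speedup while minimizing information loss
  • Predict before skipping and verify after skipping to make and validate each skip decision
  • Introduce prior-post skipping guidance, based on action continuity, to decide when skipping occurs
  • Propose skip-aware two-stage knowledge distillation: first train adapters to summarize the dynamic layers, then train the controller and adapters jointly
  • Train only the lightweight skip controller and adapters, freezing the large language model backbone to reduce training cost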
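
The two-stage schedule described in the list above can be illustrated with a deliberately tiny Python toy. Everything here is hypothetical and is not the paper's formulation: the frozen "dynamic block" `teacher_dynamic` is a scalar function, the adapter is a scalar affine map, the skip controller is a logistic function of an assumed criticality cue `s`, and the skip-aware loss `p * error^2 + lam * (1 - p)` is one plausible choice in which `lam` rewards skipping. Only the adapter and controller parameters are updated; the teacher stays fixed, mirroring the frozen backbone.

```python
import math
import random
from typing import List, Tuple

Sample = Tuple[float, float, bool]  # (hidden input x, criticality cue s, is_critical)


def teacher_dynamic(x: float, critical: bool) -> float:
    """Hypothetical frozen dynamic-layer block: affine on routine steps, plus an
    extra offset on critical steps that a cheap adapter should not absorb."""
    return 2.0 * x + 1.0 + (2.0 if critical else 0.0)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def make_data(n: int = 200, seed: int = 0) -> List[Sample]:
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        critical = rng.random() < 0.3
        s = (1.0 if critical else 0.0) + rng.uniform(-0.2, 0.2)  # hypothetical cue
        data.append((rng.uniform(-1.0, 1.0), s, critical))
    return data


def train(data: List[Sample], lam: float = 1.0, lr_adapter: float = 0.1,
          lr_ctrl: float = 1.0, epochs1: int = 200, epochs2: int = 500):
    n = len(data)
    a, b = 0.0, 0.0  # adapter g(x) = a*x + b (trainable)
    w, u = 0.0, 0.0  # controller p(skip) = sigmoid(w*s + u) (trainable)

    # Stage 1: distill the frozen dynamic block into the adapter with plain MSE.
    for _ in range(epochs1):
        ga = gb = 0.0
        for x, _, crit in data:
            e = a * x + b - teacher_dynamic(x, crit)
            ga += 2 * e * x / n
            gb += 2 * e / n
        a, b = a - lr_adapter * ga, b - lr_adapter * gb

    # Stage 2: train controller and adapter jointly on the skip-aware loss
    #   L = p * (g(x) - teacher(x))^2 + lam * (1 - p),
    # so adapter error only counts where the controller chooses to skip.
    for _ in range(epochs2):
        ga = gb = gw = gu = 0.0
        for x, s, crit in data:
            p = sigmoid(w * s + u)
            e = a * x + b - teacher_dynamic(x, crit)
            ga += p * 2 * e * x / n
            gb += p * 2 * e / n
            gz = (e * e - lam) * p * (1 - p) / n  # dL/dz for z = w*s + u
            gw += gz * s
            gu += gz
        a, b = a - lr_adapter * ga, b - lr_adapter * gb
        w, u = w - lr_ctrl * gw, u - lr_ctrl * gu
    return a, b, w, u


a, b, w, u = train(make_data())
```

With these settings the adapter should settle near the routine-step mapping (a close to 2, b close to 1), and the controller should favor skipping for low-criticality inputs, i.e. `sigmoid(u) > 0.5 > sigmoid(w + u)`. Stage 1 fits the adapter to all steps, critical ones included; stage 2 lets the controller route critical steps away from the adapter so the adapter can specialize on the steps it actually replaces.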
Figure 1. Different actions in robot manipulation have different importance. We show an example when the robot is performing task “Grasp the black cup and drop it into basket”. (a) shows the task completion rates when adding noise with different magnitudes to VLA model weights at different action st

Experimental Results

Research Questions

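  • RQ1: How can layer skipping be tailored to action importance in VLA models to maximize speedup without sacrificing task-critical actions?
  • RQ2: Can dynamic-static layer skipping maintain accuracy while substantially reducing inference latency and training cost?
  • RQ3: Which mechanisms (prior-skip prediction, post-skip verification, trajectory continuity) effectively guide skip decisions in VLA planning?
  • RQ4: Does skip-aware two-stage distillation improve training convergence and generalize across datasets?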

Key Findings

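  • On Calvin D→D, DySL-VLA improves average success length by 2.1% over DeeR-VLA
  • DySL-VLA reduces trainable parameters by 85.7× and training steps by 13.7×
  • At iso-accuracy, DySL-VLA reduces latency by up to 3.75× relative to RoboFlamingo
  • Across the evaluated datasets, DySL-VLA improves average success length by 54.5% relative to FlexiDepth
  • Ablation studies show that prior-skip prediction, post-skip verification, and dynamic-static skipping are all essential for maintaining accuracy while accelerating inference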
Figure 2. VLA model architecture.

This summary was generated by AI and reviewed by human editors.