QUICK REVIEW

[Paper Review] Self-Correcting VLA: Online Action Refinement via Sparse World Imagination

Chenyv Liu, Wentao Tan|arXiv (Cornell University)|Feb 25, 2026

Reinforcement Learning in Robotics | Cited by: 0

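One-Sentence Summary

SC-VLA adds sparse world imagination and online action refinement to vision-language-action control, reaching state-of-the-art task throughput and a higher success rate with fewer steps on the ManiSkill benchmark and on a real-world ARX5 robot.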

ABSTRACT

Standard vision-language-action (VLA) models rely on fitting statistical data priors, limiting their robust understanding of underlying physical dynamics. Reinforcement learning enhances physical grounding through exploration yet typically relies on external reward signals that remain isolated from the agent's internal states. World action models have emerged as a promising paradigm that integrates imagination and control to enable predictive planning. However, they rely on implicit context modeling, lacking explicit mechanisms for self-improvement. To solve these problems, we propose Self-Correcting VLA (SC-VLA), which achieves self-improvement by intrinsically guiding action refinement through sparse imagination. We first design sparse world imagination by integrating auxiliary predictive heads to forecast current task progress and future trajectory trends, thereby constraining the policy to encode short-term physical evolution. Then we introduce the online action refinement module to reshape progress-dependent dense rewards, adjusting trajectory orientation based on the predicted sparse future states. Evaluations on challenging robot manipulation tasks from simulation benchmarks and real-world settings demonstrate that SC-VLA achieves state-of-the-art performance, yielding the highest task throughput with 16% fewer steps and a 9% higher success rate than the best-performing baselines, alongside a 14% gain in real-world experiments. Code is available at https://github.com/Kisaragi0/SC-VLA.

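Motivation and Objectives

Foster robust physical understanding in VLA systems that goes beyond static data priors.
Introduce sparse world imagination to constrain short-term physical evolution before actions are generated.
Develop online action refinement that reshapes dense rewards using imagined future states.
Use intrinsic, imagination-based signals to remove the dependence on external reward models.
Demonstrate superior performance on simulated and real-world robot manipulation tasks.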

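Proposed Method

Use conditional flow matching as the base policy for continuous action generation.
Augment the input with sparse world-imagination targets that predict task progress and short-term state change.
Train auxiliary heads to predict the progress p_t and the relative state change Δs_t with mean-squared-error losses (L_prog, L_Δs).
Integrate a residual reinforcement learning module (π_res) on top of the base policy for online action refinement.
Build a dense guidance reward from the predicted future states, with a dynamic weight schedule over task progress that sets how strongly the prediction guidance counts.
Apply SAC to both the base policy and the residual policy for stable optimization.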
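
The auxiliary objectives and the residual refinement listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names (`mse`, `imagination_loss`, `refine_action`), the loss weights `w_prog` and `w_state`, the residual scale `alpha`, and the additive, clipped form of the refinement are all assumptions introduced here. The summary only states that both heads use mean-squared-error losses and that π_res sits on top of the base policy.

```python
from typing import List, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def imagination_loss(
    p_pred: float,
    p_true: float,
    ds_pred: Sequence[float],
    ds_true: Sequence[float],
    w_prog: float = 1.0,   # hypothetical weight on L_prog
    w_state: float = 1.0,  # hypothetical weight on L_Δs
) -> float:
    """Weighted sum of the progress loss L_prog and the state-change loss L_Δs."""
    l_prog = mse([p_pred], [p_true])   # scalar progress p_t
    l_state = mse(ds_pred, ds_true)    # relative state change Δs_t
    return w_prog * l_prog + w_state * l_state


def refine_action(
    base_action: Sequence[float],
    residual: Sequence[float],
    alpha: float = 0.1,  # hypothetical residual scale
    low: float = -1.0,
    high: float = 1.0,
) -> List[float]:
    """Assumed refinement rule: add a scaled residual to the base action, then clip."""
    return [min(high, max(low, a + alpha * r)) for a, r in zip(base_action, residual)]
```

Keeping `alpha` small is a common choice in residual reinforcement learning so that early, untrained residuals cannot push the action far from what the base policy proposes. Whether SC-VLA scales or clips its residual this way is not stated in this review.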

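Experimental Results

Research Questions

RQ1: Can SC-VLA, through sparse world imagination and the residual module, improve the success rate of a flow-matching policy on complex manipulation tasks?
RQ2: Does the dense reward built from sparse world imagination and dynamic weight scheduling improve exploration efficiency and throughput under sparse-reward conditions?
RQ3: How much does each imagination component (progress, state) contribute to performance?
RQ4: Can SC-VLA transfer stably to a real robot system and stay robust under perturbations?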

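Key Findings

SC-VLA achieves state-of-the-art performance on challenging manipulation tasks, with the highest task throughput and a higher success rate.
On ManiSkill, SC-VLA (SPI, OAR) achieves the best performance, with clear gains over the baselines (for example, up to about 28% higher success rate on PegInsertion with certain pretrained models).
SC-VLA has the shortest average completion length of all evaluated methods (157 steps on average over successful episodes).
In the real-world ARX5 experiments, SC-VLA (SPI) reaches an average success rate of 70%, which is 43% and 14% higher than DP and GR00T N1.5, respectively.
Ablation studies show that both progress guidance and state guidance matter for overall performance, and that the sparse-imagination reward substantially aids exploration on complex tasks.
Dynamic weight scheduling is essential for balancing early prediction guidance against late-stage autonomous fine-tuning.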
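
One way to picture the balance described in the last finding is a guidance weight that shrinks as predicted progress rises, so that imagination steers early exploration and the task reward dominates near completion. The sketch below is a hypothetical illustration: the linear decay, the negative-distance guidance term, and the names `guidance_weight` and `shaped_reward` are assumptions made here, not the schedule or reward defined in the paper.

```python
import math
from typing import Sequence


def guidance_weight(progress: float, w_max: float = 1.0) -> float:
    """Assumed schedule: decay linearly from w_max at progress 0 to 0 at progress 1."""
    p = min(1.0, max(0.0, progress))  # clamp predicted progress to [0, 1]
    return w_max * (1.0 - p)


def shaped_reward(
    task_reward: float,
    state: Sequence[float],
    predicted_state: Sequence[float],
    progress: float,
) -> float:
    """Sparse task reward plus a progress-weighted penalty on the distance
    between the current state and the imagined future state."""
    distance = math.dist(state, predicted_state)  # Euclidean distance
    return task_reward - guidance_weight(progress) * distance


# Early in the episode the guidance term dominates; at full progress it vanishes.
early = shaped_reward(0.0, [0.0, 0.0], [3.0, 4.0], progress=0.0)
late = shaped_reward(1.0, [0.0, 0.0], [3.0, 4.0], progress=1.0)
```

With this schedule a mispredicted future state can mislead the agent only while progress is low, which is one plausible reading of why the ablation finds the schedule important.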

This review was generated by AI and checked by a human editor.