QUICK REVIEW

[Paper Review] Draft-and-Target Sampling for Video Generation Policy

Qikang Zhang, Yingjie Lei|arXiv (Cornell University)|Mar 13, 2026

Reinforcement Learning in Robotics | Cited by 0

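One-Sentence Summary

Draft-and-Target Sampling (DTS) is a training-free diffusion inference framework that runs two complementary denoising trajectories within a single model and adds token chunking and progressive acceptance, speeding up video generation policy inference by up to 2.1x with minimal loss in success rate.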

ABSTRACT

Video generation models have been used as robot policies to predict the future states of executing a task, conditioned on a task description and an observation. Previous works ignore their high computational cost and long inference time. To address this challenge, we propose Draft-and-Target Sampling, a novel diffusion inference paradigm for video generation policy that is training-free and can improve inference efficiency. We introduce a self-play denoising approach that uses two complementary denoising trajectories in a single model: draft sampling takes large steps to generate a global trajectory quickly, and target sampling takes small steps to verify it. To further speed up generation, we introduce token chunking and a progressive acceptance strategy to reduce redundant computation. Experiments on three benchmarks show that our method can achieve up to 2.1x speedup and improve the efficiency of current state-of-the-art methods with minimal compromise to the success rate. Our code is available.

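Research Motivation and Goals

Motivate faster, real-time inference for video generation policies in embodied agents.
Introduce a training-free method that does not require training a separate draft model.
Reduce inference time on robotics benchmarks while preserving task success rate.
Provide mechanisms (token chunking and progressive acceptance) for controlling the trade-off between computation and accuracy.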

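Proposed Method

A single diffusion model runs both the draft and the target denoising trajectories.
Draft sampling takes large steps to produce coarsely denoised tokens, which form the draft sequence.
Target sampling takes smaller steps to refine every draft token in parallel, which forms the corresponding target sequence.
Tokens are verified and accepted by comparing the draft and target trajectories; when needed, sampling restarts from the first rejected token.
Token chunking processes the dense denoising trajectory in chunks, and a progressive acceptance strategy relaxes the matching threshold over time.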
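
The draft, verify, and restart loop described above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the state is a single float rather than a video latent, the "denoiser" is a hypothetical velocity function `decay` integrated with Euler steps, the acceptance test is an absolute difference against a threshold, and the threshold is relaxed linearly from `tau_start` to `tau_end`. All function and parameter names are invented for this example.

```python
from typing import Callable, List, Tuple

Velocity = Callable[[float, float], float]


def decay(x: float, t: float) -> float:
    """Hypothetical toy velocity field dx/dt = x (stands in for the model)."""
    return x


def euler(f: Velocity, x: float, t0: float, t1: float, n_steps: int) -> float:
    """Integrate from t0 to t1 with n_steps equal Euler steps."""
    h = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        x = x + h * f(x, t)
        t = t + h
    return x


def dts_sample(
    f: Velocity,
    x_init: float,
    times: List[float],
    fine_steps: int,
    chunk_len: int,
    tau_start: float,
    tau_end: float,
) -> Tuple[List[float], int]:
    """Toy draft-and-target loop; returns (states at each time, rounds used)."""
    n = len(times) - 1
    states = [x_init]
    pos = 0
    rounds = 0
    while pos < n:
        end = min(pos + chunk_len, n)  # token chunking: verify one chunk per round
        # Draft: one large step per interval, computed sequentially.
        draft = [states[-1]]
        for i in range(pos, end):
            draft.append(euler(f, draft[-1], times[i], times[i + 1], 1))
        # Target: small steps starting from the preceding draft state.
        # Each entry is independent of the others, so a real system could batch them.
        target = [
            euler(f, draft[k], times[pos + k], times[pos + k + 1], fine_steps)
            for k in range(end - pos)
        ]
        # Verify with a threshold that relaxes linearly along the trajectory.
        accepted = 0
        for k in range(end - pos):
            frac = (pos + k) / max(n - 1, 1)
            tau = tau_start + (tau_end - tau_start) * frac
            if abs(draft[k + 1] - target[k]) > tau:
                break
            accepted += 1
        states.extend(draft[1 : accepted + 1])
        if accepted < end - pos:
            # First rejected token: keep its target value and restart from there.
            states.append(target[accepted])
            accepted += 1
        pos += accepted
        rounds += 1
    return states, rounds
```

Calling `dts_sample(decay, 1.0, [1.0, 0.75, 0.5, 0.25, 0.0], fine_steps=4, chunk_len=2, tau_start=0.0, tau_end=0.05)` runs the toy with a strict threshold early and a looser one late. Because a rejected token is replaced by its target value, every round advances by at least one token, so the loop always terminates. In this sketch a looser threshold means fewer rounds at the cost of larger deviation from the small-step trajectory.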

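Experimental Results

Research Questions

RQ1: Can the idea of speculative decoding be applied to diffusion-based video generation policies without training a separate draft model?
RQ2: Do token chunking and progressive acceptance improve inference efficiency without significantly harming success rate?
RQ3: What speedup and accuracy trade-offs does DTS achieve on standard robotics benchmarks?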

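Key Findings

DTS achieves up to 2.1x speedup across three video generation policy benchmarks.
On iThor, DTS raises the overall success rate to 29.15% with an average runtime of 1.405 s, compared with 3.013 s for AVDC-100 (a 2.14x speedup with a gain of 2.05 percentage points in success rate).
On Meta-World, DTS reaches a success rate of 41.2%–42.4% with notable speedups (e.g., roughly 1.35x–1.60x), and its results are very close to those of DDIM-100.
On Libero, DTS yields roughly 1.6x–2.0x speedup with modest changes in success rate, and it remains robust across chunk lengths.
Token chunking (most effective at length 6) consistently delivers strong speedups across benchmarks, with stable or improved success rates.
Progressive acceptance relaxes the strict token-matching constraint, improving efficiency without a significant drop in policy performance.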
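
The iThor figures above can be checked with a few lines of arithmetic. The snippet only reuses the numbers quoted in this summary; the implied AVDC-100 success rate is derived by subtracting the reported gain and is an inference, not a value stated in the text.

```python
def speedup(baseline_seconds: float, method_seconds: float) -> float:
    """Speedup factor of a method relative to a baseline runtime."""
    return baseline_seconds / method_seconds


avdc_runtime = 3.013   # seconds, AVDC-100 on iThor
dts_runtime = 1.405    # seconds, DTS on iThor
dts_success = 29.15    # percent
gain_points = 2.05     # percentage points over AVDC-100

ithor_speedup = speedup(avdc_runtime, dts_runtime)
implied_baseline_success = dts_success - gain_points

print(f"{ithor_speedup:.2f}x")              # prints 2.14x
print(f"{implied_baseline_success:.2f}%")   # prints 27.10%
```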

This review was generated by AI and checked by a human editor.