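[Paper Walkthrough] Embodied Lifelong Learning for Task and Motion Planning
This paper formalizes lifelong sampler learning for embodied task and motion planning (TAMP) and proposes a mixture of nested diffusion samplers that chooses online between specialized and generic models, improving planning performance over a lifelong sequence of tasks.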
A robot deployed in a home over long stretches of time faces a true lifelong learning problem. As it seeks to provide assistance to its users, the robot should leverage any accumulated experience to improve its own knowledge and proficiency. We formalize this setting with a novel formulation of lifelong learning for task and motion planning (TAMP), which endows our learner with the compositionality of TAMP systems. Exploiting the modularity of TAMP, we develop a mixture of generative models that produces candidate continuous parameters for a planner. Whereas most existing lifelong learning approaches determine a priori how data is shared across various models, our approach learns shared and non-shared models and determines which to use online during planning based on auxiliary tasks that serve as a proxy for each model's understanding of a state. Our method exhibits substantial improvements (over time and compared to baselines) in planning success on 2D and BEHAVIOR domains.
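Motivation and Goals
- Formalize lifelong learning for TAMP in a true lifelong setting.
- Exploit the modularity of TAMP to learn generative samplers for the continuous parameters that determine planning success.
- Develop a mixture of generative models that uses auxiliary tasks to choose online between generic and specialized samplers.
- Show that planning performance improves over time in 2D and BEHAVIOR domains.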
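Proposed Method
- Represent the sampler for each abstract action as a diffusion model that generates continuous parameters conditioned on the state.
- Use a nested-model approach that combines a generic sampler with samplers specialized to each object type, so that data is pooled when it is scarce.
- Train auxiliary predictors that estimate each sampler's reliability from an auxiliary signal z, and form a mixture distribution that weights each sampler's samples by that reliability.
- Plan with the SeSamE (search-then-sample) bilevel framework, which generates plan skeletons at the discrete level and refines them by sampling continuous parameters.
- Train the samplers on lifelong data with a replay/retraining strategy that simply mixes old and new data, to counter forgetting.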
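The reliability-weighted mixture in the list above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the summary does not give the paper's weighting rule, so this version uses a softmax over negative auxiliary errors, and the names `mixture_weights`, `sample_from_mixture`, and `temperature` are hypothetical. The two toy samplers are plain random functions standing in for learned diffusion models.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a sampler maps (state, rng) to continuous parameters.
Sampler = Callable[[Sequence[float], random.Random], List[float]]


def mixture_weights(aux_errors: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Assumed rule: softmax over negative auxiliary errors (lower error, higher weight)."""
    scores = {name: -err / temperature for name, err in aux_errors.items()}
    max_score = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {name: math.exp(s - max_score) for name, s in scores.items()}
    total = sum(exp_scores.values())
    return {name: v / total for name, v in exp_scores.items()}


def sample_from_mixture(
    samplers: Dict[str, Sampler],
    aux_errors: Dict[str, float],
    state: Sequence[float],
    rng: random.Random,
) -> Tuple[str, List[float]]:
    """Pick one sampler according to the mixture weights, then draw from it."""
    weights = mixture_weights(aux_errors)
    names = list(samplers)
    chosen = rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
    return chosen, samplers[chosen](state, rng)


# Toy stand-ins for learned samplers (NOT diffusion models).
def generic_sampler(state: Sequence[float], rng: random.Random) -> List[float]:
    return [rng.uniform(-1.0, 1.0) for _ in state]


def specialized_sampler(state: Sequence[float], rng: random.Random) -> List[float]:
    return [s + rng.gauss(0.0, 0.1) for s in state]


samplers = {"generic": generic_sampler, "specialized": specialized_sampler}
```

In this sketch, lowering `temperature` concentrates the mixture on the sampler with the smallest auxiliary error, while a large value approaches uniform selection among the samplers.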
![Figure 1: The learning robot will face a sequence of diverse TAMP problems in a true lifelong setting. It will use its current models to solve each problem as efficiently as possible, and then use any collected data to improve those models for the future. Images captured from BEHAVIOR [ 1 ] .](https://ar5iv.labs.arxiv.org/html/2307.06870/assets/x1.png)
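The solve-then-improve cycle in the figure caption, combined with replay-style mixing of old and new data, might look roughly like the sketch below. This is not the authors' training code: `mixed_batch`, `lifelong_loop`, `solve`, `retrain`, and the 50/50 `new_fraction` default are all hypothetical choices made for illustration.

```python
import random
from typing import Any, Callable, Iterable, List, Tuple


def mixed_batch(
    old_data: List[Any],
    new_data: List[Any],
    batch_size: int,
    new_fraction: float,
    rng: random.Random,
) -> List[Any]:
    """Draw a batch that mixes new and replayed examples.

    The batch may be smaller than batch_size when either pool is short.
    """
    n_new = min(len(new_data), round(batch_size * new_fraction))
    n_old = min(len(old_data), batch_size - n_new)
    return rng.sample(new_data, n_new) + rng.sample(old_data, n_old)


def lifelong_loop(
    problems: Iterable[Any],
    solve: Callable[[Any], Tuple[bool, List[Any]]],  # hypothetical planner call
    retrain: Callable[[List[Any]], None],            # hypothetical training step
    rng: random.Random,
    batch_size: int = 8,
    new_fraction: float = 0.5,
) -> Tuple[int, List[Any]]:
    """Solve problems in sequence, retraining on mixed data after each one."""
    replay: List[Any] = []
    solved = 0
    for problem in problems:
        success, new_data = solve(problem)
        solved += int(success)
        if new_data:
            retrain(mixed_batch(replay, new_data, batch_size, new_fraction, rng))
            replay.extend(new_data)  # new data becomes old data for later problems
    return solved, replay
```

Keeping every past example in `replay` is the simplest option; a bounded buffer would be a natural variation if memory were a concern.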
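Experimental Results
Research Questions
- RQ1: Can diffusion-based samplers learn continuous parameter distributions that are useful for TAMP in a lifelong setting?
- RQ2: Does mixing specialized and generic samplers improve planning efficiency when data is limited?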
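Key Findings
- Diffusion-model samplers learned from data produce distributions that align with the observed distribution of successful, valid actions.
- The mixture of nested samplers outperforms baseline methods, especially in low-data regimes.
- Lifelong evaluation shows that the mixture approach solves substantially more problems cumulatively than the baselines and uniform sampling.
- In the BEHAVIOR domains, the lifelong learner outperforms the hand-designed initial samplers, demonstrating continued improvement in a realistic domain.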

This walkthrough was generated by AI and reviewed by a human editor.