QUICK REVIEW

[Paper Review] Broadly-Exploring, Local-Policy Trees for Long-Horizon Task Planning

Brian Ichter, Pierre Sermanet|arXiv (Cornell University)|Oct 13, 2020

AI-based Problem Solving and Planning · Cited by 2

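One-Sentence Summary

BELT proposes a hybrid planning framework that combines an RRT-inspired tree search with task-conditioned, learned local policies to enable long-horizon, sequential task planning in high-dimensional, complex environments. By introducing a task-conditioned dynamics model that extends dynamics propagation in time, the method achieves robust, sample-efficient long-horizon planning and outperforms purely learned or purely classical planning methods in complex scenarios.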

ABSTRACT

Long-horizon planning in realistic environments requires the ability to reason over sequential tasks in high-dimensional state spaces with complex dynamics. Classical motion planning algorithms, such as rapidly-exploring random trees, are capable of efficiently exploring large state spaces and computing long-horizon, sequential plans. However, these algorithms are generally challenged with complex, stochastic, and high-dimensional state spaces as well as in the presence of narrow passages, which naturally emerge in tasks that interact with the environment. Machine learning offers a promising solution for its ability to learn general policies that can handle complex interactions and high-dimensional observations. However, these policies are generally limited in horizon length. Our approach, Broadly-Exploring, Local-policy Trees (BELT), merges these two approaches to leverage the strengths of both through a task-conditioned, model-based tree search. BELT uses an RRT-inspired tree search to efficiently explore the state space. Locally, the exploration is guided by a task-conditioned, learned policy capable of performing general short-horizon tasks. This task space can be quite general and abstract; its only requirements are to be sampleable and to well-cover the space of useful tasks. This search is aided by a task-conditioned model that temporally extends dynamics propagation to allow long-horizon search and sequential reasoning over tasks. BELT is demonstrated experimentally to be able to plan long-horizon, sequential trajectories with a goal conditioned policy and generate plans that are robust.
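The abstract positions BELT against classical rapidly-exploring random trees (RRT). As a reference point, here is a minimal 2D RRT sketch in Python. The function name `rrt`, its parameters, the goal-biasing rate, and the circular-obstacle world are illustrative assumptions, not taken from the paper. Each iteration samples a point, finds the nearest tree node, and steers a bounded step toward the sample; this sample-nearest-steer loop is the part BELT replaces with learned, task-conditioned extensions.

```python
import math
import random

def rrt(start, goal, is_free, bounds, step=0.5, goal_tol=0.5,
        goal_bias=0.05, max_iters=5000, seed=0):
    """Minimal 2D RRT sketch. Returns a list of points from start to near goal, or None."""
    rng = random.Random(seed)
    (xmin, xmax), (ymin, ymax) = bounds
    nodes = [start]        # node positions
    parent = [None]        # parent[i] = index of node i's parent
    for _ in range(max_iters):
        # Sample a random configuration, occasionally the goal itself (goal biasing).
        if rng.random() < goal_bias:
            q_rand = goal
        else:
            q_rand = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        # Nearest existing node (linear scan; practical planners use a spatial index).
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], q_rand))
        q_near = nodes[i_near]
        d = math.dist(q_near, q_rand)
        if d == 0.0:
            continue
        # Steer: move at most `step` from q_near toward q_rand.
        t = min(1.0, step / d)
        q_new = (q_near[0] + t * (q_rand[0] - q_near[0]),
                 q_near[1] + t * (q_rand[1] - q_near[1]))
        # Simplification: only the new point is collision-checked, not the edge.
        if not is_free(q_new):
            continue
        nodes.append(q_new)
        parent.append(i_near)
        if math.dist(q_new, goal) <= goal_tol:
            # Follow parent pointers back to the root to extract the path.
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None

# Usage: a 10x10 world with a circular obstacle in the middle.
def is_free(q):
    return math.dist(q, (5.0, 5.0)) > 2.0

path = rrt((1.0, 1.0), (9.0, 9.0), is_free, bounds=((0.0, 10.0), (0.0, 10.0)))
```

Because samples are drawn uniformly, the chance that one lands inside a corridor shrinks with the corridor's area, which is one way to see why the abstract singles out narrow passages as a weakness of this family of planners.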

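Motivation and Goals

Address the challenge of long-horizon planning in the high-dimensional, stochastic, and complex state spaces common in real-world robotic tasks.
Overcome the limitations of classical motion planning algorithms (e.g., RRT) in handling narrow passages and complex dynamics.
Exploit the generalization ability of learned policies while retaining the exploration efficiency of tree-based search.
Enable sequential reasoning over abstract, high-level tasks by using a model to extend dynamics propagation in time.
Develop a scalable, robust planning framework that combines sample efficiency with the ability to generate long-horizon trajectories.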

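Proposed Method

Use an RRT-inspired tree search to explore the state space broadly, maintaining coverage even in high-dimensional, complex environments.
Guide local exploration with a task-conditioned, learned policy, enabling efficient navigation near sampled states.
Introduce a task-conditioned dynamics model that propagates state transitions over long horizons, supporting sequential reasoning.
Condition both the policy and the dynamics model on abstract, sampleable task embeddings so the framework generalizes to diverse long-horizon tasks.
Integrate the learned policy and model into the tree-expansion step, combining efficient exploration with informed local planning.
Adopt a hierarchical planning scheme: the global tree search identifies promising paths, and the local policy refines individual trajectory segments.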
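The method steps can be illustrated with a toy Python sketch. Everything in it is hypothetical: the task space is a 2D goal point, the "learned" policy `local_policy` is a hand-written step toward that point, and the task-conditioned model from `make_task_model` is simulated by rolling that policy out for a fixed horizon, whereas BELT learns both components. Choosing the node whose state is nearest to the task's goal point is also an assumption. What the sketch does show is the structure described above: the tree grows by sampling tasks rather than raw states, each edge is one temporally extended model call, and the returned plan is a sequence of tasks.

```python
import math
import random

def local_policy(s, task, speed=0.3):
    """Hypothetical goal-conditioned policy: one bounded step toward the task's goal point."""
    dx, dy = task[0] - s[0], task[1] - s[1]
    d = math.hypot(dx, dy)
    if d <= speed:
        return task
    return (s[0] + speed * dx / d, s[1] + speed * dy / d)

def make_task_model(is_free, horizon=10):
    """Stand-in for a learned task-conditioned model: predicts the state reached after
    running the policy on `task` for up to `horizon` steps (one temporally extended jump)."""
    def model(s, task):
        for _ in range(horizon):
            nxt = local_policy(s, task)
            if not is_free(nxt):
                break  # the policy stalls against an obstacle
            s = nxt
        return s
    return model

def belt_like_search(start, goal_task, sample_task, model, reached,
                     goal_bias=0.1, max_iters=2000, seed=0):
    """RRT-style tree whose edges are tasks; returns a task sequence or None."""
    rng = random.Random(seed)
    states, parents, tasks = [start], [None], [None]
    for _ in range(max_iters):
        task = goal_task if rng.random() < goal_bias else sample_task(rng)
        # Assumption: expand the node whose state is closest to the task's goal point.
        i_near = min(range(len(states)), key=lambda i: math.dist(states[i], task))
        s_new = model(states[i_near], task)  # one model call replaces many low-level steps
        if math.dist(s_new, states[i_near]) < 1e-6:
            continue  # this task makes no progress from this node
        states.append(s_new)
        parents.append(i_near)
        tasks.append(task)
        if reached(s_new):
            # Collect the tasks on the edges from the root to the new node.
            plan, i = [], len(states) - 1
            while parents[i] is not None:
                plan.append(tasks[i])
                i = parents[i]
            return plan[::-1]
    return None

# Usage: a wall at 4 <= x <= 6, y <= 7 separates the start from the goal.
def is_free(q):
    in_bounds = 0.0 <= q[0] <= 10.0 and 0.0 <= q[1] <= 10.0
    in_wall = 4.0 <= q[0] <= 6.0 and q[1] <= 7.0
    return in_bounds and not in_wall

model = make_task_model(is_free)
goal = (9.0, 1.0)
plan = belt_like_search(
    start=(1.0, 1.0),
    goal_task=goal,
    sample_task=lambda rng: (rng.uniform(0, 10), rng.uniform(0, 10)),
    model=model,
    reached=lambda s: math.dist(s, goal) < 0.5,
)
```

Because the toy model is deterministic, replaying the returned tasks through it from the start state reproduces the tree path, so a plan can be checked (or handed to the policy for execution) one task at a time.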

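Experimental Results

Research Questions

RQ1: Can a hybrid approach combining tree-based exploration with learned local policies achieve robust long-horizon planning in complex, high-dimensional environments?
RQ2: To what extent can a task-conditioned dynamics model extend short-horizon policy rollouts to support sequential reasoning over long trajectories?
RQ3: Compared with fixed or low-level action spaces, how much does a sampleable, abstract task space improve generalization and planning efficiency?
RQ4: How does BELT perform in environments with narrow passages and complex dynamics, which often cause classical RRT to fail?
RQ5: Does integrating learned policies into a model-based tree search enable long-horizon planning while preserving sample efficiency?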

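Key Findings

BELT generated long-horizon, sequential trajectories in high-dimensional environments with complex dynamics, showing robustness to complex state-space structure.
The task-conditioned dynamics model effectively extended policy rollouts in time, supporting coherent sequential reasoning over long horizons.
The sampleable, abstract task space allowed the framework to generalize to diverse tasks without task-specific fine-tuning.
On long-horizon tasks, BELT outperformed both pure RRT methods and purely end-to-end learned planners in success rate and sample efficiency.
The framework planned reliably in environments with narrow passages, where standard RRT often fails due to insufficient exploration.
The empirical results indicate that combining broad tree exploration with local policy guidance yields faster convergence and higher-quality plans.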

This review was generated by AI and reviewed by human editors.