QUICK REVIEW

[Paper Review] ProcTHOR: Large-Scale Embodied AI Using Procedural Generation

Matt Deitke, Eli VanderBilt|arXiv (Cornell University)|Jun 14, 2022

Human Pose and Action Recognition | Cited by 77

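One-Sentence Summary

ProcTHOR procedurally generates large-scale, physics-enabled, interactive houses for Embodied AI; pre-training on 10,000 generated houses yields state-of-the-art results on multiple benchmarks, with strong 0-shot transfer.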

ABSTRACT

Massive datasets and high-capacity models have driven many recent advancements in computer vision and natural language understanding. This work presents a platform to enable similar success stories in Embodied AI. We propose ProcTHOR, a framework for procedural generation of Embodied AI environments. ProcTHOR enables us to sample arbitrarily large datasets of diverse, interactive, customizable, and performant virtual environments to train and evaluate embodied agents across navigation, interaction, and manipulation tasks. We demonstrate the power and potential of ProcTHOR via a sample of 10,000 generated houses and a simple neural model. Models trained using only RGB images on ProcTHOR, with no explicit mapping and no human task supervision produce state-of-the-art results across 6 embodied AI benchmarks for navigation, rearrangement, and arm manipulation, including the presently running Habitat 2022, AI2-THOR Rearrangement 2022, and RoboTHOR challenges. We also demonstrate strong 0-shot results on these benchmarks, via pre-training on ProcTHOR with no fine-tuning on the downstream benchmark, often beating previous state-of-the-art systems that access the downstream training data.

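Motivation and Goals

Advance the scaling of Embodied AI through large-scale, diverse, interactive environments.
Automatically generate diverse, physically plausible floor plans and asset placements.
Provide fully interactive scenes with configurable lighting and materials for robust training.
Show that simple RGB-only models can reach SoTA when trained on large-scale ProcTHOR data.
Open-source the ProcTHOR framework to advance Embodied AI research.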

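Proposed Method

Procedurally generate fully interactive, physics-enabled houses from a room specification.
Populate floor plans with objects drawn from 1,633 assets spanning 108 categories and 18 semantic asset groups.
Apply material and lighting randomization to simulate diverse appearances and different times of day.
Enable object states and actions to support navigation, interaction, and manipulation tasks.
Use a simple CNN+GRU architecture (with a CLIP-based variant for some tasks), trained in the AllenAct framework.
Evaluate zero-shot and fine-tuned performance on six Embodied AI benchmarks.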
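
The first three steps above (sampling a house from a room specification, populating it with assets, and randomizing materials and lighting) can be illustrated with a toy sketch. This is not the ProcTHOR API: every name here (`Room`, `House`, `sample_house`, the asset table, the material list) is hypothetical, the asset names are placeholders, and the real generator also handles floor-plan geometry and physics, which this sketch omits.

```python
import random
from dataclasses import dataclass, field

# Hypothetical, illustrative tables; not the ProcTHOR asset database.
ASSETS_BY_ROOM = {
    "Kitchen": ["Fridge", "Toaster", "DiningTable"],
    "LivingRoom": ["Sofa", "Television", "CoffeeTable"],
    "Bedroom": ["Bed", "Dresser", "Lamp"],
    "Bathroom": ["Toilet", "Sink", "Towel"],
}
MATERIALS = ["wood", "tile", "carpet", "concrete"]


@dataclass
class Room:
    room_type: str
    floor_material: str
    objects: list = field(default_factory=list)


@dataclass
class House:
    rooms: list
    time_of_day: float  # hour of day, a stand-in for lighting randomization


def sample_house(room_spec, rng, max_objects_per_room=3):
    """Sample one house from a room specification (a list of room types)."""
    rooms = []
    for room_type in room_spec:
        candidates = ASSETS_BY_ROOM[room_type]
        # Pick how many objects to place, then which ones (no repeats).
        n = rng.randint(1, min(max_objects_per_room, len(candidates)))
        rooms.append(Room(
            room_type=room_type,
            floor_material=rng.choice(MATERIALS),  # material randomization
            objects=rng.sample(candidates, n),     # asset placement
        ))
    return House(rooms=rooms, time_of_day=rng.uniform(0.0, 24.0))


# A seeded generator makes the sampled dataset reproducible.
rng = random.Random(0)
houses = [sample_house(["Kitchen", "Bedroom", "Bathroom"], rng) for _ in range(5)]
```

Because the sampler is driven only by a room specification and a random generator, the same function can produce 5 houses or 10,000, which is the property the summary attributes to procedural generation.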

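Experimental Results

Research Questions

RQ1: Do ProcTHOR's large-scale procedurally generated environments improve the generalization of Embodied AI agents?
RQ2: Do RGB-only models trained solely on ProcTHOR transfer competitively to downstream benchmarks, both zero-shot and after fine-tuning?
RQ3: How does increasing the number of training houses affect performance on navigation and manipulation tasks?
RQ4: How does procedural diversity (floor plans, assets, materials, lighting) affect benchmark scores?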

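Key Findings

Models pre-trained on ProcTHOR achieve state-of-the-art results on six Embodied AI benchmarks covering navigation and manipulation.
In zero-shot transfer, models trained only on ProcTHOR surpass the previous SoTA on several benchmarks.
After downstream fine-tuning, ProcTHOR-based models rank at or near the top of the leaderboards for Habitat 2022 ObjectNav, AI2-THOR Rearrangement, and RoboTHOR ObjectNav.
Evaluations on ArchitecTHOR and ProcTHOR show strong 0-shot and fine-tuned performance across diverse tasks.
Ablations show gains as the number of training houses scales from 10 to 100 to 1K to 10K.
ProcTHOR supports large-scale datasets and rendering fast enough for training runs of hundreds of millions of steps.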
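
One way to set up a scaling ablation like the 10 / 100 / 1K / 10K comparison above is to draw nested training subsets, so that each larger set contains every smaller one and the only thing that changes between runs is the amount of data. The summary does not say how the subsets were actually drawn; nesting is an assumption made here for illustration, and `nested_house_subsets` is a hypothetical helper.

```python
import random


def nested_house_subsets(num_houses, sizes, seed=0):
    """Return {size: sorted house indices}; smaller subsets are contained in larger ones."""
    if max(sizes) > num_houses:
        raise ValueError("subset size exceeds number of houses")
    order = list(range(num_houses))
    random.Random(seed).shuffle(order)  # one fixed shuffle shared by all sizes
    # Taking prefixes of the same shuffled order guarantees nesting.
    return {size: sorted(order[:size]) for size in sorted(sizes)}


subsets = nested_house_subsets(10_000, [10, 100, 1_000, 10_000])
```

Each entry of `subsets` can then be used as the training house list for one run, with the evaluation houses held fixed across runs.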

This review was generated by AI and reviewed by human editors.