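[Paper Digest] Imitating Interactive Intelligence
This paper trains interactive agents in a 3D virtual Playroom by imitating human-human interaction data with added auxiliary losses. It also trains evaluation models that approximate human judgement, and shows that the resulting agents generalise beyond the training data.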
A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. This setting nevertheless integrates a number of the central challenges of artificial intelligence (AI) research: complex visual perception and goal-directed physical control, grounded language comprehension and production, and multi-agent social interaction. To build agents that can robustly interact with humans, we would ideally train them while they interact with humans. However, this is presently impractical. Therefore, we approximate the role of the human with another learned agent, and use ideas from inverse reinforcement learning to reduce the disparities between human-human and agent-agent interactive behaviour. Rigorously evaluating our agents poses a great challenge, so we develop a variety of behavioural tests, including evaluation by humans who watch videos of agents or interact directly with them. These evaluations convincingly demonstrate that interactive training and auxiliary losses improve agent behaviour beyond what is achieved by supervised learning of actions alone. Further, we demonstrate that agent capabilities generalise beyond literal experiences in the dataset. Finally, we train evaluation models whose ratings of agents agree well with human judgement, thus permitting the evaluation of new agent models without additional effort. Taken together, our results in this virtual environment provide evidence that large-scale human behavioural imitation is a promising tool to create intelligent, interactive agents, and the challenge of reliably evaluating such agents is possible to surmount.
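Research motivation and goals
- Unify artificial intelligence with human-like interaction in a grounded, interactive environment.
- Develop large-scale behavioural priors by imitating human demonstrations of interaction.
- Show that interactive training improves agent behaviour beyond supervised learning of actions alone.
- Show that learned agents generalise to new states beyond their training experience.
- Create evaluation models whose judgements agree with human evaluation, so that new agents can be assessed.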
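Proposed method
- Use a Unity-based 3D Playroom equipped with mobile manipulators for perception, action and language tasks.
- Collect a large-scale dataset of human setter-solver interactions through language games (≈610k episodes) to train and evaluate imitation learners.
- Use an action space with continuous mouse-look and keyboard controls, modelled by an autoregressive policy conditioned on multimodal observations.
- Build the agent from a ResNet-based vision module, a multimodal transformer, an LSTM, and separate policies for movement and language output.
- Use behavioural cloning (BC) as the base imitation objective and regularise the representations with auxiliary losses (Language Matching and Object-in-View).
- Discuss the role of inverse reinforcement learning in addressing distribution mismatch and improving learning from demonstrations.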
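The combination of a behavioural-cloning objective with auxiliary regularisers listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a discrete action vocabulary, models each auxiliary task as a binary classification (for example, "is this object in view?"), and adds the auxiliary term with a single weight. The names bc_loss, binary_cross_entropy, total_loss and aux_weight are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def bc_loss(step_logits, human_actions):
    """Behavioural cloning: mean negative log-likelihood of the
    human's action at each timestep under the policy's logits."""
    total = 0.0
    for logits, action in zip(step_logits, human_actions):
        total -= log_softmax(logits)[action]
    return total / len(human_actions)


def binary_cross_entropy(logit, label):
    """Stable binary cross-entropy on a raw logit (label is 0 or 1)."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def total_loss(step_logits, human_actions, aux_logits, aux_labels, aux_weight=1.0):
    """BC loss plus a weighted auxiliary classification loss.

    Assumption: the auxiliary heads are binary classifiers and the
    combination is a plain weighted sum; the paper may differ.
    """
    aux = sum(
        binary_cross_entropy(logit, label)
        for logit, label in zip(aux_logits, aux_labels)
    ) / len(aux_labels)
    return bc_loss(step_logits, human_actions) + aux_weight * aux


if __name__ == "__main__":
    # Toy inputs, invented for illustration only.
    logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]  # policy logits per timestep
    actions = [0, 2]                               # actions the human took
    print(total_loss(logits, actions, aux_logits=[1.5, -0.5], aux_labels=[1, 0]))
```

The auxiliary term changes only the training signal, not the policy's outputs: its purpose in this sketch is to shape the shared representation, which matches the regularising role the digest describes.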
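Experimental results
Research questions
- RQ1: Can large-scale imitation of human behaviour produce intelligent, interactive agents in a virtual environment?
- RQ2: Can auxiliary learning signals and behavioural priors lift imitation-learning agents beyond pure BC?
- RQ3: How well do the learned agents generalise to states not explicitly seen in the dataset?
- RQ4: Can evaluation models be trained to match human judgement, enabling scalable evaluation of new agents?
Key findings
- Interactive training combined with auxiliary losses improves agent behaviour beyond supervised learning of actions alone.
- Agents generalise to new states and tasks that do not appear explicitly in the training data.
- Large-scale behavioural priors trained from human demonstrations help produce near-human responses during interaction.
- Evaluation models trained to predict human judgements agree closely with human evaluation when assessing new agents.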
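One simple way to check that an evaluation model agrees with human raters, as the last finding describes, is to compare their per-episode success labels and the agent rankings they imply. The sketch below assumes binary success labels and uses hypothetical helper names (agreement_rate, success_rate, rank_agents). It is not the paper's evaluation protocol, and all data in it is invented for illustration.

```python
def agreement_rate(human_labels, model_labels):
    """Fraction of episodes where the model's label matches the human's."""
    if len(human_labels) != len(model_labels):
        raise ValueError("label lists must have the same length")
    if not human_labels:
        raise ValueError("need at least one labelled episode")
    matches = sum(h == m for h, m in zip(human_labels, model_labels))
    return matches / len(human_labels)


def success_rate(labels):
    """Mean of binary success labels (1 = success, 0 = failure)."""
    return sum(labels) / len(labels)


def rank_agents(scores):
    """Agent names sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical labels for the same episodes of two toy agents.
    human = {"agent_a": [1, 1, 0, 1], "agent_b": [0, 1, 0, 0]}
    model = {"agent_a": [1, 1, 0, 0], "agent_b": [0, 1, 0, 0]}

    for name in human:
        print(name, agreement_rate(human[name], model[name]))

    human_rank = rank_agents({n: success_rate(v) for n, v in human.items()})
    model_rank = rank_agents({n: success_rate(v) for n, v in model.items()})
    print("same ranking:", human_rank == model_rank)
```

Per-episode agreement and ranking agreement answer different questions: a model can disagree on individual episodes and still order agents the same way humans do, which is often what matters when comparing new agent variants.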
This digest was generated by AI and reviewed by a human editor.