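[Paper Summary] Language as a Cognitive Tool to Imagine Goals in Curiosity-Driven Exploration
This paper proposes IMAGINE, an intrinsically motivated reinforcement learning architecture that uses language to imagine out-of-distribution goals and guide exploration. It relies on modular, object-centered representations and on social language feedback in the Playground environment.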
Developmental machine learning studies how artificial agents can model the way children learn open-ended repertoires of skills. Such agents need to create and represent goals, select which ones to pursue and learn to achieve them. Recent approaches have considered goal spaces that were either fixed and hand-defined or learned using generative models of states. This limited agents to sample goals within the distribution of known effects. We argue that the ability to imagine out-of-distribution goals is key to enable creative discoveries and open-ended learning. Children do so by leveraging the compositionality of language as a tool to imagine descriptions of outcomes they never experienced before, targeting them as goals during play. We introduce IMAGINE, an intrinsically motivated deep reinforcement learning architecture that models this ability. Such imaginative agents, like children, benefit from the guidance of a social peer who provides language descriptions. To take advantage of goal imagination, agents must be able to leverage these descriptions to interpret their imagined out-of-distribution goals. This generalization is made possible by modularity: a decomposition between learned goal-achievement reward function and policy relying on deep sets, gated attention and object-centered representations. We introduce the Playground environment and study how this form of goal imagination improves generalization and exploration over agents lacking this capacity. In addition, we identify the properties of goal imagination that enable these results and study the impacts of modularity and social interactions.
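Motivation and Objectives
- Motivate autonomous agents to learn open-ended skill repertoires through imagined goals, without external rewards.
- Generate out-of-distribution goals via compositional language to drive creative exploration.
- Study how social language guidance and a modular architecture support goal interpretation and policy learning.
- Provide a controlled environment (Playground) for analyzing generalization across predicates, attributes, and object categories.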
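Proposed Method
- Introduce the IMAGINE architecture, whose language encoder maps natural-language goals to embeddings.
- Develop two internal models: a goal-achievement reward function and a goal-conditioned policy.
- Use a goal generator that mixes known goals with imagined ones; imagination composes new goals with a heuristic based on construction grammar.
- Adopt an object-centered modular architecture (Deep Sets with gated attention) to obtain permutation-invariant representations.
- Train with hindsight replay, using a shared language encoder to turn descriptions into learning signals.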
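
The goal generator described in the list above can be illustrated with a small sketch. It is a simplified reading of a construction-grammar-style heuristic, not the authors' implementation: two words are treated as interchangeable when they are the only difference between two known goals of the same length, and new goals are formed by substitution. The function names, the example goals, and the mixing probability `p_imagined` are all assumptions made for illustration.

```python
import random
from itertools import combinations


def find_equivalent_words(known_goals):
    """Collect word pairs that fill the same slot in two goals differing by one word."""
    pairs = set()
    tokenized = [goal.split() for goal in known_goals]
    for a, b in combinations(tokenized, 2):
        if len(a) != len(b):
            continue
        diff = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diff) == 1:
            i = diff[0]
            pairs.add((a[i], b[i]))
            pairs.add((b[i], a[i]))
    return pairs


def imagine_goals(known_goals):
    """Substitute equivalent words into known goals and keep only novel sentences."""
    known = set(known_goals)
    pairs = find_equivalent_words(known_goals)
    imagined = set()
    for goal in known_goals:
        words = goal.split()
        for i, word in enumerate(words):
            for src, dst in pairs:
                if word == src:
                    candidate = " ".join(words[:i] + [dst] + words[i + 1:])
                    if candidate not in known:
                        imagined.add(candidate)
    return sorted(imagined)


def sample_goal(known_goals, imagined_goals, p_imagined, rng):
    """Mix known and imagined goals; p_imagined is a hypothetical mixing probability."""
    if imagined_goals and rng.random() < p_imagined:
        return rng.choice(imagined_goals)
    return rng.choice(known_goals)


# Hypothetical goal descriptions, chosen only to exercise the heuristic.
known_goals = ["grasp red cat", "grasp blue cat", "grow red cat", "grasp blue door"]
imagined_goals = imagine_goals(known_goals)
print(imagined_goals)
print(sample_goal(known_goals, imagined_goals, p_imagined=0.5, rng=random.Random(0)))
```

With these four example goals, the sketch imagines "grasp red door", "grow blue cat", "grow blue door", and "grow red door". A substitution heuristic of this kind can also produce sentences that are not achievable in the environment (growing a door, for instance), so the agent still needs a learned reward function to interpret what it imagines.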
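Experimental Results
Research Questions
- RQ1: How does goal imagination through language affect generalization to new states and to goals with new language descriptions?
- RQ2: How do imagined goals affect exploration of the environment, particularly in terms of object interactions?
- RQ3: How do the modular architecture and social language feedback affect the ability to learn from imagined goals?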
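Key Findings
- Compared with a baseline that does not imagine, goal imagination significantly improves generalization to unseen goals in the test set.
- Agents show behavioral adaptation when pursuing imagined goals, for example adjusting their actions to water plants.
- Imagination improves exploration, measured by an increase in the interesting interaction count (I2C) in test scenarios.
- Modularity (object-centered Deep Sets combined with gated attention) is essential for exploiting imagined goals and generalizes better than a flat architecture.
- Descriptive social feedback from a partner still enables effective goal imagination when the feedback conditions are relaxed.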
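
The permutation invariance behind the modular architecture mentioned in the findings above can be shown with a minimal pure-Python sketch. It assumes a simplified form of the idea: each object's features are scaled by a goal-dependent gate, passed through one shared layer, and sum-pooled. The gate logits, the weights, and the object features are hypothetical values; in a real agent the gate would come from the language encoder and the weights would be learned.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_attention(obj, gate_logits):
    """Scale each object feature by a goal-dependent gate in (0, 1)."""
    return [f * sigmoid(g) for f, g in zip(obj, gate_logits)]


def per_object_net(features, weights):
    """Shared per-object function: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * f for w, f in zip(row, features))) for row in weights]


def deep_set_encode(objects, gate_logits, weights):
    """Apply the same gated function to every object, then sum-pool the results."""
    encoded = [per_object_net(gated_attention(o, gate_logits), weights) for o in objects]
    return [sum(column) for column in zip(*encoded)]


# Hypothetical scene with three objects described by three features each.
objects = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.2], [0.3, 0.3, 0.9]]
gate_logits = [2.0, -2.0, 0.0]  # assumed output of a goal embedding
weights = [[1.0, -1.0, 0.5], [0.2, 0.4, -0.3]]  # assumed learned parameters

original = deep_set_encode(objects, gate_logits, weights)
shuffled = deep_set_encode(list(reversed(objects)), gate_logits, weights)
same = all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(original, shuffled))
print(same)
```

Because the pooling step is a sum, reordering the objects leaves the encoding unchanged up to floating-point rounding, which is why the comparison uses `math.isclose` rather than exact equality. A flat architecture that concatenates object features in a fixed order has no such guarantee.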
This summary was generated by AI and reviewed by a human editor.