QUICK REVIEW

[Paper Review] Emergence of Language with Multi-agent Games: Learning to Communicate with Sequences of Symbols

Serhii Havrylov, Ivan Titov | arXiv (Cornell University) | May 31, 2017

Language and cultural evolution | Cited by 153

One-Sentence Summary

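This paper trains two neural agents to communicate through sequences of discrete symbols in a referential game, shows that the straight-through Gumbel-softmax estimator converges faster than REINFORCE and yields more effective protocols with a degree of compositionality, and explores grounding the protocol in natural language.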

ABSTRACT

Learning to communicate through interaction, rather than relying on explicit supervision, is often considered a prerequisite for developing a general AI. We study a setting where two agents engage in playing a referential game and, from scratch, develop a communication protocol necessary to succeed in this game. Unlike previous work, we require that messages they exchange, both at train and test time, are in the form of a language (i.e. sequences of discrete symbols). We compare a reinforcement learning approach and one using a differentiable relaxation (straight-through Gumbel-softmax estimator) and observe that the latter is much faster to converge and it results in more effective protocols. Interestingly, we also observe that the protocol we induce by optimizing the communication success exhibits a degree of compositionality and variability (i.e. the same information can be phrased in different ways), both properties characteristic of natural languages. As the ultimate goal is to ensure that communication is accomplished in natural language, we also perform experiments where we inject prior information about natural language into our model and study properties of the resulting protocol.

Motivation and Goals

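Drive the learning of communication through interaction rather than explicit supervision.
Show that language, in the form of sequences of discrete symbols, can emerge in a referential game.
Compare training methods (REINFORCE vs. straight-through Gumbel-softmax) in terms of efficiency and protocol quality.
Study properties of the induced language, including compositionality and paraphrase-like variability.
Explore indirect and direct grounding of the emergent language in natural language.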

Proposed Method

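The agents are LSTMs (a sender S and a receiver R) that operate on a target image and a message m; m is produced as a sequence of tokens from a vocabulary V, with length at most L.
Messages are discrete; during training, gradients are estimated either with REINFORCE or with a differentiable relaxation, the straight-through Gumbel-softmax (ST-GS) estimator.
The straight-through estimator (ST-GS) discretizes in the forward pass and uses the continuous relaxation in the backward pass, which allows end-to-end differentiation.
The loss encourages the receiver to identify the target image among distractors based on the message.
Two grounding strategies are explored: indirect grounding, through a KL term KL(q_φ(m|t) || p_ω(m)) against a natural language model, and direct grounding, through image-captioning supervision.
The Gumbel-softmax temperature is learned at each step as a function of the sender's hidden state, τ(h_i^s), through a learned inverse-temperature function, which stabilizes training.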
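
The forward/backward split described above can be made concrete with a small sketch. This is a minimal pure-Python illustration for a single symbol choice, not the authors' implementation: the function names (`st_gumbel_softmax_forward`, `st_gumbel_softmax_backward`), the example logits, and the fixed temperature `tau=1.0` are all assumptions made for illustration. The forward pass emits a one-hot symbol; the backward pass propagates gradients as if the relaxed softmax sample had been emitted instead.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_noise(rng):
    # Gumbel(0, 1) sample via inverse transform: -log(-log(U)), U ~ Uniform(0, 1).
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # clamp to avoid log(0)
    return -math.log(-math.log(u))


def st_gumbel_softmax_forward(logits, tau, rng):
    # Relaxed sample: softmax((logits + Gumbel noise) / tau).
    y_soft = softmax([(l + gumbel_noise(rng)) / tau for l in logits])
    # Straight-through: the value passed downstream is the one-hot argmax.
    k = max(range(len(y_soft)), key=y_soft.__getitem__)
    y_hard = [1.0 if i == k else 0.0 for i in range(len(y_soft))]
    return y_hard, y_soft


def st_gumbel_softmax_backward(y_soft, grad_out, tau):
    # Backward pass treats the output as y_soft, so it applies the softmax
    # Jacobian: dL/dlogit_k = y_k * (g_k - <g, y>) / tau.
    dot = sum(g * y for g, y in zip(grad_out, y_soft))
    return [y * (g - dot) / tau for y, g in zip(y_soft, grad_out)]


# Illustrative usage with made-up logits and a made-up upstream gradient.
rng = random.Random(0)
logits = [2.0, 0.5, -1.0, 0.0]
y_hard, y_soft = st_gumbel_softmax_forward(logits, tau=1.0, rng=rng)
grad_logits = st_gumbel_softmax_backward(y_soft, [0.1, -0.3, 0.2, 0.0], tau=1.0)
```

In an automatic-differentiation framework the same effect is usually obtained by combining the hard and soft samples so that only the soft one carries gradient; the manual Jacobian here just makes the substitution explicit. A learned, state-dependent temperature would replace the constant `tau`.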

Experimental Results

Research Questions

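RQ1: Can two agents develop a meaningful communication protocol of discrete symbols from scratch in a referential game?
RQ2: Does straight-through Gumbel-softmax learn discrete language protocols faster and more effectively than REINFORCE?
RQ3: Does the emergent protocol exhibit compositionality and natural-language-like paraphrase variability?
RQ4: Does grounding the emergent language in natural language (indirectly or directly) improve interpretability or alignment with features of human language?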

Key Findings

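Straight-through Gumbel-softmax converges faster than REINFORCE when learning symbol-sequence protocols in the referential game.
Longer messages (larger L) help convergence and produce more redundant (synonymous, paraphrase-like) encodings of the same content.
The induced protocol shows a hierarchy-like encoding, and the same semantic content has multiple paraphrases.
Grounding (indirect KL regularization, optionally with a caption-generation loss) can align the emergent communication with natural language statistics and improve interpretability.
Relative to natural language, protocols with grounding reach a similar communication success rate while showing different omission scores, which suggests partial alignment in the distinction between content words and function words.
In this task, the ST-GS gradient direction behaves as a pseudo-gradient and provides a reliable guide for optimization.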
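
For contrast with the ST-GS estimator in the findings above, the REINFORCE (score-function) gradient for a single categorical symbol choice can be sketched as follows. This is a generic textbook form, not the paper's training code: the names `reinforce_grad`, `reward_fn`, and `baseline`, and the toy reward that pays 1 for symbol index 2, are hypothetical. The estimator samples a symbol, observes a scalar reward, and scales the gradient of the log-probability by the advantage (reward minus baseline).

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_grad(logits, reward_fn, baseline, rng):
    # Sample one symbol from the categorical distribution softmax(logits).
    probs = softmax(logits)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Advantage: observed reward minus a baseline (used to reduce variance).
    advantage = reward_fn(action) - baseline
    # For a softmax policy, d log p(action) / d logit_i = 1[i == action] - p_i.
    grad = [
        advantage * ((1.0 if i == action else 0.0) - p)
        for i, p in enumerate(probs)
    ]
    return action, grad  # gradient-ascent direction on expected reward


# Illustrative usage with a hypothetical reward: 1 if symbol 2 is chosen.
action, grad = reinforce_grad(
    [0.2, -0.1, 0.4],
    reward_fn=lambda a: 1.0 if a == 2 else 0.0,
    baseline=0.5,
    rng=random.Random(1),
)
```

The learning signal here is a single scalar per sample, whereas the straight-through estimator passes a full gradient vector back through the relaxed sample.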

This review was generated by AI and reviewed by a human editor.