QUICK REVIEW

[Paper Review] Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog

Sangwoo Lee, Yu‐Jung Heo|arXiv (Cornell University)|Feb 12, 2018

Multimodal Machine Learning Applications | Cited by 27

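One-Sentence Summary

This paper proposes Answerer in Questioner's Mind (AQM), an information-theoretic framework for goal-oriented visual dialog. The framework models the answerer's intention probabilistically and selects the question that maximizes information gain. On GuessWhat?!, AQM reaches 78.72% accuracy within 10 turns, outperforming deep learning and reinforcement learning baselines, and the questions it selects effectively reduce uncertainty about the target object.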

ABSTRACT

Goal-oriented dialog has been given attention due to its numerous applications in artificial intelligence. Goal-oriented dialogue tasks occur when a questioner asks an action-oriented question and an answerer responds with the intent of letting the questioner know a correct action to take. To ask the adequate question, deep learning and reinforcement learning have been recently applied. However, these approaches struggle to find a competent recurrent neural questioner, owing to the complexity of learning a series of sentences. Motivated by theory of mind, we propose "Answerer in Questioner's Mind" (AQM), a novel information theoretic algorithm for goal-oriented dialog. With AQM, a questioner asks and infers based on an approximated probabilistic model of the answerer. The questioner figures out the answerer's intention via selecting a plausible question by explicitly calculating the information gain of the candidate intentions and possible answers to each question. We test our framework on two goal-oriented visual dialog tasks: "MNIST Counting Dialog" and "GuessWhat?!". In our experiments, AQM outperforms comparative algorithms by a large margin.
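
The information gain mentioned in the abstract can be read as the mutual information between the answerer's intention and the answer to a candidate question. The expression below is the standard definition of that quantity, written in notation assumed here for illustration: $C$ is the intention (for example the target object), $A$ the answer, $q$ the candidate question, $\hat{p}(c)$ the questioner's current belief, and $\tilde{p}(a \mid c, q)$ the approximated answerer model. The symbols are not taken from the text above.

```latex
I[C; A \mid q]
  = \sum_{c} \sum_{a} \hat{p}(c)\, \tilde{p}(a \mid c, q)
    \log \frac{\tilde{p}(a \mid c, q)}
              {\sum_{c'} \hat{p}(c')\, \tilde{p}(a \mid c', q)}
```

A question whose answer is the same for every candidate $c$ makes the ratio inside the logarithm equal to 1 and therefore has zero information gain.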

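Motivation and Objectives

Address the inefficiency and redundancy of goal-oriented dialog systems built on deep learning and reinforcement learning.
Improve question selection in visual dialog by using theory of mind to model the answerer's likely responses.
Develop a general, model-agnostic framework that improves dialog efficiency through information-theoretic question planning.
Improve how well dialog agents generalize to interaction with humans by modeling human-like intentions.
Provide theoretical and practical tools for analyzing and improving existing deep learning methods in goal-oriented dialog.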

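Proposed Method

AQM uses a probabilistic model of the answerer's intention and response distribution to compute the information gain of each candidate question.
The questioner selects the question that maximizes information gain, judging each question by how well it partitions the space of possible answers.
The framework relies on an approximate posterior distribution over the answerer's intention, so it does not need a recurrent neural network to track the dialog history.
AQM can be combined with several question-sampling strategies, including drawing questions from the training data and generating them with a sequence-to-sequence model.
Information gain is computed as the reduction in entropy from the prior to the posterior distribution over candidate answers.
The method is further extended to generate new questions with a pretrained question generator, using beam search to select candidate questions.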
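
The selection and belief-update steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the four toy candidates, the three yes/no questions, and the deterministic `toy_likelihood` table are all hypothetical. In AQM the likelihood would come from an approximated model of the answerer rather than a hand-written table.

```python
import math

def information_gain(prior, likelihood, question, answers):
    """Mutual information (in bits) between the candidate and the answer."""
    gain = 0.0
    for a in answers:
        # Marginal probability of answer `a` under the current belief.
        p_a = sum(prior[c] * likelihood(a, c, question) for c in prior)
        for c, p_c in prior.items():
            p_ac = likelihood(a, c, question)
            if p_c > 0 and p_ac > 0:  # zero terms contribute nothing
                gain += p_c * p_ac * math.log2(p_ac / p_a)
    return gain

def select_question(prior, likelihood, questions, answers):
    """Pick the candidate question with the largest information gain."""
    return max(questions,
               key=lambda q: information_gain(prior, likelihood, q, answers))

def posterior_update(prior, likelihood, question, answer):
    """Bayes update of the belief after observing one (question, answer) pair."""
    unnorm = {c: prior[c] * likelihood(answer, c, question) for c in prior}
    z = sum(unnorm.values())
    if z == 0:  # answer impossible under the model: keep the old belief
        return dict(prior)
    return {c: p / z for c, p in unnorm.items()}

# Hypothetical toy setup: for each question, the candidates that answer "yes".
TRUTH = {
    "is it an animal?": {"cat", "dog"},
    "is it a cat?": {"cat"},
    "is it a thing?": {"cat", "dog", "car", "bus"},
}
ANSWERS = ["yes", "no"]

def toy_likelihood(answer, candidate, question):
    """Deterministic answerer: probability 1 for the true answer, else 0."""
    truth = "yes" if candidate in TRUTH[question] else "no"
    return 1.0 if answer == truth else 0.0

prior = {c: 0.25 for c in ["cat", "dog", "car", "bus"]}
best = select_question(prior, toy_likelihood, list(TRUTH), ANSWERS)
belief = posterior_update(prior, toy_likelihood, best, "yes")
print(best, belief)
```

In this toy setup the question that splits the four candidates evenly yields one bit of information and is selected, while a question every candidate answers the same way yields zero. The belief is carried between turns as a plain dictionary, which mirrors the point that no recurrent network is needed to track the dialog history.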

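Experimental Results

Research Questions

RQ1: How can a questioner select questions efficiently in goal-oriented visual dialog without relying on a complex RNN?
RQ2: Does modeling the answerer's intention through information gain improve dialog performance over standard deep learning and reinforcement learning methods?
RQ3: How does AQM's information-theoretic approach compare with end-to-end learning in sample efficiency and accuracy?
RQ4: To what extent can AQM be used to explain or enhance existing deep learning models in dialog systems?
RQ5: Can AQM be extended to generate high-quality, context-relevant questions for unseen images?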

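Key Findings

On GuessWhat?!, AQM reaches 63.63% accuracy within 3 turns and 78.72% within 10 turns, significantly outperforming deep supervised learning (46.8% within 5 turns) and deep reinforcement learning (52.3% within 4.1 turns).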
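On MNIST Counting Dialog, the second goal-oriented visual dialog task, AQM also outperforms the baselines, indicating that the approach is not specific to the GuessWhat?! task.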
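The AQM-gen1Q variant, which generates questions with a sequence-to-sequence model, reaches 51.07% accuracy at 2 turns, higher than the original deep supervised learning method (46.8%).
At 5 turns, AQM-gen1Q reaches 70.74% accuracy, slightly below AQM-countQ-depA (72.89%), which points to a trade-off in the quality of generated questions.
AQM's posterior over the answerer's intention correlates significantly with the RNN hidden state in the comparison models, suggesting a theoretical link between attention mechanisms and belief tracking.
AQM's objective function coincides with that of deep reinforcement learning, suggesting that RL-based training implicitly approximates the answerer's distribution.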

This review was generated by AI and reviewed by a human editor.