QUICK REVIEW

[Paper Review] Hierarchical Recurrent Attention Network for Response Generation

Xing Chen, Wei Wu|arXiv (Cornell University)|Jan 25, 2017

Topic Modeling | Cited by 116

One-Sentence Summary

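HRAN introduces hierarchical word-level and utterance-level attention for multi-turn response generation and outperforms S2SA, HRED, and VHRED in both perplexity and human judgment.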

ABSTRACT

We study multi-turn response generation in chatbots where a response is generated according to a conversation context. Existing work has modeled the hierarchy of the context, but does not pay enough attention to the fact that words and utterances in the context are differentially important. As a result, they may lose important information in context and generate irrelevant responses. We propose a hierarchical recurrent attention network (HRAN) to model both aspects in a unified framework. In HRAN, a hierarchical attention mechanism attends to important parts within and among utterances with word level attention and utterance level attention respectively. With the word level attention, hidden vectors of a word level encoder are synthesized as utterance vectors and fed to an utterance level encoder to construct hidden representations of the context. The hidden vectors of the context are then processed by the utterance level attention and formed as context vectors for decoding the response. Empirical studies on both automatic evaluation and human judgment show that HRAN can significantly outperform state-of-the-art models for multi-turn response generation.

Motivation and Goals

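Address open-domain multi-turn response generation conditioned on the conversation context.
Model both the hierarchical structure of the context (words within utterances, utterances within the conversation) and the differing importance of its elements.
Improve response relevance and coherence by using hierarchical attention to select important words and utterances during generation.
Demonstrate empirical gains over state-of-the-art baselines with automatic metrics and human evaluation.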

Proposed Method

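Encode each utterance with a bidirectional GRU (gated recurrent unit) to obtain word-level hidden vectors.
Compute word-level attention conditioned on the decoder state and the utterance-level context to form utterance vectors.
Encode the sequence of utterance vectors with an utterance-level GRU to produce the context representation.
Apply utterance-level attention to summarize the context into a context vector at each decoding step.
Decode with a GRU-based language model conditioned on the context vector, using beam search at generation time.
Train by maximizing the log-likelihood of the ground-truth responses.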
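The two attention steps above can be illustrated with a short plain-Python sketch. It is a simplified illustration under stated assumptions, not the authors' implementation. The word-level hidden vectors are taken as given (standing in for the bidirectional GRU outputs). Both GRUs are replaced by a plain tanh recurrent update (`rnn_step`). Attention scores are dot products rather than a learned scoring network. The utterance-level encoder is assumed to run from the last utterance to the first, so that word-level attention in utterance i can condition on the utterance-level state of utterance i+1. All names (`hran_context`, `rnn_step`, `W`, `U`) are hypothetical. Because the word-level weights depend on the decoder state, this whole computation is repeated at every decoding step.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    # Element-wise sum over k of weights[k] * vectors[k].
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(len(vectors[0]))]


def rnn_step(W, U, prev, x):
    # Simplified stand-in for a GRU cell (assumption): tanh(W.prev + U.x).
    return [math.tanh(dot(w_row, prev) + dot(u_row, x)) for w_row, u_row in zip(W, U)]


def hran_context(word_states, s_prev, W, U):
    """One decoding step of two-level attention (simplified sketch).

    word_states: list of utterances, each a list of word-level hidden vectors h_ij
    s_prev: previous decoder state s_{t-1}
    Returns the context vector c_t, the word-level weights, and the utterance-level weights.
    """
    n = len(word_states)
    l_next = [0.0] * len(W)  # state of utterance i+1; zero beyond the last utterance
    utt_states, word_attn = [None] * n, [None] * n
    # Assumed backward pass: last utterance first, so utterance i can condition on l_{i+1}.
    for i in reversed(range(n)):
        # Word-level attention depends on both s_{t-1} and the utterance-level state l_{i+1}.
        alpha = softmax([dot(h, s_prev) + dot(h, l_next) for h in word_states[i]])
        r_i = weighted_sum(alpha, word_states[i])  # utterance vector
        l_next = rnn_step(W, U, l_next, r_i)       # utterance-level hidden state l_i
        utt_states[i], word_attn[i] = l_next, alpha
    # Utterance-level attention over l_1..l_n, conditioned on s_{t-1}.
    beta = softmax([dot(l, s_prev) for l in utt_states])
    return weighted_sum(beta, utt_states), word_attn, beta


# Toy usage with random (hypothetical) vectors: 3 utterances of 5, 3 and 4 words.
random.seed(0)
DIM = 4


def rand_vec():
    return [random.uniform(-1.0, 1.0) for _ in range(DIM)]


context = [[rand_vec() for _ in range(n_words)] for n_words in (5, 3, 4)]
s_prev = rand_vec()
W = [rand_vec() for _ in range(DIM)]
U = [rand_vec() for _ in range(DIM)]
c_t, word_attn, beta = hran_context(context, s_prev, W, U)
print([round(x, 3) for x in c_t], [round(b, 3) for b in beta])
```

Each attention distribution sums to one, so c_t is a convex combination of the utterance-level states. In a full model, c_t would feed the GRU decoder together with the previously generated word.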

Experimental Results

Research Questions

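RQ1: Does hierarchical word-level and utterance-level attention improve the relevance and coherence of multi-turn response generation?
RQ2: Does jointly modeling the context hierarchy and the differing importance of its parts yield measurable gains over existing hierarchical models (HRED, VHRED) and a non-hierarchical baseline (S2SA)?
RQ3: How does HRAN compare with state-of-the-art methods on automatic perplexity and on human judgment?
RQ4: What does attention visualization reveal about which words and utterances influence generation?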

Key Findings

Model	Validation Perplexity	Test Perplexity
S2SA	43.679	44.508
HRED	46.279	47.467
VHRED	44.548	45.484
HRAN	40.257	41.138
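The size of HRAN's improvement is easier to judge as a relative reduction. The sketch below recomputes it from the table values. It also defines perplexity in the standard way, as the exponential of the mean per-token negative log-likelihood (natural log), which links the metric to the maximum-likelihood training objective. The helper names `perplexity` and `relative_reduction` are illustrative and do not come from the paper.

```python
import math

# Validation and test perplexities copied from the table above.
PPL = {
    "S2SA": (43.679, 44.508),
    "HRED": (46.279, 47.467),
    "VHRED": (44.548, 45.484),
    "HRAN": (40.257, 41.138),
}


def perplexity(token_log_probs):
    # Standard definition: exp of the mean negative log-likelihood per token (natural log).
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_reduction(baseline, model):
    # Percentage by which `model` lowers perplexity relative to `baseline`.
    return 100.0 * (baseline - model) / baseline


# A model that always assigns probability 1/40 to the correct token has perplexity 40.
print(perplexity([math.log(1 / 40)] * 10))

for name in ("S2SA", "HRED", "VHRED"):
    val = relative_reduction(PPL[name][0], PPL["HRAN"][0])
    test = relative_reduction(PPL[name][1], PPL["HRAN"][1])
    print(f"HRAN vs {name}: validation -{val:.1f}%, test -{test:.1f}%")
```

By this measure, HRAN lowers perplexity by roughly 8% relative to S2SA, 10% relative to VHRED, and 13% relative to HRED on both splits.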

HRAN achieves the lowest perplexity of all four models on both the validation set (40.257, versus 43.679 for S2SA, 44.548 for VHRED, and 46.279 for HRED) and the test set (41.138, versus 44.508, 45.484, and 47.467 respectively).
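In pairwise side-by-side human evaluations, HRAN outperforms each of the baselines.
Ablation studies show that word-level attention and utterance-level attention each contribute to the performance gain; removing either component degrades results.
Attention visualizations show that HRAN focuses on informative words (e.g., "girl", "boyfriend", height figures) and on key utterances in the context.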


This review was generated by AI and checked by human editors.