QUICK REVIEW

[Paper Digest] Neural Matching Models for Question Retrieval and Next Question Prediction in Conversation

Yang Liu, Hamed Zamani|arXiv (Cornell University)|Jul 17, 2017

Topic Modeling|References: 27|Cited by: 26

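One-Sentence Summary

This paper proposes neural matching models for question retrieval and next question prediction in conversation, using deep neural networks to learn sequence representations and matching scores. The results show that the neural models significantly outperform baseline methods on question retrieval, whereas in the conversational-context setting a simplified model without LSTM performs better, which is attributed to the limitations of modeling long sequences.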

ABSTRACT

The recent boom of AI has seen the emergence of many human-computer conversation systems such as Google Assistant, Microsoft Cortana, Amazon Echo and Apple Siri. We introduce and formalize the task of predicting questions in conversations, where the goal is to predict the new question that the user will ask, given the past conversational context. This task can be modeled as a "sequence matching" problem, where two sequences are given and the aim is to learn a model that maps any pair of sequences to a matching probability. Neural matching models, which adopt deep neural networks to learn sequence representations and matching scores, have attracted immense research interests of information retrieval and natural language processing communities. In this paper, we first study neural matching models for the question retrieval task that has been widely explored in the literature, whereas the effectiveness of neural models for this task is relatively unstudied. We further evaluate the neural matching models in the next question prediction task in conversations. We have used the publicly available Quora data and Ubuntu chat logs in our experiments. Our evaluations investigate the potential of neural matching models with representation learning for question retrieval and next question prediction in conversations. Experimental results show that neural matching models perform well for both tasks.

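Research Motivation and Objectives

Evaluate the effectiveness of neural matching models on question retrieval, a task that is central to community question answering and search systems.
Investigate whether neural matching models are suitable for next question prediction in conversational systems, a novel extension of sequence matching.
Compare the neural models against traditional term-matching baselines and analyze the performance trade-offs on long conversational contexts.
Investigate whether representation-focused or interaction-focused neural architectures yield better results for sequence matching in conversational settings.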

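Proposed Method

Adopt representation-focused neural matching models that independently encode the query and the candidate question sequence into dense vector representations.
Use deep neural networks, including bidirectional LSTMs and feed-forward layers, to learn contextual representations of question sequences.
Apply a matching function, such as cosine similarity or a learned interaction layer, to compute the matching score between the context and a candidate question.
Train end to end with supervised labels from Quora and the Ubuntu chat logs, using a loss function optimized for ranking performance.
Combine the neural models with traditional term-matching baselines such as BM25 to explore hybrid retrieval strategies.
Evaluate model performance with standard information retrieval metrics such as mean average precision (MAP) and normalized discounted cumulative gain (nDCG).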
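
As a rough illustration of the representation-focused setup listed above, the sketch below encodes two token sequences independently and scores the pair with cosine similarity. It is not the paper's model. The toy embedding table EMB, the mean-pooling encode function (a stand-in for a learned BiLSTM or feed-forward encoder), and the example questions are all assumptions made for illustration.

```python
import math

# Hypothetical 3-dimensional word embeddings (illustrative values only).
EMB = {
    "how":     [0.1, 0.3, 0.0],
    "install": [0.9, 0.1, 0.2],
    "setup":   [0.8, 0.2, 0.3],
    "python":  [0.2, 0.9, 0.1],
    "weather": [0.0, 0.1, 0.9],
    "today":   [0.1, 0.0, 0.8],
}
DIM = 3


def encode(tokens):
    """Mean-pool the embeddings of known tokens into one dense vector.

    Assumed stand-in for a learned encoder. Each sequence is encoded
    on its own, without looking at the other sequence in the pair.
    """
    vecs = [EMB[t] for t in tokens if t in EMB]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_score(query_tokens, candidate_tokens):
    """Matching score between a query (or context) and one candidate."""
    return cosine(encode(query_tokens), encode(candidate_tokens))


def rank(query_tokens, candidates):
    """Sort candidate questions by descending matching score."""
    return sorted(candidates,
                  key=lambda c: match_score(query_tokens, c),
                  reverse=True)


if __name__ == "__main__":
    query = ["how", "install", "python"]
    candidates = [["weather", "today"], ["how", "setup", "python"]]
    for cand in rank(query, candidates):
        print(cand, round(match_score(query, cand), 3))
```

Because the two sequences never interact before the final similarity, candidate vectors can be computed once and reused across queries, which is the usual practical argument for representation-focused designs over interaction-focused ones.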

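Experimental Results

Research Questions

RQ1: How effective are neural matching models at retrieving semantically similar questions compared with traditional term-based methods?
RQ2: Given the conversation history, can neural matching models generalize to predicting the next question?
RQ3: How do different neural architectures (for example, with or without LSTM) perform when handling long conversation histories?
RQ4: Does combining neural matching with traditional retrieval methods improve performance on question retrieval and next question prediction?
RQ5: What are the limitations of recurrent architectures such as LSTM in modeling long conversational contexts for next question prediction?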

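Key Findings

On question retrieval with the Quora dataset, the neural matching models significantly outperform all baselines, with higher MAP and nDCG scores.
Combining neural matching with traditional term-based retrieval such as BM25 yields larger gains than either method used alone.
On next question prediction with the Ubuntu chat logs, models without an LSTM layer outperform those with one, indicating that LSTMs handle long conversation histories poorly.
Simplified neural architectures without recurrent components are more effective for next question prediction, possibly because they are structurally simpler and generalize better over long contexts.
The study shows that neural matching models are effective for both question retrieval and next question prediction in conversation, and that architecture choice is critical for long-context performance.
Weak supervision, in which BM25 is used to generate synthetic training data, is shown to be usable for training neural models when labeled data is limited.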
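
For readers unfamiliar with the two metrics named in these findings, the following is a minimal sketch of how MAP and nDCG are commonly computed from ranked relevance labels. It is a textbook-style version, not the paper's evaluation code. It assumes that every relevant item for a query appears somewhere in the ranked list (so average precision is normalized by the number of relevant items found) and that DCG uses linear gain with a log2 discount. The label lists in the usage section are made up.

```python
import math


def average_precision(ranked_labels):
    """Average precision for one query.

    ranked_labels: binary relevance labels (1 or 0) in ranked order.
    Assumes all relevant items for the query are present in the list.
    """
    hits = 0
    precision_sum = 0.0
    for position, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            precision_sum += hits / position
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs):
    """MAP: mean of average precision over a list of per-query rankings."""
    if not runs:
        return 0.0
    return sum(average_precision(r) for r in runs) / len(runs)


def dcg(gains, k=None):
    """Discounted cumulative gain with linear gain and log2 discount."""
    if k is not None:
        gains = gains[:k]
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg(gains, k=None):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Made-up relevance labels for two queries, in ranked order.
    runs = [[1, 0, 1], [0, 1]]
    print("MAP:", round(mean_average_precision(runs), 4))
    print("nDCG:", round(ndcg([1, 0, 1]), 4))
```

Some evaluations use an exponential gain (2^g - 1) in DCG instead of the linear gain shown here. With binary labels the two variants give the same values.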

This digest was generated by AI and reviewed by human editors.