QUICK REVIEW

[Paper Review] Attentive Pooling Networks

Cícero Nogueira dos Santos, Ming Tan | arXiv (Cornell University) | Feb 11, 2016

Topic Modeling | References: 25 | Cited by: 324

ONE-SENTENCE SUMMARY

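Attentive Pooling (AP) introduces a two-way attention mechanism that makes the pooling layer aware of the input pair. It improves both CNNs and biLSTMs on answer selection across three datasets and reaches state-of-the-art results without hand-crafted features.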

ABSTRACT

In this work, we propose Attentive Pooling (AP), a two-way attention mechanism for discriminative model training. In the context of pair-wise ranking or classification with neural networks, AP enables the pooling layer to be aware of the current input pair, in a way that information from the two input items can directly influence the computation of each other's representations. Along with such representations of the paired inputs, AP jointly learns a similarity measure over projected segments (e.g. trigrams) of the pair, and subsequently, derives the corresponding attention vector for each input to guide the pooling. Our two-way attention mechanism is a general framework independent of the underlying representation learning, and it has been applied to both convolutional neural networks (CNNs) and recurrent neural networks (RNNs) in our studies. The empirical results, from three very different benchmark tasks of question answering/answer selection, demonstrate that our proposed models outperform a variety of strong baselines and achieve state-of-the-art performance in all the benchmarks.

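MOTIVATION AND GOALS

Motivation: pair-wise ranking and classification with neural networks call for a more discriminative form of matching than one-way attention provides.
Propose Attentive Pooling (AP) to jointly learn the input representations and a pair-wise similarity measure.
Show that AP is a general mechanism that applies to both CNNs and RNNs for answer selection.
Show that AP improves robustness to longer inputs and reduces the number of convolutional filters needed.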

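PROPOSED METHOD

Define a two-way attention mechanism in which a learned similarity between projected segments (e.g., trigrams or biLSTM hidden states) guides the pooling.
Compute the interaction matrix G = tanh(Q^T U A), where Q (c×M) and A (c×L) are the segment-level representations of the question and the answer (from a CNN or biLSTM) and U (c×c) is a learned parameter matrix; each entry of G (M×L) scores one question-segment/answer-segment pair.
Derive an attention vector for each input by max-pooling G row-wise (for the question) and column-wise (for the answer), then applying a softmax; the attention-weighted sums of the columns of Q and A give r^q and r^a.
Score each pair by the cosine similarity of r^q and r^a, and train with a pair-wise hinge ranking loss.
Apply AP to obtain the AP-CNN and AP-biLSTM architectures, and compare them with the non-attentive QA-CNN and QA-biLSTM.
Train with SGD and negative sampling: for each question, sample 50 negative answers and use the highest-scoring one for the update.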
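The scoring and training steps above can be sketched in plain Python. This is a minimal sketch, assuming Q and A arrive as c×M and c×L lists of lists (each column is one segment vector) and U is a c×c matrix; the helper names (`attentive_pooling`, `score`, `hinge_loss`, `random_matrix`), the margin of 0.5, and the random toy inputs are illustrative assumptions, not the authors' code or settings. The CNN/biLSTM encoders that would produce Q and A are omitted.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]  # row-major lists; each column is one segment vector
Vector = List[float]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python product of x (n x k) and y (k x p)."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x: Matrix) -> Matrix:
    return [list(col) for col in zip(*x)]


def softmax(v: Sequence[float]) -> Vector:
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in v]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pooling(Q: Matrix, A: Matrix, U: Matrix) -> Tuple[Vector, Vector]:
    """Q: c x M question segments, A: c x L answer segments, U: c x c learned matrix."""
    # G = tanh(Q^T U A), shape M x L: one score per (question segment, answer segment) pair.
    G = [[math.tanh(g) for g in row] for row in matmul(matmul(transpose(Q), U), A)]
    g_q = [max(row) for row in G]        # row-wise max pooling: length M
    g_a = [max(col) for col in zip(*G)]  # column-wise max pooling: length L
    sigma_q, sigma_a = softmax(g_q), softmax(g_a)
    # r^q = Q sigma^q, r^a = A sigma^a: attention-weighted sums of the segment columns.
    r_q = [sum(w * s for w, s in zip(row, sigma_q)) for row in Q]
    r_a = [sum(w * s for w, s in zip(row, sigma_a)) for row in A]
    return r_q, r_a


def cosine(u: Vector, v: Vector, eps: float = 1e-12) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, eps)  # eps guards against zero-length vectors


def score(Q: Matrix, A: Matrix, U: Matrix) -> float:
    return cosine(*attentive_pooling(Q, A, U))


def hinge_loss(Q: Matrix, A_pos: Matrix, negatives: List[Matrix], U: Matrix,
               margin: float = 0.5, num_samples: int = 50,
               rng: Optional[random.Random] = None) -> float:
    """Hinge ranking loss against the highest-scoring of the sampled negatives.

    num_samples=50 follows the review; margin=0.5 is a hypothetical placeholder.
    """
    if rng is None:
        rng = random.Random()
    sampled = rng.sample(negatives, min(num_samples, len(negatives)))
    s_pos = score(Q, A_pos, U)
    s_neg = max(score(Q, A_neg, U) for A_neg in sampled)
    return max(0.0, margin - s_pos + s_neg)


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    c, M, L = 4, 5, 7  # toy sizes: 4 filters, 5 question and 7 answer segments
    U = random_matrix(c, c, rng)
    Q, A_pos = random_matrix(c, M, rng), random_matrix(c, L, rng)
    negatives = [random_matrix(c, L, rng) for _ in range(100)]
    print("score:", score(Q, A_pos, U))
    print("loss:", hinge_loss(Q, A_pos, negatives, U, rng=rng))
```

Because both attention vectors are pooled from G, swapping in a different candidate answer also changes how the question is pooled; this is what makes the attention two-way, whereas a one-way scheme would compute r^q without looking at A.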

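EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Does two-way attentive pooling improve discriminative training for pair-wise question answering tasks compared with one-way attention or no attention?
RQ2: Can AP be integrated effectively into both CNNs and RNNs (biLSTMs) for answer selection?
RQ3: Does AP improve robustness to longer inputs and reduce model complexity (fewer filters) while maintaining or improving accuracy?
RQ4: How does AP perform on datasets that differ in input length and domain (InsuranceQA, TREC-QA, WikiQA)?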

KEY FINDINGS

InsuranceQA: accuracy (%)
Model	Dev	Test1	Test2
AP-CNN	68.8	69.8	66.3
QA-CNN	61.6	60.2	56.1
AP-biLSTM	68.4	71.7	66.4
QA-biLSTM	66.6	66.6	63.7

TREC-QA and WikiQA: test-set MAP and MRR
Dataset	Model	MAP	MRR
TREC-QA	AP-CNN	0.7530	0.8511
TREC-QA	QA-CNN	0.7147	0.8070
TREC-QA	AP-biLSTM	0.7132	0.8032
TREC-QA	QA-biLSTM	0.6750	0.7723
WikiQA	AP-CNN	0.6886	0.6957
WikiQA	QA-CNN	0.6701	0.6822
WikiQA	AP-biLSTM	0.6705	0.6842
WikiQA	QA-biLSTM	0.6557	0.6695
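The size of the InsuranceQA gains can be recomputed directly from the table. This small script copies the InsuranceQA rows verbatim (the names `INSURANCE_QA`, `SPLITS`, and `gains` are hypothetical) and reports the absolute improvement, in percentage points, of each AP model over its non-attentive baseline; nothing new is measured.

```python
# InsuranceQA accuracy (%) copied from the table: (Dev, Test1, Test2).
INSURANCE_QA = {
    "AP-CNN": (68.8, 69.8, 66.3),
    "QA-CNN": (61.6, 60.2, 56.1),
    "AP-biLSTM": (68.4, 71.7, 66.4),
    "QA-biLSTM": (66.6, 66.6, 63.7),
}
SPLITS = ("Dev", "Test1", "Test2")


def gains(with_ap, without_ap, table=INSURANCE_QA):
    """Absolute accuracy gain (percentage points) of an AP model over its baseline, per split."""
    # round() to one decimal removes floating-point noise from the subtraction.
    return {split: round(a - b, 1)
            for split, a, b in zip(SPLITS, table[with_ap], table[without_ap])}


if __name__ == "__main__":
    print("AP-CNN vs QA-CNN:", gains("AP-CNN", "QA-CNN"))
    print("AP-biLSTM vs QA-biLSTM:", gains("AP-biLSTM", "QA-biLSTM"))
```

On these numbers, AP-CNN gains 7.2 to 10.2 points over QA-CNN across the three splits, while AP-biLSTM gains 1.8 to 5.1 points over QA-biLSTM.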

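AP-CNN and AP-biLSTM outperform their non-attentive counterparts (QA-CNN and QA-biLSTM) on all three datasets.
AP-CNN achieves state-of-the-art results on InsuranceQA and TREC-QA and also performs strongly on WikiQA.
AP-based models need fewer convolutional filters and can train faster (e.g., 400 filters for AP-CNN vs. 4,000 for QA-CNN).
AP improves robustness to longer answers: AP-CNN's accuracy remains stable for answers longer than about 90 tokens, unlike QA-CNN's.
Across datasets, AP-CNN consistently improves MAP and precision over the baselines, often outperforming recent state-of-the-art methods.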

This review was generated by AI and reviewed by human editors.