QUICK REVIEW

[Paper Review] Attentive Pooling Networks

Cícero Nogueira dos Santos, Ming Tan|arXiv (Cornell University)|Feb 11, 2016

Topic Modeling | References: 25 | Citations: 324

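One-Sentence Summary

Attentive Pooling (AP) makes the pooling layer pair-aware through a two-way attention mechanism, improves on CNN and biLSTM baselines for answer selection across three datasets, and achieves state-of-the-art results without handcrafted features.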

ABSTRACT

In this work, we propose Attentive Pooling (AP), a two-way attention mechanism for discriminative model training. In the context of pair-wise ranking or classification with neural networks, AP enables the pooling layer to be aware of the current input pair, in a way that information from the two input items can directly influence the computation of each other's representations. Along with such representations of the paired inputs, AP jointly learns a similarity measure over projected segments (e.g. trigrams) of the pair, and subsequently, derives the corresponding attention vector for each input to guide the pooling. Our two-way attention mechanism is a general framework independent of the underlying representation learning, and it has been applied to both convolutional neural networks (CNNs) and recurrent neural networks (RNNs) in our studies. The empirical results, from three very different benchmark tasks of question answering/answer selection, demonstrate that our proposed models outperform a variety of strong baselines and achieve state-of-the-art performance in all the benchmarks.

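Motivation and Objectives

Motivate the need for discriminative pair-wise matching in neural networks that goes beyond one-way attention.
Propose Attentive Pooling (AP), which jointly learns the input representations and a pair-wise similarity measure.
Show that AP is a general mechanism for answer selection that can be applied to both CNNs and RNNs.
Show that AP improves robustness to long inputs and reduces the need for a large number of convolutional filters.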

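Proposed Method

Define a two-way attention in which a similarity measure learned over projected segments (e.g., trigrams or hidden states) guides the pooling.
Compute the interaction matrix G = tanh(Q^T U A), where Q and A are the question and answer representations (from a CNN or biLSTM) and U is a learned parameter matrix.
Apply column-wise and row-wise pooling followed by a softmax to derive an attention vector for each input, yielding the pooled representations r^q and r^a.
Score the pair by the cosine similarity of r^q and r^a, and train with a hinge ranking loss.
Apply AP to both architectures (AP-CNN and AP-biLSTM) and compare against QA-CNN and QA-biLSTM.
Train with SGD using negative sampling: 50 negatives are sampled per question, and the highest-scoring negative is used for the update.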
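
The steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the matrix layout (Q as c x M, A as c x L, U as c x c), the use of max pooling over rows and columns of G, the margin value, and all function names are assumptions made for the example, and random matrices stand in for real CNN or biLSTM outputs.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def attentive_pooling(Q, A, U):
    """Two-way attentive pooling sketch.

    Q: (c, M) question segment representations (assumed layout).
    A: (c, L) answer segment representations (assumed layout).
    U: (c, c) parameter matrix (learned in training; random here).
    """
    G = np.tanh(Q.T @ U @ A)       # (M, L) interaction matrix
    g_q = G.max(axis=1)            # assumption: max pooling over each row
    g_a = G.max(axis=0)            # assumption: max pooling over each column
    sigma_q = softmax(g_q)         # attention over question segments
    sigma_a = softmax(g_a)         # attention over answer segments
    r_q = Q @ sigma_q              # (c,) attention-weighted question vector
    r_a = A @ sigma_a              # (c,) attention-weighted answer vector
    return r_q, r_a, sigma_q, sigma_a


def cosine(u, v):
    """Cosine similarity of two 1-D vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def score(Q, A, U):
    """Score a question/answer pair by cosine similarity of r^q and r^a."""
    r_q, r_a, _, _ = attentive_pooling(Q, A, U)
    return cosine(r_q, r_a)


def hinge_ranking_loss(s_pos, s_negs, margin=0.5):
    """Hinge loss against the highest-scoring negative (margin is a placeholder)."""
    return max(0.0, margin - s_pos + max(s_negs))


# Toy data: random matrices stand in for CNN/biLSTM outputs.
rng = np.random.default_rng(0)
c, M, L = 8, 5, 12
U = rng.standard_normal((c, c))
Q = rng.standard_normal((c, M))
A_pos = rng.standard_normal((c, L))
negatives = [rng.standard_normal((c, L)) for _ in range(50)]  # 50 sampled negatives

s_pos = score(Q, A_pos, U)
s_negs = [score(Q, A_neg, U) for A_neg in negatives]
loss = hinge_ranking_loss(s_pos, s_negs)
print(f"positive score={s_pos:.3f}, hardest negative={max(s_negs):.3f}, loss={loss:.3f}")
```

Because the attention weights for the question depend on the answer (and vice versa) through G, the same question gets a different pooled vector r^q for each candidate answer, which is what makes the pooling pair-aware.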

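Experimental Results

Research Questions

RQ1: Does attentive pooling with two-way attention improve discriminative learning on pair-wise QA tasks compared with one-way attention or no attention?
RQ2: Can AP be integrated effectively with both CNNs and RNNs (biLSTMs) for answer selection?
RQ3: Does AP increase robustness to longer inputs and reduce model complexity (fewer filters) while maintaining or improving accuracy?
RQ4: How does AP perform across diverse datasets with different lengths and domains (InsuranceQA, TREC-QA, WikiQA)?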

Key Findings

Dataset	Model	Dev acc. (%) or MAP	Test1 acc. (%) or MRR	Test2 acc. (%)
InsuranceQA	AP-CNN	68.8	69.8	66.3
InsuranceQA	QA-CNN	61.6	60.2	56.1
InsuranceQA	AP-biLSTM	68.4	71.7	66.4
InsuranceQA	QA-biLSTM	66.6	66.6	63.7
TREC-QA	AP-CNN	0.7530	0.8511	-
TREC-QA	QA-CNN	0.7147	0.8070	-
TREC-QA	AP-biLSTM	0.7132	0.8032	-
TREC-QA	QA-biLSTM	0.6750	0.7723	-
WikiQA	AP-CNN	0.6886	0.6957	-
WikiQA	QA-CNN	0.6701	0.6822	-
WikiQA	AP-biLSTM	0.6705	0.6842	-
WikiQA	QA-biLSTM	0.6557	0.6695	-

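AP-CNN and AP-biLSTM outperform their non-attentive counterparts on all three datasets.
AP-CNN achieves state-of-the-art results on InsuranceQA and TREC-QA and shows solid results on WikiQA.
AP-based models need fewer convolutional filters and may also train faster (e.g., AP-CNN with 400 filters versus QA-CNN with 4,000).
AP improves robustness to long answers: beyond roughly 90 tokens, AP-CNN accuracy stabilizes, whereas QA-CNN accuracy does not.
Across datasets, AP-CNN consistently improves MAP/precision metrics over the baselines and often surpasses recent state-of-the-art methods.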
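
For readers unfamiliar with the ranking metrics, the sketch below shows how MAP (mean average precision), mentioned above, is typically computed from ranked candidate lists, together with the closely related MRR (mean reciprocal rank) that is commonly reported alongside it for answer-selection benchmarks. The function names and the toy rankings are hypothetical and are not taken from the paper.

```python
def average_precision(labels):
    """Average precision for one question.

    labels: relevance flags (1 = correct answer, 0 = incorrect) for the
    candidate answers, listed in the order the model ranked them.
    """
    hits, total = 0, 0.0
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / rank   # precision at this correct answer's rank
    return total / hits if hits else 0.0


def reciprocal_rank(labels):
    """1 / rank of the first correct answer (0.0 if there is none)."""
    for rank, rel in enumerate(labels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def mean_metric(metric, ranked_lists):
    """Average a per-question metric over all questions."""
    return sum(metric(labels) for labels in ranked_lists) / len(ranked_lists)


# Hypothetical rankings for two questions.
rankings = [
    [1, 0, 1],   # correct answers ranked 1st and 3rd
    [0, 1, 0],   # single correct answer ranked 2nd
]
print("MAP:", round(mean_metric(average_precision, rankings), 4))
print("MRR:", round(mean_metric(reciprocal_rank, rankings), 4))
```

Both metrics reward placing correct answers near the top of the ranking; MAP accounts for every correct answer, while MRR looks only at the first one.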

This review was generated by AI and checked by a human editor.