QUICK REVIEW

[Paper Review] Densely Connected Attention Propagation for Reading Comprehension

Yi Tay, Anh Tuan Luu|arXiv (Cornell University)|Nov 10, 2018

Topic Modeling|47 citations

TL;DR

DecaProp densely connects all passage and question layers using Bidirectional Attention Connectors and achieves state-of-the-art results on four challenging RC datasets, outperforming existing baselines by 2.6% to 14.2% in absolute F1.

ABSTRACT

We propose DecaProp (Densely Connected Attention Propagation), a new densely connected neural architecture for reading comprehension (RC). There are two distinct characteristics of our model. Firstly, our model densely connects all pairwise layers of the network, modeling relationships between passage and query across all hierarchical levels. Secondly, the dense connectors in our network are learned via attention instead of standard residual skip-connectors. To this end, we propose novel Bidirectional Attention Connectors (BAC) for efficiently forging connections throughout the network. We conduct extensive experiments on four challenging RC benchmarks. Our proposed approach achieves state-of-the-art results on all four, outperforming existing baselines by up to $2.6\%-14.2\%$ in absolute F1 score.

Motivation & Objective

Motivate deeper information flow in RC models beyond traditional encode-interact-predict pipelines.
Propose a densely connected architecture that links all passage and query layers across hierarchy.
Introduce Bidirectional Attention Connectors (BAC) to enable dense, efficient cross-layer connections via attention-based compression.
Demonstrate that dense, attention-based connectivity yields large empirical gains on multiple RC benchmarks.

Proposed method

Introduce BAC as a compact, learnable skip-connector based on compressed bi-attention outputs using a factorization-machine (FM) style G(.) to produce scalar connectors.
Construct DecaEnc with k layers where each layer passes P and Q through BiRNNs and densely connects P and Q across all layer pairs with BACs.
Use a DecaCore interaction module consisting of gated attention and gated self-attention on the densely propagated representations.
Concatenate all BAC outputs with the encoder outputs to form a rich, multi-hierarchical representation M for the answer pointer.
Employ a two-layer BiRNN-based answer pointer trained with cross-entropy on start/end indices (L(θ) = -log p1 - log p2, where p1 and p2 are the predicted probabilities of the gold start and end positions).
Initialize with GloVe embeddings, fixed during training, and train end-to-end with standard RC optimization settings.
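
The BAC step and the pointer loss listed above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the ReLU projection W, the row-wise softmax, the concatenation/difference/product feature set, the separate FM parameters per feature, and every function and variable name (fm, compress, bac, span_loss, make_fm) are assumptions made for the example. It shows a single connector between one passage layer and one question layer; the dense wiring across all layer pairs is not shown.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)          # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fm(x, w0, w, v):
    """Factorization machine G(x): maps each row of x (n, m) to one scalar."""
    linear = x @ w                                    # (n,)
    xv = x @ v                                        # (n, k)
    # Pairwise term sum_{i<j} <v_i, v_j> x_i x_j, computed in O(m * k).
    pairwise = 0.5 * (xv ** 2 - (x ** 2) @ (v ** 2)).sum(axis=1)
    return w0 + linear + pairwise

def make_fm(rng, m, k=4):
    """Hypothetical helper: random FM parameters (bias, linear weights, factors)."""
    return 0.0, rng.normal(scale=0.1, size=m), rng.normal(scale=0.1, size=(m, k))

def compress(aligned, own, params):
    """Three scalars per token from concatenation, difference, and product features."""
    cat = fm(np.concatenate([aligned, own], axis=1), *params["cat"])
    sub = fm(aligned - own, *params["sub"])
    mul = fm(aligned * own, *params["mul"])
    return np.stack([cat, sub, mul], axis=1)          # (n, 3)

def bac(P, Q, W, params):
    """P: (lp, d) passage, Q: (lq, d) question. Returns (lp, 3) and (lq, 3)."""
    E = np.maximum(P @ W, 0.0) @ np.maximum(Q @ W, 0.0).T   # (lp, lq) affinity matrix
    B = softmax(E, axis=1) @ Q                        # question summary per passage token
    A = softmax(E.T, axis=1) @ P                      # passage summary per question token
    return compress(B, P, params), compress(A, Q, params)

def span_loss(p_start, p_end, y_start, y_end):
    """L = -log p1 - log p2 for the gold start and end indices."""
    return -np.log(p_start[y_start]) - np.log(p_end[y_end])

rng = np.random.default_rng(0)
d, lp, lq = 8, 5, 3
P, Q = rng.normal(size=(lp, d)), rng.normal(size=(lq, d))
W = rng.normal(scale=0.1, size=(d, d))
params = {"cat": make_fm(rng, 2 * d), "sub": make_fm(rng, d), "mul": make_fm(rng, d)}

Gp, Gq = bac(P, Q, W, params)
P_next = np.concatenate([P, Gp], axis=1)              # each connector adds only 3 features
```

Because each connector contributes only three scalars per token, concatenating many of them grows the representation slowly, which is the property that makes connecting many layer pairs affordable.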

Experimental results

Research questions

RQ1: Can explicitly dense, attention-based cross-layer connections improve information flow in RC models beyond fixed-depth interactions?
RQ2: Do asynchronous, cross-hierarchical connections between passage and question representations yield measurable gains over synchronous, same-layer interactions?
RQ3: How effective are compressed, attention-based connectors (BACs) at enabling many dense connections without prohibitive computational costs?
RQ4: What is the empirical impact of densely connected attention propagation on diverse RC benchmarks?

Key findings

DecaProp achieves state-of-the-art results on four RC benchmarks: NewsQA, Quasar-T, SearchQA, and NarrativeQA.
On NewsQA, DecaProp improves on AMANDA by +4.7 EM and +2.6 F1, and surpasses BiDAF by substantial margins (e.g., +16% EM, +14% F1).
On Quasar-T, DecaProp surpasses the Reinforced Ranker Reader (R3) by +4.4 EM and +6.0 F1, and exceeds BiDAF and GA by large margins (>15% F1).
On SearchQA, DecaProp outperforms AMANDA by +15.4 EM and +14.2 F1 in the original setting, and outperforms AQA and R3 in the overall setting by notable margins (+18.1 EM / +18 F1).
On NarrativeQA, DecaProp consistently outperforms baseline systems, with an average improvement of around 5% across metrics.
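
For readers unfamiliar with the EM and F1 figures quoted for the span-extraction datasets, the sketch below shows how such scores are commonly computed for a single prediction. It follows the usual token-overlap definition and assumes simple lowercasing and whitespace tokenization; the official evaluation script of each benchmark may normalize answers differently, and the function names are chosen for this example only.

```python
from collections import Counter

def tokens(text):
    # Assumed normalization: lowercase and split on whitespace.
    return text.lower().split()

def exact_match(prediction, reference):
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(tokens(prediction) == tokens(reference))

def f1(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred, ref = tokens(prediction), tokens(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

em_score = exact_match("the city of Paris", "Paris")   # 0.0: not an exact match
f1_score = f1("the city of Paris", "Paris")            # partial credit for overlap
```

Dataset-level EM and F1 are averages of these per-question scores, so a gain of "+4.7 EM" means 4.7 percentage points more questions answered exactly.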

This review was created by AI and reviewed by human editors.