[Paper Review] Natural Language Inference over Interaction Space
The paper proposes Interactive Inference Network (IIN) and its dense instantiation DIIN, modeling cross-sentence interaction as an interaction tensor; DIIN achieves state-of-the-art NLI performance on SNLI and MultiNLI and strong results on Quora paraphrase detection.
The Natural Language Inference (NLI) task requires an agent to determine the logical relationship between a natural language premise and a natural language hypothesis. We introduce the Interactive Inference Network (IIN), a novel class of neural network architectures that achieves high-level understanding of a sentence pair by hierarchically extracting semantic features from interaction space. We show that an interaction tensor (attention weight) contains semantic information to solve natural language inference, and that a denser interaction tensor contains richer semantic information. One instance of this architecture, the Densely Interactive Inference Network (DIIN), demonstrates state-of-the-art performance on large-scale NLI corpora and on a large-scale NLI-like corpus. Notably, DIIN achieves a greater than 20% error reduction on the challenging Multi-Genre NLI (MultiNLI) dataset with respect to the strongest published system.
Motivation & Objective
- Motivate the use of interaction space for NLI to capture high-order cross-sentence semantics.
- Propose the Interactive Inference Network (IIN) framework for hierarchical feature extraction from interaction space.
- Instantiate a densely interactive variant (DIIN) that leverages convolutional feature extractors over interaction tensors.
- Demonstrate state-of-the-art results on SNLI and MultiNLI datasets and competitive performance on a paraphrase task.
- Provide ablation analyses to identify the contribution of individual components of DIIN.
Proposed method
- Construct an interaction tensor I by word-by-word interactions between premise and hypothesis representations.
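- Use an encoding layer with highway networks and self-attention to produce refined premise and hypothesis representations P̃ and H̃.
- Compute each entry of the interaction tensor as I_ij = β(P̃_i, H̃_j), where P̃_i and H̃_j are the encoded vectors of the i-th premise token and the j-th hypothesis token, and β is a chosen interaction function (e.g., element-wise product).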
- Apply a DenseNet-based feature extractor over the interaction tensor to learn high-level semantic features.
- Decode the resulting features with a linear classifier to predict entailment/neutral/contradiction.
- In DIIN, augment word representations with word embeddings, character features, and syntactic/exact-match features; train with Adadelta/SGD schedules; employ dropout and L2 regularization; use 1x1 convolutions to downscale I before DenseNet processing.
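The interaction-tensor step in the list above can be illustrated with a minimal sketch. It assumes β is the element-wise product and uses plain Python lists in place of learned encodings; the function name `interaction_tensor` and the toy vectors are hypothetical and are not taken from the paper's code.

```python
def interaction_tensor(premise, hypothesis):
    """Build I with I[i][j][k] = premise[i][k] * hypothesis[j][k].

    premise:    list of p token vectors, each of length d
    hypothesis: list of h token vectors, each of length d
    Returns a nested list of shape (p, h, d).
    Assumption: beta is the element-wise product.
    """
    if premise and hypothesis and len(premise[0]) != len(hypothesis[0]):
        raise ValueError("premise and hypothesis vectors must share a dimension")
    return [
        [[p_k * h_k for p_k, h_k in zip(p_vec, h_vec)] for h_vec in hypothesis]
        for p_vec in premise
    ]


# Toy encoded representations: 2 premise tokens, 3 hypothesis tokens, d = 2.
P = [[1.0, 2.0], [0.5, -1.0]]
H = [[2.0, 0.0], [1.0, 1.0], [-1.0, 3.0]]

I = interaction_tensor(P, H)
print(len(I), len(I[0]), len(I[0][0]))  # 2 3 2
print(I[1][2])                          # [-0.5, -3.0]
```

Each premise-hypothesis token pair contributes a d-dimensional vector, so the tensor can be treated like a multi-channel image of size p × h, which is what makes a convolutional feature extractor applicable.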
Experimental results
Research questions
- RQ1: Can modeling cross-sentence interactions via an interaction tensor improve NLI performance beyond sentence-encoding approaches?
- RQ2: Does a densely connected convolutional feature extractor over the interaction space capture richer semantic features for NLI?
- RQ3: What is the contribution of exact-match and character/syntactic features to NLI performance in the interaction-space framework?
- RQ4: How does DIIN perform on SNLI, MultiNLI, and Quora paraphrase tasks compared to prior state-of-the-art models?
- RQ5: What insights can be drawn from ablation studies about the role of self-attention, fuse gates, and dense interaction tensors?
Key findings
| Model | MultiNLI matched accuracy (%) | MultiNLI mismatched accuracy (%) |
|---|---|---|
| BiLSTM (Williams et al., 2017) | 67.0 | 67.6 |
| InnerAtt (Balazs et al., 2017) | 72.1 | 72.1 |
| ESIM (Williams et al., 2017) | 72.3 | 72.1 |
| Gated-Att BiLSTM (Chen et al., 2017b) | 73.2 | 73.6 |
| Shortcut-Stacked encoder (Nie & Bansal, 2017) | 74.6 | 73.6 |
| DIIN | 78.8 | 77.8 |
| InnerAtt (ensemble) | 72.2 | 72.8 |
| Gated-Att BiLSTM (ensemble) | 74.9 | 74.9 |
| DIIN (ensemble) | 80.0 | 78.7 |
- DIIN achieves state-of-the-art performance on MultiNLI (matched: 78.8, mismatched: 77.8) and on SNLI (ensemble: 88.9) in the reported results.
- On MultiNLI, DIIN outperforms prior methods both as a single model (78.8 matched / 77.8 mismatched) and as an ensemble (80.0 matched / 78.7 mismatched).
- On SNLI, DIIN reaches 88.0 (single) and 88.9 (ensemble) accuracy.
- On Quora paraphrase detection, DIIN achieves 89.06 test accuracy (single) and 89.84 (ensemble).
- Ablation shows exact-match features, convolutional structure, encoding layer, self-attention, and fuse gate all contribute to performance; removing components degrades results.
- Visualization suggests the interaction tensor captures diverse semantic patterns across channels, supporting the claim that interaction space contains rich semantic information.
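The abstract's "greater than 20% error reduction" claim can be checked against the MultiNLI table with a short calculation. The sketch below assumes the claim refers to relative reduction in error rate (100 minus accuracy) and compares DIIN with the best prior entry of the same kind in the table; which pairing the authors intended is an assumption, and the helper name is hypothetical.

```python
def relative_error_reduction(baseline_acc, new_acc):
    """Relative reduction in error rate, in percent.

    Both arguments are accuracies in percent; error rate = 100 - accuracy.
    """
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return 100.0 * (baseline_err - new_err) / baseline_err


# Matched-set accuracies copied from the table above.
single = relative_error_reduction(74.6, 78.8)    # Shortcut-Stacked encoder vs. DIIN
ensemble = relative_error_reduction(74.9, 80.0)  # Gated-Att BiLSTM (ensemble) vs. DIIN (ensemble)

print(round(single, 1))    # 16.5
print(round(ensemble, 1))  # 20.3
```

Under these assumptions, only the matched-set ensemble comparison exceeds 20%; the single-model comparison comes out at roughly 16.5%.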
This review was created by AI and reviewed by human editors.