QUICK REVIEW

[Paper Review] Weaver: Deep Co-Encoding of Questions and Documents for Machine Reading

Martin Raison, Pierre-Emmanuel Mazaré|arXiv (Cornell University)|Apr 27, 2018
Topic Modeling|34 references|18 citations
TL;DR

Weaver is a deep co-encoding model for machine reading that uses stacked, woven bidirectional LSTMs to jointly encode questions and documents without relying on attention mechanisms. In the open-domain SQuAD setting it reaches 42.3 EM with 25 retrieved documents, it solves 17/18 bAbI tasks, and it significantly outperforms prior methods in open-domain question answering by learning contextual and question representations jointly through end-to-end training.

ABSTRACT

This paper aims at improving how machines can answer questions directly from text, with the focus of having models that can answer correctly multiple types of questions and from various types of texts, documents or even from large collections of them. To that end, we introduce the Weaver model that uses a new way to relate a question to a textual context by weaving layers of recurrent networks, with the goal of making as few assumptions as possible as to how the information from both question and context should be combined to form the answer. We show empirically on six datasets that Weaver performs well in multiple conditions. For instance, it produces solid results on the very popular SQuAD dataset (Rajpurkar et al., 2016), solves almost all bAbI tasks (Weston et al., 2015) and greatly outperforms state-of-the-art methods for open domain question answering from text (Chen et al., 2017).

Motivation & Objective

  • To develop a more robust and general-purpose machine reading model capable of handling diverse question types and long-context documents.
  • To reduce reliance on attention mechanisms by co-encoding questions and contexts through a novel recurrent architecture.
  • To improve performance in open-domain question answering where retrieval is imperfect and context spans are long or fragmented.
  • To enable the model to generate answers not present in the context, such as out-of-vocabulary words.
  • To enhance end-to-end performance in pipeline systems by improving the reader component's accuracy across multiple documents.

Proposed method

  • Weaver uses a stacked, woven architecture of bidirectional LSTMs to co-encode questions and documents simultaneously, learning deep interconnections between their representations.
  • The model replaces attention mechanisms with a hierarchical, co-encoding structure that allows joint learning of question and context representations.
  • An answering layer inspired by Memory Networks performs hop-based reasoning over the co-encoded representations to predict answer spans.
  • The model is trained end-to-end on span-based question answering, with a loss function optimized for exact match and F1 scores.
  • Ablation studies show that the RNN-based co-encoding is the primary driver of performance, not auxiliary components like convolution or memory networks.
  • The model is fine-tuned on downstream datasets such as CuratedTREC, WebQuestions, and WikiMovies to adapt to new domains.
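
One way to picture the "woven" co-encoding described above is as a grid with one cell per (question token, context token) pair, over which recurrent layers run alternately along the context axis and the question axis. The sketch below illustrates only that alternation. The grid layout, the function names, and the `toy_birnn` stand-in (a bidirectional moving average with no learned weights, used in place of a real BiLSTM) are all assumptions made for illustration; none of them is confirmed by this review as the authors' implementation.

```python
from typing import List

Vector = List[float]
Grid = List[List[Vector]]  # grid[q][c]: features for (question token q, context token c)


def toy_birnn(seq: List[Vector], decay: float = 0.5) -> List[Vector]:
    """Hypothetical stand-in for a BiLSTM: averages a forward and a backward
    exponential moving average, so every output sees the whole sequence."""

    def scan(items: List[Vector]) -> List[Vector]:
        state = None
        out = []
        for x in items:
            if state is None:
                state = list(x)
            else:
                state = [decay * s + (1 - decay) * v for s, v in zip(state, x)]
            out.append(state)
        return out

    fwd = scan(seq)
    bwd = scan(seq[::-1])[::-1]
    return [[0.5 * (f + b) for f, b in zip(fv, bv)] for fv, bv in zip(fwd, bwd)]


def pair_grid(question: List[Vector], context: List[Vector]) -> Grid:
    """Concatenate question and context features for every token pair."""
    return [[q + c for c in context] for q in question]


def weave(grid: Grid, num_layers: int = 2) -> Grid:
    """Alternate recurrent passes: even layers run along the context axis,
    odd layers run along the question axis."""
    for layer in range(num_layers):
        if layer % 2 == 0:
            grid = [toy_birnn(row) for row in grid]
        else:
            cols = [list(col) for col in zip(*grid)]      # transpose to question-major
            cols = [toy_birnn(col) for col in cols]
            grid = [list(row) for row in zip(*cols)]      # transpose back
    return grid


question = [[1.0], [0.0]]            # two question tokens, one feature each
context = [[0.0], [1.0], [2.0]]      # three context tokens, one feature each
woven = weave(pair_grid(question, context), num_layers=2)
```

After two layers, each cell mixes information from every context position and every question position, which is the property the co-encoding bullets attribute to the model. A real implementation would replace `toy_birnn` with trained bidirectional LSTMs.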

Experimental results

Research questions

  • RQ1: Can a co-encoding model based solely on recurrent networks outperform attention-based models in machine reading?
  • RQ2: How does the performance of a co-encoding model scale with increasing numbers of retrieved documents in open-domain question answering?
  • RQ3: Can a reader model trained on SQuAD generalize to and outperform baselines on diverse datasets like bAbI, WikiHop, and CuratedTREC?
  • RQ4: To what extent does removing attention mechanisms affect model performance, and can co-encoding compensate?
  • RQ5: Can the model generate answers that are not span-exact matches in the context, such as words not present in the document?

Key findings

  • In the open-domain SQuAD setting, Weaver achieves 42.3 EM when reading 25 retrieved Wikipedia articles, a 12+ point improvement over the previous best reported performance.
  • The model solves 17 out of 18 bAbI tasks, demonstrating strong generalization across diverse reasoning skills.
  • On the WikiHop dataset, Weaver achieves state-of-the-art results, showing robustness to multi-hop reasoning and short context fragments.
  • In ablation studies, removing the woven RNN layers reduces F1 to 33.0, confirming that the co-encoding mechanism is the primary source of performance gains.
  • Fine-tuning on CuratedTREC reaches 43.8 EM, a 6.6-point improvement over the previous state of the art.
  • The model maintains strong performance even when the number of retrieved documents increases to 25, unlike DrQA, which plateaus at 10 documents.
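
The findings above are reported as exact match (EM) and F1. As a reminder of what those numbers measure, here is a minimal sketch of the two metrics in the style commonly used for SQuAD evaluation (lowercasing, stripping punctuation and English articles, then comparing tokens). The function names are illustrative, and the paper's own evaluation script may differ in its details.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `exact_match("The Eiffel Tower", "eiffel tower")` returns 1.0, while `f1_score("Eiffel Tower in Paris", "Eiffel Tower")` gives partial credit (precision 0.5, recall 1.0). Dataset-level scores such as 42.3 EM are the mean of these per-question values, expressed as a percentage.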


This review was created by AI and reviewed by human editors.