QUICK REVIEW

[Paper Review] Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index

Minjoon Seo, Jinhyuk Lee|arXiv (Cornell University)|Jun 13, 2019
Topic Modeling · 33 references · 17 citations
TL;DR

This paper introduces DenSPI, a real-time open-domain question answering system that uses query-agnostic dense-sparse phrase indexing to enable fast, scalable inference. By jointly encoding phrases with both dense and sparse vectors and indexing them offline, DenSPI achieves 58x faster end-to-end inference than DrQA on CPUs while also being more accurate on SQuAD-Open, with 6,000x lower computational cost and a 6.4% higher exact match score.
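Why a query-agnostic index is faster can be seen by counting encoder calls. The sketch below is a toy, assuming a hypothetical `CountingEncoder` that maps text to a pseudo-random vector; there is no real model, so the returned answers are meaningless and only the call counts matter. A query-dependent pipeline re-encodes every phrase with each question (cost grows as questions × phrases), while an offline index encodes each phrase once and then only encodes the question (cost grows as phrases + questions).

```python
import zlib

import numpy as np

DIM = 16  # toy vector width; an assumption for illustration only


class CountingEncoder:
    """Hypothetical encoder: maps text to a fixed pseudo-random vector and counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def encode(self, text: str) -> np.ndarray:
        self.calls += 1
        seed = zlib.crc32(text.encode("utf-8"))  # deterministic across runs
        return np.random.default_rng(seed).standard_normal(DIM)


def pipeline_answer(encoder, phrases, question):
    """Query-dependent: every phrase is re-encoded together with the question."""
    # .sum() is a hypothetical stand-in for a reader's answer score.
    scores = [encoder.encode(question + " [SEP] " + p).sum() for p in phrases]
    return phrases[int(np.argmax(scores))]


def build_index(encoder, phrases):
    """Offline, query-agnostic step: encode each phrase once."""
    return np.stack([encoder.encode(p) for p in phrases])


def indexed_answer(encoder, index, phrases, question):
    """Online step: encode only the question, then take the maximum inner product."""
    scores = index @ encoder.encode(question)
    return phrases[int(np.argmax(scores))]


phrases = ["Barack Obama", "1961", "Honolulu", "Hawaii", "the Senate"]
questions = ["Where was Obama born?", "When was Obama born?", "Which state?"]

slow = CountingEncoder()
for q in questions:
    pipeline_answer(slow, phrases, q)

fast = CountingEncoder()
index = build_index(fast, phrases)
for q in questions:
    indexed_answer(fast, index, phrases, q)

print(slow.calls, fast.calls)  # 15 8
```

With millions of documents instead of five phrases, the same difference in growth rate is what makes per-query reprocessing impractical.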

ABSTRACT

Existing open-domain question answering (QA) models are not suitable for real-time usage because they need to process several long documents on-demand for every input query. In this paper, we introduce the query-agnostic indexable representation of document phrases that can drastically speed up open-domain QA and also allows us to reach long-tail targets. In particular, our dense-sparse phrase encoding effectively captures syntactic, semantic, and lexical information of the phrases and eliminates the pipeline filtering of context documents. Leveraging optimization strategies, our model can be trained in a single 4-GPU server and serve entire Wikipedia (up to 60 billion phrases) under 2TB with CPUs only. Our experiments on SQuAD-Open show that our model is more accurate than DrQA (Chen et al., 2017) with 6000x reduced computational cost, which translates into at least 58x faster end-to-end inference benchmark on CPUs.
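The storage claim in the abstract (up to 60 billion phrases in under 2TB) can be checked with a quick back-of-the-envelope calculation. This sketch reads "2TB" as decimal terabytes and uses a hypothetical vector width of 1,024 values per phrase; neither assumption is a figure taken from the paper.

```python
PHRASES = 60e9       # "up to 60 billion phrases" (from the abstract)
BUDGET_BYTES = 2e12  # "under 2TB", read here as decimal terabytes (assumption)

bytes_per_phrase = BUDGET_BYTES / PHRASES
print(f"budget: {bytes_per_phrase:.1f} bytes per phrase")  # budget: 33.3 bytes per phrase


def naive_index_tb(dim: int, bytes_per_value: int) -> float:
    """Index size in TB if every phrase stored its own dim-wide vector."""
    return PHRASES * dim * bytes_per_value / 1e12


# dim = 1024 is a hypothetical width, not a number reported in the paper.
print(f"float32: {naive_index_tb(1024, 4):.2f} TB")  # float32: 245.76 TB
print(f"int8:    {naive_index_tb(1024, 1):.2f} TB")  # int8:    61.44 TB
```

Even at one byte per value, storing a separate vector per phrase would overshoot the 2TB budget by roughly 30x, which is why the optimization strategies the abstract mentions are needed to serve the full index.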

Motivation & Objective

  • To address the high inference latency of existing open-domain QA systems that reprocess documents for every query.
  • To enable real-time, scalable question answering by pre-indexing document phrases independently of queries.
  • To improve retrieval diversity and accuracy in open-domain QA by combining dense semantic and sparse lexical representations.
  • To reduce computational cost and memory usage for training and serving large-scale phrase indexes on standard hardware.
  • To achieve high performance on open-domain benchmarks like SQuAD-Open with minimal latency.

Proposed method

  • Proposes a dense-sparse phrase encoding that combines contextualized dense vectors (BERT-based) with sparse TF-IDF vectors to capture semantic, syntactic, and lexical information.
  • Encodes each document phrase independently of the question, building its representation from the contextualized vectors at the phrase's start and end tokens, which enables offline indexing and fast retrieval.
  • Uses maximum inner product search (MIPS) in the shared embedding space to retrieve the most relevant phrase for a given question at inference time.
  • Employs approximate nearest neighbor search over the indexed phrase representations for scalable, real-time inference over the entire Wikipedia corpus.
  • Applies optimization strategies, such as mixed-precision training and efficient data loading, to train and deploy the model on a single 4-GPU server with 64GB RAM and a 2TB SSD.
  • Introduces a hybrid search strategy that combines sparse-first search (SFS) and dense-first search (DFS) to improve coverage and accuracy.
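The method bullets can be tied together in a small sketch. It is a toy under stated assumptions, not the authors' code: random vectors stand in for BERT outputs, a phrase's dense vector is simply the concatenation of its start-token and end-token vectors, the sparse side is a simplified document-level TF-IDF, and exact search replaces the approximate nearest neighbor index a real system would use. The search functions reflect our reading of the strategies: SFS selects documents by sparse score first, DFS selects phrases by dense score first, and the hybrid unions both candidate sets and reranks them with the combined dense + sparse score. `MAX_LEN`, `DIM`, and all function names are hypothetical.

```python
import math
from collections import Counter

import numpy as np

DIM = 4      # toy width of each start/end vector (assumption)
MAX_LEN = 3  # longest phrase, in tokens (assumption)
rng = np.random.default_rng(0)

docs = {
    "d1": "obama was born in honolulu hawaii in 1961".split(),
    "d2": "the senate confirmed the nominee in 2009".split(),
}

# Sparse side: simplified TF-IDF over whole documents.
df = Counter(t for toks in docs.values() for t in set(toks))
idf = {t: math.log(len(docs) / n) + 1.0 for t, n in df.items()}
doc_sparse = {d: {t: c * idf[t] for t, c in Counter(toks).items()}
              for d, toks in docs.items()}

# Dense side: random per-token start/end vectors stand in for BERT outputs.
start = {d: rng.standard_normal((len(toks), DIM)) for d, toks in docs.items()}
end = {d: rng.standard_normal((len(toks), DIM)) for d, toks in docs.items()}

# Query-agnostic phrase index: span (i, j) gets concat(start_i, end_j).
index = []  # entries are (doc_id, i, j, dense_vector)
for d, toks in docs.items():
    for i in range(len(toks)):
        for j in range(i, min(i + MAX_LEN, len(toks))):
            index.append((d, i, j, np.concatenate([start[d][i], end[d][j]])))


def sparse_score(q_terms, d):
    """Question terms weighted by IDF, dotted with the document's TF-IDF vector."""
    return sum(idf.get(t, 0.0) * doc_sparse[d].get(t, 0.0) for t in q_terms)


def score(q_dense, q_terms, entry):
    """Combined dense inner product plus sparse score for one indexed phrase."""
    d, _, _, vec = entry
    return float(q_dense @ vec) + sparse_score(q_terms, d)


def sfs(q_terms, top_docs=1):
    """Sparse-first: keep the top documents by sparse score, return their phrases."""
    ranked = sorted(docs, key=lambda d: sparse_score(q_terms, d), reverse=True)
    keep = set(ranked[:top_docs])
    return [e for e in index if e[0] in keep]


def dfs(q_dense, top_phrases=5):
    """Dense-first: keep the top phrases by dense inner product alone."""
    return sorted(index, key=lambda e: float(q_dense @ e[3]), reverse=True)[:top_phrases]


def hybrid(q_dense, q_terms):
    """Union both candidate sets and rerank with the full dense + sparse score."""
    cands = sfs(q_terms) + dfs(q_dense)
    best = max(cands, key=lambda e: score(q_dense, q_terms, e))
    d, i, j, _ = best
    return " ".join(docs[d][i:j + 1]), score(q_dense, q_terms, best)


q_dense = rng.standard_normal(2 * DIM)  # stand-in for an encoded question
q_terms = "where was obama born".split()
print(hybrid(q_dense, q_terms))
```

Because every vector is random, the printed phrase is arbitrary; what the sketch shows is the data flow: phrase vectors are built once without the question, and each question only triggers inner products and a sparse lookup.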

Experimental results

Research questions

  • RQ1: Can a query-agnostic phrase indexing approach significantly reduce inference latency in open-domain question answering?
  • RQ2: How does combining dense and sparse phrase representations improve retrieval accuracy and diversity compared to pipeline methods?
  • RQ3: To what extent can a dense-sparse phrase index be trained and served efficiently on standard hardware without multi-GPU or high-end infrastructure?
  • RQ4: What is the trade-off between accuracy and speed when using approximate nearest neighbor search on hybrid dense-sparse representations?
  • RQ5: How does the model perform on long-tail and out-of-distribution questions compared to strong baselines like DrQA?

Key findings

  • DenSPI achieves 58x faster end-to-end inference than DrQA on CPUs, including disk access time, due to pre-indexed phrase representations.
  • The model reduces computational cost by 6,000x compared to DrQA under controlled conditions while maintaining or improving accuracy.
  • DenSPI-Hybrid achieves 6.4% higher exact match (EM) than DrQA on SQuAD-Open, with 6.6% higher F1 in the best configuration.
  • The model retrieves answers from an average of 817 unique documents per query, compared to only 5 for DrQA, indicating significantly greater retrieval diversity.
  • Removing the sparse vector leads to a 19.6% drop in F1, demonstrating its critical role in distinguishing lexically distinct but semantically similar phrases.
  • Qualitative analysis shows DenSPI successfully retrieves correct answers from multiple documents even when lexical overlap is low, outperforming DrQA in challenging open-domain cases.
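The ablation finding on the sparse vector can be illustrated with a toy case. In this sketch the dense vectors for two years are deliberately identical (a hypothetical encoder that treats both as interchangeable date phrases), and a crude unweighted word overlap with each phrase's context stands in for the TF-IDF sparse vector. These numbers are illustrative, not data from the paper.

```python
import numpy as np

# Hypothetical dense vectors: the encoder sees both years as the same kind of phrase.
dense = {"1961": np.array([0.9, 0.1]), "2009": np.array([0.9, 0.1])}
# Hypothetical context words around each phrase, used as a crude sparse signal.
context = {
    "1961": {"obama", "born", "honolulu"},
    "2009": {"senate", "confirmed", "nominee"},
}


def scores(q_dense, q_terms, use_sparse):
    """Score each phrase by dense inner product, optionally adding term overlap."""
    out = {}
    for phrase, vec in dense.items():
        s = float(q_dense @ vec)
        if use_sparse:
            s += len(q_terms & context[phrase])  # unweighted term overlap
        out[phrase] = s
    return out


q_dense = np.array([1.0, 0.0])
q_terms = {"when", "was", "the", "nominee", "confirmed"}
print(scores(q_dense, q_terms, use_sparse=False))  # the two years tie
print(scores(q_dense, q_terms, use_sparse=True))   # "2009" pulls ahead
```

Dense-only scoring cannot separate the two candidates, while the lexical overlap with "nominee" and "confirmed" picks the correct year.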


This review was created by AI and reviewed by human editors.