QUICK REVIEW

[Paper Review] Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index

Minjoon Seo, Jinhyuk Lee | arXiv (Cornell University) | Jun 13, 2019

Topic Modeling | 33 references | 17 citations

TL;DR

This paper introduces DenSPI, a real-time open-domain question answering system that indexes document phrases offline, independently of any query, so that answering a question reduces to a fast search over the index. Each phrase is encoded jointly as a dense and a sparse vector. On SQuAD-Open, DenSPI is more accurate than DrQA (6.4 points higher exact match) at 6,000x lower computational cost, which translates into at least 58x faster end-to-end inference on CPUs.

ABSTRACT

Existing open-domain question answering (QA) models are not suitable for real-time usage because they need to process several long documents on-demand for every input query. In this paper, we introduce the query-agnostic indexable representation of document phrases that can drastically speed up open-domain QA and also allows us to reach long-tail targets. In particular, our dense-sparse phrase encoding effectively captures syntactic, semantic, and lexical information of the phrases and eliminates the pipeline filtering of context documents. Leveraging optimization strategies, our model can be trained in a single 4-GPU server and serve entire Wikipedia (up to 60 billion phrases) under 2TB with CPUs only. Our experiments on SQuAD-Open show that our model is more accurate than DrQA (Chen et al., 2017) with 6000x reduced computational cost, which translates into at least 58x faster end-to-end inference benchmark on CPUs.

Motivation & Objective

To address the high inference latency of existing open-domain QA systems that reprocess documents for every query.
To enable real-time, scalable question answering by pre-indexing document phrases independently of queries.
To improve retrieval diversity and accuracy in open-domain QA by combining dense semantic and sparse lexical representations.
To reduce computational cost and memory usage for training and serving large-scale phrase indexes on standard hardware.
To achieve high performance on open-domain benchmarks like SQuAD-Open with minimal latency.

Proposed method

Proposes a dense-sparse phrase encoding that combines contextualized dense vectors (e.g., BERT-based) with sparse term-frequency vectors to capture semantic, syntactic, and lexical information.
Encodes document phrases as fixed representations using start and end token positions, enabling offline indexing and fast retrieval.
Uses inner product search in the shared embedding space to retrieve the most relevant phrase for a given question at inference time.
Employs approximate nearest neighbor search on the indexed phrase representations for scalable, real-time inference on web-scale data.
Applies optimization strategies, such as mixed-precision training and efficient data loading, to train and deploy the model on a single 4-GPU server with 64GB RAM and 2TB SSD.
Introduces a hybrid search strategy that combines sparse-first search (SFS) and dense-first search (DFS), so that both sparse and dense vector retrieval contribute candidates, improving coverage and accuracy.
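
The scoring idea behind the method can be illustrated with a minimal sketch. It assumes toy, hand-written vectors rather than trained encoders, and it scans the index exhaustively instead of using approximate nearest neighbor search. All names (phrase_score, search, INDEX) and all weights are hypothetical and chosen only for illustration; they are not taken from the paper's code. A phrase's score is taken to be the dense inner product plus the sparse inner product, which is equivalent to one inner product over the concatenated dense-sparse vector.

```python
from typing import Dict, List, Tuple

SparseVec = Dict[str, float]  # term -> weight (toy values, not real tf-idf)
Entry = Tuple[str, List[float], SparseVec]  # (phrase text, dense, sparse)


def dense_dot(a: List[float], b: List[float]) -> float:
    """Inner product of two dense vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def sparse_dot(a: SparseVec, b: SparseVec) -> float:
    """Inner product of two sparse vectors stored as term -> weight dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dict
    return sum(w * b[t] for t, w in a.items() if t in b)


def phrase_score(q_dense: List[float], q_sparse: SparseVec,
                 p_dense: List[float], p_sparse: SparseVec) -> float:
    """Dense part plus sparse part: an inner product over [dense; sparse]."""
    return dense_dot(q_dense, p_dense) + sparse_dot(q_sparse, p_sparse)


def search(q_dense: List[float], q_sparse: SparseVec,
           index: List[Entry], top_k: int = 1) -> List[Tuple[float, str]]:
    """Exhaustive maximum inner product search over a pre-built phrase index."""
    scored = [(phrase_score(q_dense, q_sparse, d, s), text)
              for text, d, s in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical pre-computed index. The first two phrases are both years and
# share the same toy dense vector; only their sparse context terms differ.
INDEX: List[Entry] = [
    ("1969", [0.9, 0.1], {"moon": 1.0, "landing": 1.0}),
    ("1989", [0.9, 0.1], {"berlin": 1.0, "wall": 1.0}),
    ("Paris", [0.1, 0.8], {"france": 1.0, "capital": 1.0}),
]

# Toy query for a question about the year of the moon landing.
query_dense = [1.0, 0.0]
query_sparse = {"moon": 1.0, "landing": 0.5}

best_score, best_phrase = search(query_dense, query_sparse, INDEX)[0]
print(best_phrase)  # prints "1969"
```

In this toy index the dense vectors alone cannot separate "1969" from "1989", and the sparse term overlap breaks the tie. That mirrors, in a very simplified form, the role the review attributes to the sparse vector. At Wikipedia scale the exhaustive loop in search would be replaced by an approximate nearest neighbor index.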

Experimental results

Research questions

RQ1: Can a query-agnostic phrase indexing approach significantly reduce inference latency in open-domain question answering?
RQ2: How does combining dense and sparse phrase representations improve retrieval accuracy and diversity compared to pipeline methods?
RQ3: To what extent can a dense-sparse phrase index be trained and served efficiently on a single server, without large-scale or high-end infrastructure?
RQ4: What is the trade-off between accuracy and speed when using approximate nearest neighbor search on hybrid dense-sparse representations?
RQ5: How does the model perform on long-tail and out-of-distribution questions compared to strong baselines like DrQA?

Key findings

DenSPI achieves 58x faster end-to-end inference than DrQA on CPUs, including disk access time, due to pre-indexed phrase representations.
The model reduces computational cost by 6,000x compared to DrQA under controlled conditions while maintaining or improving accuracy.
DenSPI-Hybrid achieves 6.4 points higher exact match (EM) than DrQA on SQuAD-Open, with 6.6 points higher F1 in the best configuration.
The model retrieves answers from an average of 817 unique documents per query, compared to only 5 for DrQA, indicating significantly greater retrieval diversity.
Removing the sparse vector leads to a 19.6% drop in F1, demonstrating its critical role in distinguishing lexically distinct but semantically similar phrases.
Qualitative analysis shows DenSPI successfully retrieves correct answers from multiple documents even when lexical overlap is low, outperforming DrQA in challenging open-domain cases.

This review was created by AI and reviewed by human editors.