[Paper Review] Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index
This paper introduces DenSPI, a real-time open-domain question answering system that uses query-agnostic dense-sparse phrase indexing to enable fast, scalable inference. By jointly encoding phrases with dense and sparse vectors and indexing them offline, DenSPI achieves at least 58x faster end-to-end inference than DrQA on CPUs while being more accurate on SQuAD-Open, with 6,000x lower computational cost and a 6.4-point higher exact match (EM) score.
Existing open-domain question answering (QA) models are not suitable for real-time usage because they need to process several long documents on demand for every input query. In this paper, we introduce a query-agnostic indexable representation of document phrases that can drastically speed up open-domain QA and also allows us to reach long-tail targets. In particular, our dense-sparse phrase encoding effectively captures syntactic, semantic, and lexical information of the phrases and eliminates the pipeline filtering of context documents. Leveraging optimization strategies, our model can be trained on a single 4-GPU server and serve the entire Wikipedia (up to 60 billion phrases) in under 2TB with CPUs only. Our experiments on SQuAD-Open show that our model is more accurate than DrQA (Chen et al., 2017) with 6000x reduced computational cost, which translates into at least 58x faster end-to-end inference on CPUs.
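The query-agnostic idea can be made concrete with a toy sketch. Everything in it is a hypothetical stand-in, not the paper's model: `phrase_dense` and `question_dense` are hand-written 2-d "answer type" features rather than BERT encoders, and `sparse` uses raw term counts of a phrase's context rather than tf-idf. What it is meant to show is the division of work: phrase vectors are computed once, offline, without seeing any question, and answering only requires encoding the question and taking the maximum inner product over the stored index.

```python
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    # Lowercase word tokens with punctuation stripped.
    return re.findall(r"\w+", text.lower())


def phrase_dense(phrase: str) -> list[float]:
    # Hypothetical 2-d "semantic" features: [looks like a year, looks like a place].
    return [1.0, 0.0] if phrase.isdigit() else [0.0, 1.0]


def question_dense(question: str) -> list[float]:
    # Hypothetical question encoder that maps into the same 2-d space.
    toks = set(tokens(question))
    return [float(bool(toks & {"year", "when"})), float(bool(toks & {"city", "where"}))]


def sparse(text: str) -> Counter:
    # Stand-in for a tf-idf vector: raw term counts of the text.
    return Counter(tokens(text))


def build_index(corpus):
    # Offline step: runs once and never sees a question.
    return [(phrase, phrase_dense(phrase), sparse(context)) for phrase, context in corpus]


def answer(question: str, index) -> str:
    # Online step: encode only the question, then take the max inner product.
    qd, qs = question_dense(question), sparse(question)

    def score(entry):
        _, d, s = entry
        dense_score = sum(a * b for a, b in zip(qd, d))
        sparse_score = sum(qs[t] * s[t] for t in qs)  # Counter returns 0 for missing terms
        return dense_score + sparse_score

    return max(index, key=score)[0]


CORPUS = [
    ("1879", "Einstein was born in the year 1879"),
    ("Ulm", "Einstein was born in the city of Ulm"),
    ("1889", "The Eiffel Tower was completed in the year 1889"),
    ("Paris", "The Eiffel Tower was completed in the city of Paris"),
]
INDEX = build_index(CORPUS)
print(answer("In what year was Einstein born?", INDEX))  # 1879
print(answer("In which city was the Eiffel Tower completed?", INDEX))  # Paris
```

With only the dense part, "1879" and "1889" would tie on the first question because both look like years; the sparse lexical overlap ("einstein", "born") breaks the tie, which mirrors the role the review attributes to the sparse vector.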
Motivation & Objective
- To address the high inference latency of existing open-domain QA systems that reprocess documents for every query.
- To enable real-time, scalable question answering by pre-indexing document phrases independently of queries.
- To improve retrieval diversity and accuracy in open-domain QA by combining dense semantic and sparse lexical representations.
- To reduce computational cost and memory usage for training and serving large-scale phrase indexes on standard hardware.
- To achieve high performance on open-domain benchmarks like SQuAD-Open with minimal latency.
Proposed method
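- Proposes a dense-sparse phrase encoding that combines contextualized dense vectors (BERT-based) with sparse tf-idf vectors of the surrounding context, so that the dense part captures semantic and syntactic information and the sparse part captures lexical information.
- Encodes each document phrase independently of the question, using the contextualized vectors at its start and end token positions, so that all phrase representations can be precomputed and indexed offline for fast retrieval.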
- Uses inner product search in the shared embedding space to retrieve the most relevant phrase for a given question at inference time.
- Employs approximate nearest neighbor search on the indexed phrase representations for scalable, real-time inference at Wikipedia scale.
- Applies optimization strategies (such as mixed-precision training and efficient data loading) to train and deploy the model on a single 4-GPU server with 64GB RAM and 2TB SSD.
- Introduces a hybrid search strategy that combines sparse-first search (SFS) and dense-first search (DFS) to improve coverage and accuracy.
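The sparse-first and dense-first strategies can be contrasted in a small sketch. This is one reading of the idea, not the paper's code: SFS narrows the search to documents that score well on sparse similarity, DFS takes the top phrases by dense inner product, and the hybrid mode pools both candidate sets before rescoring them with the full dense + sparse score. The index contents, vectors, and the parameters `k_docs` and `k` are hypothetical, and the exhaustive `sorted` calls stand in for the approximate nearest neighbor search a real system would use.

```python
from collections import Counter

# Hypothetical toy index. Each entry: (phrase, doc_id, dense vector, sparse term counts of its context).
PHRASES = [
    ("330 metres", "eiffel", [0.9, 0.1], Counter({"height": 1, "tower": 1})),
    ("1889", "eiffel", [0.2, 0.8], Counter({"tower": 1, "completed": 1})),
    ("333 metres", "tokyo", [0.95, 0.0], Counter({"height": 1})),
    ("1958", "tokyo", [0.1, 0.9], Counter({"completed": 1})),
]
# Hypothetical document-level sparse vectors, used only by the sparse-first pass.
DOCS = {
    "eiffel": Counter({"eiffel": 2, "tower": 2, "height": 1, "completed": 1}),
    "tokyo": Counter({"tokyo": 2, "tower": 2, "height": 1, "completed": 1}),
}


def dense_dot(q, v):
    return sum(a * b for a, b in zip(q, v))


def sparse_dot(q: Counter, v: Counter) -> float:
    return float(sum(n * v[t] for t, n in q.items()))


def full_score(qd, qs, entry):
    # Final score: dense inner product plus sparse inner product.
    _, _, d, s = entry
    return dense_dot(qd, d) + sparse_dot(qs, s)


def sparse_first(qs, k_docs):
    # SFS: keep only phrases from the top documents by sparse similarity.
    top = sorted(DOCS, key=lambda doc: sparse_dot(qs, DOCS[doc]), reverse=True)[:k_docs]
    return [e for e in PHRASES if e[1] in top]


def dense_first(qd, k):
    # DFS: top-k phrases by dense score alone (the step an ANN/MIPS index would speed up).
    return sorted(PHRASES, key=lambda e: dense_dot(qd, e[2]), reverse=True)[:k]


def search(qd, qs, mode="hybrid", k_docs=1, k=1):
    candidates = []
    if mode in ("sfs", "hybrid"):
        candidates += sparse_first(qs, k_docs)
    if mode in ("dfs", "hybrid"):
        candidates += dense_first(qd, k)
    # Rescore every candidate with the full dense + sparse score.
    return max(candidates, key=lambda e: full_score(qd, qs, e))[0]


# "What is the height of the Eiffel Tower?" under a hypothetical encoder.
qd = [1.0, 0.0]
qs = Counter({"height": 1, "eiffel": 1, "tower": 1})
for mode in ("sfs", "dfs", "hybrid"):
    print(mode, search(qd, qs, mode))
```

In this toy setup the dense-first pass alone returns the Tokyo Tower's height because its dense vector is slightly closer to the question, while the document-level sparse match on "eiffel" lets SFS and the hybrid search recover "330 metres".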
Experimental results
Research questions
- RQ1: Can a query-agnostic phrase indexing approach significantly reduce inference latency in open-domain question answering?
- RQ2: How does combining dense and sparse phrase representations improve retrieval accuracy and diversity compared to pipeline methods?
- RQ3: To what extent can a dense-sparse phrase index be trained and served efficiently on a single server, without large GPU clusters or other high-end infrastructure?
- RQ4: What is the trade-off between accuracy and speed when using approximate nearest neighbor search on hybrid dense-sparse representations?
- RQ5: How does the model perform on long-tail and out-of-distribution questions compared to strong baselines such as DrQA?
Key findings
- DenSPI achieves 58x faster end-to-end inference than DrQA on CPUs, including disk access time, due to pre-indexed phrase representations.
- The model reduces computational cost by 6,000x compared to DrQA under controlled conditions while maintaining or improving accuracy.
- DenSPI-Hybrid achieves 6.4 points higher exact match (EM) than DrQA on SQuAD-Open, and 6.6 points higher F1 in the best configuration.
- The model retrieves answers from an average of 817 unique documents per query, compared to only 5 for DrQA, indicating significantly greater retrieval diversity.
- Removing the sparse vector leads to a 19.6% drop in F1, demonstrating its critical role in distinguishing lexically distinct but semantically similar phrases.
- Qualitative analysis shows DenSPI successfully retrieves correct answers from multiple documents even when lexical overlap is low, outperforming DrQA in challenging open-domain cases.
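The diversity figure is an average count of distinct source documents among each query's candidate answers. A small sketch of that bookkeeping, with hypothetical data and helper names (`unique_docs`, `avg_unique_docs`), shows why a pipeline that reads only k retrieved documents is capped at k, while a phrase index ranking phrases from the whole corpus is not. How many candidates per query the paper counts over is not specified here, so the candidate lists are purely illustrative.

```python
from statistics import mean


def unique_docs(candidates):
    # Distinct source documents among one query's candidates, given as (phrase, doc_id) pairs.
    return len({doc_id for _, doc_id in candidates})


def avg_unique_docs(per_query):
    # Average the per-query counts over all questions.
    return mean(unique_docs(c) for c in per_query)


# Hypothetical candidates for two questions.
# A pipeline reader only sees its k retrieved documents (k = 2 here), so the count cannot exceed k.
pipeline = [
    [("a", "d1"), ("b", "d1"), ("c", "d2")],
    [("x", "d7"), ("y", "d8")],
]
# A phrase index ranks phrases from the whole corpus, so candidates can span many documents.
phrase_index = [
    [("a", "d1"), ("e", "d3"), ("f", "d4"), ("g", "d5")],
    [("x", "d7"), ("z", "d9"), ("w", "d9")],
]
print(avg_unique_docs(pipeline), avg_unique_docs(phrase_index))
```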
This review was created by AI and reviewed by human editors.