[Paper Review] SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering
SDNet introduces contextualized inter-attention and self-attention over passage and dialogue history, leveraging BERT via a weighted layer combination with locked parameters, to advance conversational QA and achieve state-of-the-art on CoQA.
Conversational question answering (CQA) is a novel QA task that requires understanding of dialogue context. Unlike traditional single-turn machine reading comprehension (MRC) tasks, CQA involves passage comprehension, coreference resolution, and contextual understanding. In this paper, we propose an innovative contextualized attention-based deep neural network, SDNet, to fuse context into traditional MRC models. Our model leverages both inter-attention and self-attention to comprehend the conversation context and extract relevant information from the passage. Furthermore, we demonstrate a novel method to integrate the latest BERT contextual model. Empirical results show the effectiveness of our model, which sets a new state-of-the-art result on the CoQA leaderboard, outperforming the previous best model by 1.6% F1. Our ensemble model further improves the result by 2.7% F1.
Motivation & Objective
- Address the challenge of conversational question answering by incorporating dialogue history and passage understanding.
- Develop a neural architecture that fuses context through inter-attention and self-attention.
- Leverage BERT contextual embeddings in a novel, fixed-parameter fashion to boost MRC-based QA.
Proposed method
- Prepend previous Q/A rounds to the current question to form a contextualized question for MRC framing.
- Use inter-attention from question to passage and self-attention among words to capture relations across context and query.
- Integrate BERT by taking a weighted sum of its layer outputs with locked parameters (no gradient updates).
- Apply history-of-word based multi-level attention to fuse multiple BERT/RNN representations efficiently.
- Generate answer spans via start/end probabilities with a GRU fusion step, and handle yes/no/unknown outputs for CoQA.
- Train end-to-end by maximizing the likelihood of ground-truth spans or yes/no/unknown labels.
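The weighted combination of BERT layer outputs described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the softmax normalization of the layer weights, the function names, and the toy shapes are all assumptions made for illustration. The BERT outputs are treated as fixed inputs, mirroring the locked-parameter setup, so only the per-layer logits would be trainable.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def weighted_layer_sum(layer_outputs, layer_logits):
    """Combine per-layer contextual embeddings into one representation.

    layer_outputs: array of shape (num_layers, seq_len, hidden), treated as
        constants (the encoder is frozen, so nothing here updates it).
    layer_logits: array of shape (num_layers,), the only learnable values.
        Normalizing them with softmax is an assumption of this sketch.
    Returns an array of shape (seq_len, hidden).
    """
    weights = softmax(layer_logits)
    # Contract the layer axis: sum_l weights[l] * layer_outputs[l]
    return np.tensordot(weights, layer_outputs, axes=1)

# Toy example with hypothetical sizes: 4 layers, 3 tokens, hidden size 5.
rng = np.random.default_rng(0)
toy_layers = rng.normal(size=(4, 3, 5))
toy_logits = np.zeros(4)  # equal logits give a plain average of the layers
combined = weighted_layer_sum(toy_layers, toy_logits)
```

With equal logits the result is the mean over layers; pushing one logit much higher than the others recovers that layer alone, which corresponds to the last-layer-only baseline mentioned in the ablation.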
Experimental results
Research questions
- RQ1: How can dialogue history be effectively integrated into passage-based QA to answer multi-turn questions?
- RQ2: What is the impact of combining inter-attention, self-attention, and contextual embeddings on QA performance?
- RQ3: Does locking BERT parameters and using a weighted layer combination improve downstream QA tasks?
- RQ4: How does SDNet perform on CoQA compared to prior state-of-the-art models and baselines?
Key findings
- SDNet achieves an overall F1 of 76.6% (single model) on CoQA, outperforming the previous state-of-the-art by 1.6%.
- An ensemble SDNet yields an overall F1 of 79.3%, further surpassing prior results.
- SDNet is the first model to pass 80% F1 on CoQA in-domain data (80.7%).
- Ablation shows removing BERT reduces F1 by 7.15%, and the per-layer weighted sum of BERT outputs boosts F1 by 1.75% over using only the last layer.
- Prepending 2 previous QA rounds to the current question yields peak performance among tested history lengths.
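The history-prepending step behind the last finding can be sketched as follows. This is a hypothetical helper, not code from the paper: the function name, the (question, answer) tuple format, and the use of a single space as separator are assumptions. The `n_rounds` parameter corresponds to the number of previous QA rounds, with 2 as the best-performing value reported above.

```python
def build_contextualized_question(history, question, n_rounds=2):
    """Prepend the last `n_rounds` question/answer pairs to the current question.

    history: list of (question, answer) string tuples, oldest first.
    question: the current question string.
    n_rounds: how many previous rounds to keep (0 keeps none).
    """
    # history[-0:] would return the whole list, so handle 0 explicitly.
    recent = history[-n_rounds:] if n_rounds > 0 else []
    parts = []
    for prev_q, prev_a in recent:
        parts.extend([prev_q, prev_a])
    parts.append(question)
    # Joining with a single space is an assumption of this sketch.
    return " ".join(parts)

# Toy dialogue, invented purely for illustration.
dialogue = [
    ("Who wrote the letter?", "Anna"),
    ("When?", "In 1920"),
    ("To whom?", "Her brother"),
]
merged = build_contextualized_question(dialogue, "Where did he live?", n_rounds=2)
```

Here `merged` contains the two most recent rounds followed by the current question, so the pronoun "he" can be resolved against "Her brother" by a standard single-turn MRC model.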
This review was created by AI and reviewed by human editors.