[Paper Review] Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue
Introduces Sequential Knowledge Transformer (SKT), a sequential latent-variable model for knowledge selection in multi-turn knowledge-grounded dialogue, achieving state-of-the-art results on Wizard of Wikipedia and Holl-E.
Knowledge-grounded dialogue is the task of generating an informative response based on both the discourse context and external knowledge. Focusing on better modeling of knowledge selection in multi-turn knowledge-grounded dialogue, we propose a sequential latent variable model, the first approach of its kind to this problem. The model, named sequential knowledge transformer (SKT), keeps track of the prior and posterior distributions over knowledge; as a result, it can not only reduce the ambiguity caused by the diversity of knowledge selection in conversation but also better leverage response information for a proper choice of knowledge. Our experimental results show that the proposed model improves knowledge selection accuracy and, in turn, the performance of utterance generation. We achieve new state-of-the-art performance on Wizard of Wikipedia (Dinan et al., 2019), one of the largest-scale and most challenging benchmarks. We further validate the effectiveness of our model over existing conversation methods on another knowledge-based dialogue dataset, Holl-E (Moghe et al., 2018).
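To make "keeping track of the prior and posterior" concrete, a sequential latent variable model of this kind admits a generic evidence lower bound. The notation below is assumed for this sketch and may not match the paper's Eqs. 2–3 term for term: at turn t, x^t is the user utterance, y^t the response, and k^t the knowledge sentence selected from that turn's pool. The joint is factorized as a product over turns of the prior π_θ(k^t | x^{≤t}, y^{<t}, k^{<t}) times the decoder p_θ(y^t | x^{≤t}, y^{<t}, k^{≤t}), and the approximate posterior as a product of q_φ(k^t | x^{≤t}, y^{≤t}, k^{<t}), which, unlike the prior, sees the current response.

```latex
\log p_\theta(y^{1:T} \mid x^{1:T}) \;\ge\; \sum_{t=1}^{T} \mathbb{E}_{q_\phi(k^{<t})}\Big[
  \mathbb{E}_{q_\phi(k^t \mid x^{\le t},\, y^{\le t},\, k^{<t})}
    \big[\log p_\theta(y^t \mid x^{\le t},\, y^{<t},\, k^{\le t})\big]
  - D_{\mathrm{KL}}\big(q_\phi(k^t \mid x^{\le t},\, y^{\le t},\, k^{<t})
    \,\big\|\, \pi_\theta(k^t \mid x^{\le t},\, y^{<t},\, k^{<t})\big)
\Big]
```

The first term rewards knowledge that helps reconstruct the response; the KL term pulls the response-blind prior, which is what must be used at test time when y^t is unavailable, toward the response-aware posterior.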
Motivation & Objective
- Address knowledge selection in multi-turn knowledge-grounded dialogue, where the diversity of plausible knowledge choices makes selection ambiguous.
- Develop a sequential latent variable model to track prior and posterior knowledge across turns.
- Enable joint inference of knowledge selection and response generation.
- Leverage response information to improve knowledge selection accuracy.
- Demonstrate improved knowledge selection and response quality on large-scale benchmarks (Wizard of Wikipedia and Holl-E).
Proposed method
- Propose Sequential Knowledge Transformer (SKT) with a sequential latent variable framework.
- Model knowledge selection as a sequential decision process with latent variables to capture diversity.
- Use a variational lower bound to jointly model knowledge selection and response generation (Eqs. 2–3).
- Compute the prior π_θ and posterior q_φ over knowledge from GRU-based history states (Eqs. 5–8).
- Decode responses with a Transformer decoder equipped with a copy mechanism, conditioned on the selected knowledge (Eqs. 9–11).
- Train with an auxiliary knowledge loss to exploit ground-truth knowledge signals (Eq. 12).
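The selection step in the method list can be sketched in plain Python. This is a toy under stated assumptions, not the authors' implementation: dot-product scoring over fixed embeddings stands in for the learned encoders, a one-layer tanh recurrence (`update_history`, hypothetical) stands in for the GRU, the turn dictionary keys are hypothetical, and the response-generation loss with its copy mechanism is omitted. It shows how the prior π_θ and posterior q_φ differ only in whether the response is visible, and how the KL term and the auxiliary knowledge loss on ground-truth knowledge accumulate turn by turn.

```python
import math


def softmax(scores):
    # Numerically stable softmax over candidate scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def vec_add(*vecs):
    # Element-wise sum of equal-length vectors
    return [sum(vals) for vals in zip(*vecs)]


def kl_divergence(q, p):
    # KL(q || p) for two categorical distributions over the same pool
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def update_history(h, k):
    # Hypothetical stand-in for the GRU: a one-layer tanh recurrence
    # over the embedding of the knowledge sentence just selected
    return [math.tanh(0.5 * hi + ki) for hi, ki in zip(h, k)]


def sequential_knowledge_selection(turns, dim):
    """Run prior/posterior knowledge selection over a dialogue.

    Each turn is a dict with hypothetical keys:
      'context'  - embedding of the current user utterance
      'response' - embedding of the gold response (visible only to q_phi)
      'pool'     - list of knowledge-sentence embeddings
      'gold'     - index of the ground-truth knowledge sentence
    Returns the selected indices, the summed KL(q || pi) term, and the
    summed auxiliary knowledge loss -log q(gold).
    """
    history = [0.0] * dim
    chosen, kl_total, knowledge_loss = [], 0.0, 0.0
    for turn in turns:
        pool = turn["pool"]
        # Prior pi_theta: context + knowledge history, no response
        prior_query = vec_add(turn["context"], history)
        prior = softmax([dot(prior_query, k) for k in pool])
        # Posterior q_phi: additionally conditions on the response
        post_query = vec_add(turn["context"], turn["response"], history)
        posterior = softmax([dot(post_query, k) for k in pool])
        kl_total += kl_divergence(posterior, prior)
        knowledge_loss += -math.log(posterior[turn["gold"]])
        # Greedy choice from the posterior during training (the paper's
        # exact sampling scheme may differ); at inference, with no
        # response available, one would take the prior's argmax instead
        idx = max(range(len(pool)), key=lambda i: posterior[i])
        chosen.append(idx)
        history = update_history(history, pool[idx])
    return chosen, kl_total, knowledge_loss


# Usage on a two-turn toy dialogue with a two-sentence knowledge pool
turns = [
    {"context": [0.1, 0.1], "response": [3.0, 0.0],
     "pool": [[1.0, 0.0], [0.0, 1.0]], "gold": 0},
    {"context": [0.0, 0.0], "response": [0.0, 3.0],
     "pool": [[1.0, 0.0], [0.0, 1.0]], "gold": 1},
]
chosen, kl_term, know_loss = sequential_knowledge_selection(turns, dim=2)
print(chosen)  # [0, 1]
```

The design point is that only the posterior can use the response, so it can pick better-grounded knowledge during training, while the KL term teaches the response-blind prior to imitate it for use at test time.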
Experimental results
Research questions
- RQ1: How can sequential latent variables improve knowledge selection in multi-turn dialogues?
- RQ2: Does joint modeling of knowledge selection and response generation enhance dialogue quality and grounding?
- RQ3: Can the model achieve state-of-the-art performance on large knowledge-grounded dialogue benchmarks?
- RQ4: How does the approach generalize to datasets beyond Wizard of Wikipedia?
Key findings
- Achieves state-of-the-art knowledge selection accuracy and utterance generation on Wizard of Wikipedia.
- Outperforms baselines on both Test Seen and Test Unseen, with larger gains on unseen topics.
- Demonstrates strong performance on Holl-E with single and multiple reference settings.
- Human evaluations favor SKT over baselines in engagingness and knowledgeability, especially in unseen topics.
- The sequential latent approach better captures topic shifts and knowledge grounding across turns.
This review was created by AI and reviewed by human editors.