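[Paper Summary] Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue
Summary: The paper proposes the Sequential Knowledge Transformer (SKT), a sequential latent variable model for knowledge selection in multi-turn knowledge-grounded dialogue, and reports state-of-the-art performance on Wizard of Wikipedia and Holl-E.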
Knowledge-grounded dialogue is a task of generating an informative response based on both discourse context and external knowledge. As we focus on better modeling the knowledge selection in the multi-turn knowledge-grounded dialogue, we propose a sequential latent variable model as the first approach to this matter. The model named sequential knowledge transformer (SKT) can keep track of the prior and posterior distribution over knowledge; as a result, it can not only reduce the ambiguity caused from the diversity in knowledge selection of conversation but also better leverage the response information for proper choice of knowledge. Our experimental results show that the proposed model improves the knowledge selection accuracy and subsequently the performance of utterance generation. We achieve the new state-of-the-art performance on Wizard of Wikipedia (Dinan et al., 2019) as one of the most large-scale and challenging benchmarks. We further validate the effectiveness of our model over existing conversation methods in another knowledge-based dialogue Holl-E dataset (Moghe et al., 2018).
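Motivation and Goals
- Improve knowledge selection in multi-turn knowledge-grounded dialogue.
- Develop a sequential latent variable model that tracks the prior and posterior distributions over knowledge across dialogue turns.
- Perform joint inference over knowledge selection and response generation.
- Use information from the response to make knowledge selection more accurate.
- Demonstrate improved knowledge selection and response quality on large-scale benchmarks.
Proposed Method
- Proposes the Sequential Knowledge Transformer (SKT), built on a sequential latent variable framework.
- Models knowledge selection as a sequential decision process with latent variables, so that the diversity of plausible knowledge choices is captured.
- Uses a variational lower bound to jointly model knowledge selection and response generation (Eqs. 2–3).
- Computes the prior pi_theta and the posterior q_phi over knowledge from GRU-based summaries of the dialogue history (Eqs. 5–8).
- Decodes the response with a Transformer decoder that uses a copy mechanism over the selected knowledge (Eqs. 9–11).
- Trains with an auxiliary knowledge loss that exploits the ground-truth knowledge labels (Eq. 12).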
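The interplay between the prior, the posterior, and the auxiliary knowledge loss can be illustrated with a minimal single-turn sketch. It is not the authors' implementation: the dot-product scoring, the toy 2-d embeddings, the helper names (`knowledge_distribution`, `kl_divergence`, `knowledge_loss`), and the choice to apply the knowledge loss to the posterior are all assumptions made for illustration. In SKT the query vectors would come from the GRU-based history encoders, and the prior sees only the context while the posterior also sees the response.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def knowledge_distribution(query, knowledge_vecs):
    """Categorical distribution over knowledge candidates (dot-product scores)."""
    return softmax([dot(query, k) for k in knowledge_vecs])

def kl_divergence(q, p, eps=1e-12):
    """KL(q || p) between two categorical distributions of equal length."""
    return sum(qi * math.log((qi + eps) / (pi + eps)) for qi, pi in zip(q, p))

def knowledge_loss(dist, gold_index, eps=1e-12):
    """Negative log-likelihood of the ground-truth knowledge sentence."""
    return -math.log(dist[gold_index] + eps)

# Toy turn: three candidate knowledge sentences (made-up embeddings).
knowledge_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
prior_query = [0.5, 0.4]      # hypothetical summary of the context only
posterior_query = [0.2, 2.0]  # hypothetical summary of context plus response

prior = knowledge_distribution(prior_query, knowledge_vecs)          # pi_theta
posterior = knowledge_distribution(posterior_query, knowledge_vecs)  # q_phi

kl = kl_divergence(posterior, prior)           # KL term of a variational bound
aux = knowledge_loss(posterior, gold_index=1)  # auxiliary knowledge loss
```

Minimizing the KL term pulls the prior toward the better-informed posterior, which is how response information can sharpen knowledge selection at test time, when only the prior is available.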
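Experimental Results
Research Questions
- RQ1: How do sequential latent variables improve knowledge selection in multi-turn dialogue?
- RQ2: Does jointly modeling knowledge selection and response generation improve dialogue quality and knowledge grounding?
- RQ3: Can the model reach state-of-the-art performance on a large-scale knowledge-grounded dialogue benchmark?
- RQ4: How well does the approach generalize to datasets other than Wizard of Wikipedia?
Main Findings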
| Method | PPL (Test Seen) | R-1 (Test Seen) | R-2 (Test Seen) | Acc (Test Seen) | PPL (Test Unseen) | R-1 (Test Unseen) | R-2 (Test Unseen) | Acc (Test Unseen) |
|---|---|---|---|---|---|---|---|---|
| Random knowledge selection | - | 8.4 | 1.4 | 2.7 | - | 8.0 | 1.2 | 2.3 |
| Repeat last utterance | - | 14.5 | 3.1 | - | - | 14.1 | 2.9 | - |
| Transformer (no knowledge)† (Dinan et al., 2019) | 41.8 | 17.8 | - | - | 87.0 | 14.0 | - | - |
| E2E Transformer MemNet† (Dinan et al., 2019) | 63.5 | 16.9 | - | 22.5 | 97.3 | 14.4 | - | 12.2 |
| E2E Transformer MemNet (BERT vocab)‡ | 53.2 | 17.7 | 4.8 | 23.2 | 137.8 | 13.6 | 1.9 | 10.5 |
| PostKS∗ (Lian et al., 2019) | 79.1 | 13.0 | 1.0 | 4.8 | 193.8 | 13.1 | 1.0 | 4.2 |
| E2E BERT | 53.5 | 16.8 | 4.5 | 23.7 | 105.7 | 13.5 | 2.2 | 13.6 |
| PostKS + Knowledge Loss | 54.5 | 18.1 | 5.3 | 23.4 | 144.8 | 13.5 | 2.0 | 9.4 |
| E2E BERT + PostKS | 54.6 | 17.8 | 5.3 | 25.5 | 113.2 | 13.4 | 2.3 | 14.1 |
| E2E BERT + PostKS + Copy | 52.2 | 19.0 | 6.5 | 25.5 | 83.4 | 15.6 | 3.9 | 14.4 |
| Ours | 52.0 | 19.3 | 6.8 | 26.8 | 81.4 | 16.1 | 4.2 | 18.3 |
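- Achieves state-of-the-art knowledge selection accuracy and utterance generation on Wizard of Wikipedia.
- Obtains the best R-1, R-2, and Acc in the table on both Test Seen and Test Unseen, with a larger accuracy gain over the strongest listed baseline on unseen topics (18.3 vs. 14.4) than on seen topics (26.8 vs. 25.5).
- Also performs strongly on Holl-E in both the single-reference and multi-reference settings.
- In human evaluation, SKT is preferred over the baselines on engagingness and knowledgeability, especially on unseen topics.
- The sequential latent approach better captures topic shifts across turns and keeps responses grounded in knowledge.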
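The comparison between seen and unseen topics can be checked directly against the Acc columns of the table. The short sketch below does that arithmetic; picking "E2E BERT + PostKS + Copy" as the reference baseline is an assumption made here because it has the highest accuracy among the listed baselines.

```python
# Knowledge selection accuracy (Acc) copied from the table above.
acc = {
    "E2E BERT + PostKS + Copy": {"seen": 25.5, "unseen": 14.4},
    "Ours": {"seen": 26.8, "unseen": 18.3},
}

def gain(split):
    """Absolute accuracy gain of 'Ours' over the reference baseline."""
    baseline = acc["E2E BERT + PostKS + Copy"][split]
    return round(acc["Ours"][split] - baseline, 1)

seen_gain = gain("seen")
unseen_gain = gain("unseen")
print(seen_gain, unseen_gain)  # prints: 1.3 3.9
```

The absolute gain on Test Unseen is three times the gain on Test Seen in this comparison.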
This summary was generated by AI and reviewed by human editors.