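[Paper Summary] TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogue
TOD-BERT is pre-trained on nine task-oriented dialogue corpora using user/system tokens and a response contrastive objective. It improves results on four downstream tasks and clearly outperforms BERT and other baselines in few-shot settings.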
The underlying difference of linguistic patterns between general text and task-oriented dialogue makes existing pre-trained language models less useful in practice. In this work, we unify nine human-human and multi-turn task-oriented dialogue datasets for language modeling. To better model dialogue behavior during pre-training, we incorporate user and system tokens into the masked language modeling. We propose a contrastive objective function to simulate the response selection task. Our pre-trained task-oriented dialogue BERT (TOD-BERT) outperforms strong baselines like BERT on four downstream task-oriented dialogue applications, including intention recognition, dialogue state tracking, dialogue act prediction, and response selection. We also show that TOD-BERT has a stronger few-shot ability that can mitigate the data scarcity problem for task-oriented dialogue.
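The abstract describes adding user and system tokens to masked language modeling. The minimal sketch below shows what such an input could look like: each turn is prefixed with its role token, the turns are concatenated into one sequence, and word tokens are then masked. It assumes a single leading [CLS], whitespace splitting in place of BERT's WordPiece tokenizer, and that role tokens are never masked. The helper names `serialize_dialogue` and `mask_tokens` are hypothetical. The masking is also simplified: BERT's standard scheme replaces only 80% of selected tokens with [MASK].

```python
import random
from typing import List, Optional, Sequence, Tuple

# Role tokens named in the paper; [CLS]/[SEP]/[MASK] are standard BERT tokens.
USR, SYS = "[USR]", "[SYS]"
SPECIAL = {"[CLS]", "[SEP]", USR, SYS}


def serialize_dialogue(turns: Sequence[Tuple[str, str]]) -> str:
    """Flatten (speaker, utterance) turns into one string, prefixing each
    turn with its role token. Hypothetical helper for illustration."""
    pieces = ["[CLS]"]  # assumption: a single leading [CLS]
    for speaker, utterance in turns:
        if speaker == "user":
            pieces.append(f"{USR} {utterance}")
        elif speaker == "system":
            pieces.append(f"{SYS} {utterance}")
        else:
            raise ValueError(f"unknown speaker: {speaker!r}")
    return " ".join(pieces)


def mask_tokens(tokens: List[str], mask_prob: float = 0.15,
                rng: Optional[random.Random] = None
                ) -> Tuple[List[str], List[Optional[str]]]:
    """Simplified MLM masking: each non-special token becomes [MASK] with
    probability mask_prob. labels[i] holds the original token at masked
    positions and None elsewhere. Role tokens are never masked (assumption)."""
    rng = rng or random.Random()
    masked: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if tok not in SPECIAL and rng.random() < mask_prob:
            masked.append("[MASK]")
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels
```

For example, `serialize_dialogue([("user", "i need a cheap hotel"), ("system", "which area ?")])` returns `"[CLS] [USR] i need a cheap hotel [SYS] which area ?"`. Passing that string through `mask_tokens(text.split())` would hide roughly 15% of the word tokens and leave the role tokens intact. In a real setup, [USR] and [SYS] would be registered as extra tokens in the tokenizer vocabulary.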
Motivation and Goals
- Address the mismatch in linguistic patterns between general text and task-oriented dialogue, which makes generic pre-trained language models less useful for dialogue understanding.
- Unify nine task-oriented dialogue datasets to pre-train a dialogue-focused BERT variant.
- Incorporate user/system tokens and a response contrastive objective to capture dialogue structure.
- Demonstrate TOD-BERT's improvements on core downstream tasks and its few-shot capabilities.
Proposed Method
- Extend BERT with two special role tokens, [USR] and [SYS], prefixing each user utterance and system response so the model can distinguish speakers within a dialogue sequence.
- Pre-train with a joint objective of masked language modeling (MLM) and a response contrastive loss (RCL) to simulate response selection.
- Use a dual-encoder setup for RCL, treating other responses in the batch as negatives and maximizing the correct context-response similarity.
- Train TOD-BERT on 100k dialogues (1.4M utterances) across 60 domains from nine datasets, and initialize from BERT-base uncased.
- Fine-tune TOD-BERT on downstream tasks with the same architecture and comparable hyperparameters for fair comparison.
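The response contrastive loss described above can be sketched as an in-batch softmax over a context-response similarity matrix. Row i scores context i against every response in the batch, and the loss is the cross-entropy with the diagonal entry as the correct class, so all other responses in the batch act as negatives. This is a minimal pure-Python sketch, assuming the context and response vectors (encoder outputs in the paper) are already computed. Dot-product similarity, averaging over the batch, and the unweighted sum in the hypothetical `joint_loss` helper are assumptions for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity between two vectors (assumed similarity)."""
    return sum(x * y for x, y in zip(a, b))


def response_contrastive_loss(contexts: List[Vector],
                              responses: List[Vector]) -> float:
    """In-batch contrastive loss: responses[i] is the positive for
    contexts[i], and every other response in the batch is a negative."""
    if len(contexts) != len(responses) or not contexts:
        raise ValueError("need a non-empty batch of aligned pairs")
    total = 0.0
    for i, c in enumerate(contexts):
        # Row i of the N x N similarity matrix C R^T.
        logits = [dot(c, r) for r in responses]
        # Numerically stable log-sum-exp over the row.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(s - m) for s in logits))
        # Negative log-softmax of the diagonal (correct) entry.
        total += log_z - logits[i]
    return total / len(contexts)


def joint_loss(mlm_loss: float, rcl_loss: float) -> float:
    """Joint pre-training objective, assumed here to be an unweighted sum."""
    return mlm_loss + rcl_loss
```

With aligned pairs such as `contexts = [[10.0, 0.0], [0.0, 10.0]]` and identical `responses`, the loss is close to zero. Swapping the two responses raises it to about 100, because each context now scores its negative highest. Reusing the other responses in the batch as negatives means the contrastive objective needs no separate negative sampling step.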
Experimental Results
Research Questions
- RQ1: Can task-oriented dialogue pre-training on unified dialogue corpora improve language understanding over generic pre-trained models like BERT?
- RQ2: Does incorporating user/system tokens and an explicit response selection objective yield better representations for dialogue tasks?
- RQ3: How does TOD-BERT perform in low-resource (few-shot) settings across key task-oriented dialogue tasks?
- RQ4: Is TOD-BERT beneficial across diverse downstream tasks such as intention recognition, dialogue state tracking (DST), dialogue act prediction, and response selection?
Key Findings
- TOD-BERT outperforms BERT and baselines like GPT-2 and DialoGPT on four downstream tasks: intent recognition, dialogue state tracking, dialogue act prediction, and response selection.
- Joint MLM and response contrastive learning (TOD-BERT-jnt) yields stronger representations than MLM-only TOD-BERT (TOD-BERT-mlm).
- TOD-BERT shows notable few-shot gains, with substantial accuracy improvements in 1-shot and 10-shot intent recognition and clear gains on DST when only a small fraction of the training data is available.
- In probing, TOD-BERT-jnt achieves the highest linear-probe performance, suggesting richer task-relevant representations.
- These advantages hold across datasets and domains, and they are most pronounced in few-shot scenarios.
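The probing result refers to linear probing. In this setup the pre-trained encoder is frozen, a single linear classifier is fit on its output vectors, and higher probe accuracy suggests the target property (for example, a dialogue act label) is more linearly accessible in the representation. This is a minimal sketch, assuming the features have already been extracted from a frozen encoder as lists of floats. The function names, plain SGD, learning rate, and epoch count are illustrative assumptions, not the paper's probing configuration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _scores(W: List[List[float]], b: List[float], x: Vector) -> List[float]:
    # One linear score per class: w_k . x + b_k.
    return [sum(wj * xj for wj, xj in zip(w, x)) + bk for w, bk in zip(W, b)]


def train_linear_probe(features: List[Vector], labels: List[int],
                       num_classes: int, lr: float = 0.5, epochs: int = 200
                       ) -> Tuple[List[List[float]], List[float]]:
    """Fit a softmax-regression probe with plain SGD on frozen features.
    Only W and b are trained; the encoder that produced them is untouched."""
    dim = len(features[0])
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            s = _scores(W, b, x)
            m = max(s)
            exps = [math.exp(v - m) for v in s]
            z = sum(exps)
            for k in range(num_classes):
                # Gradient of cross-entropy w.r.t. score k: p_k - 1[k == y].
                g = exps[k] / z - (1.0 if k == y else 0.0)
                for j in range(dim):
                    W[k][j] -= lr * g * x[j]
                b[k] -= lr * g
    return W, b


def predict(W: List[List[float]], b: List[float], x: Vector) -> int:
    """Return the index of the highest-scoring class."""
    s = _scores(W, b, x)
    return max(range(len(s)), key=s.__getitem__)


def probe_accuracy(W: List[List[float]], b: List[float],
                   features: List[Vector], labels: List[int]) -> float:
    """Fraction of examples the linear probe classifies correctly."""
    correct = sum(predict(W, b, x) == y for x, y in zip(features, labels))
    return correct / len(labels)
```

Comparing encoders under this setup means running `train_linear_probe` and `probe_accuracy` on features from each frozen model with the same labels and held-out split. Because the classifier is kept linear, differences in probe accuracy are attributed to the representations rather than to classifier capacity.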
This summary was generated by AI and reviewed by human editors.