QUICK REVIEW

[Paper Review] TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogue

Chien-Sheng Wu, Steven C. H. Hoi|arXiv (Cornell University)|Apr 15, 2020
Topic Modeling | Cited by 47
One-Sentence Summary

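TOD-BERT is pre-trained on nine task-oriented dialogue corpora using user/system tokens and a response contrastive objective. It improves results on four downstream tasks and significantly outperforms BERT and other baselines in few-shot settings.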

ABSTRACT

The underlying difference of linguistic patterns between general text and task-oriented dialogue makes existing pre-trained language models less useful in practice. In this work, we unify nine human-human and multi-turn task-oriented dialogue datasets for language modeling. To better model dialogue behavior during pre-training, we incorporate user and system tokens into the masked language modeling. We propose a contrastive objective function to simulate the response selection task. Our pre-trained task-oriented dialogue BERT (TOD-BERT) outperforms strong baselines like BERT on four downstream task-oriented dialogue applications, including intention recognition, dialogue state tracking, dialogue act prediction, and response selection. We also show that TOD-BERT has a stronger few-shot ability that can mitigate the data scarcity problem for task-oriented dialogue.

Research Motivation and Goals

  • Improve language understanding for task-oriented dialogue by addressing the linguistic differences between conversational and general text, which make existing pre-trained language models less useful in practice.
  • Unify nine task-oriented dialogue datasets to pre-train a dialogue-focused BERT variant.
  • Incorporate user/system tokens and a response contrastive objective to capture dialogue structure.
  • Demonstrate TOD-BERT's improvements on core downstream tasks and its few-shot capabilities.

Proposed Method

  • Extend BERT with two special tokens [USR] and [SYS] to model user and system utterances in dialogue sequences.
  • Pre-train with a joint objective of masked language modeling (MLM) and a response contrastive loss (RCL) to simulate response selection.
  • Use a dual-encoder setup for RCL: each context is scored against every response in the batch, the other responses act as negatives, and the objective maximizes the similarity of the correct context-response pair relative to them.
  • Train TOD-BERT on 100k dialogues (1.4M utterances) across 60 domains from nine datasets, and initialize from BERT-base uncased.
  • Fine-tune TOD-BERT on downstream tasks with the same architecture and comparable hyperparameters for fair comparison.
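To make the role tokens concrete, here is a minimal sketch that flattens a multi-turn dialogue into a single input sequence. It assumes each utterance is prefixed with its speaker token and the sequence starts with BERT's [CLS] token. The function name is hypothetical, and a lowercase whitespace split stands in for BERT's uncased WordPiece tokenizer, so this is an illustration rather than the authors' preprocessing code.

```python
from typing import List, Tuple

# Role tokens added to the vocabulary; [CLS] is BERT's standard start token.
USR, SYS, CLS = "[USR]", "[SYS]", "[CLS]"


def flatten_dialogue(turns: List[Tuple[str, str]]) -> List[str]:
    """Flatten (speaker, utterance) pairs into one token sequence.

    Each utterance is prefixed with [USR] or [SYS]. Text is lowercased
    (mirroring an uncased model) and split on whitespace, a stand-in
    for BERT's WordPiece tokenizer.
    """
    tokens = [CLS]
    for speaker, utterance in turns:
        if speaker == "user":
            tokens.append(USR)
        elif speaker == "system":
            tokens.append(SYS)
        else:
            raise ValueError(f"unknown speaker: {speaker!r}")
        tokens.extend(utterance.lower().split())
    return tokens


dialogue = [
    ("user", "I need a cheap hotel"),
    ("system", "Which area do you prefer"),
    ("user", "The north please"),
]
print(flatten_dialogue(dialogue))
```

With the Hugging Face `transformers` library, one plausible way to give the two tokens their own embeddings is `tokenizer.add_special_tokens({"additional_special_tokens": ["[USR]", "[SYS]"]})` followed by `model.resize_token_embeddings(len(tokenizer))`; this is an assumed route, not the paper's documented setup.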
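Masked language modeling over dialogue sequences containing the [USR] and [SYS] tokens can be sketched as follows. This is a simplified illustration under several assumptions: a 15% masking rate (the BERT default), every selected token replaced by [MASK] (BERT's 80/10/10 replacement scheme is omitted), and role tokens never masked. The names `mask_tokens` and `SPECIAL_TOKENS` are hypothetical.

```python
import random
from typing import List, Optional, Tuple

SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[USR]", "[SYS]"}  # assumed never masked
MASK = "[MASK]"


def mask_tokens(
    tokens: List[str], mask_prob: float = 0.15, seed: Optional[int] = None
) -> Tuple[List[str], List[Optional[str]]]:
    """Return (masked inputs, labels) for one MLM training example.

    labels[i] holds the original token where a mask was applied and None
    elsewhere; None positions would be ignored by the MLM loss.
    """
    rng = random.Random(seed)
    inputs: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if tok not in SPECIAL_TOKENS and rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)  # the model must recover this token
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


seq = ["[CLS]", "[USR]", "i", "need", "a", "hotel", "[SYS]", "which", "area"]
masked, targets = mask_tokens(seq, seed=0)
print(masked)
print(targets)
```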
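The response contrastive objective can be read as a classification over the batch: for b context vectors C and b response vectors R, the b×b score matrix C·R^T is normalized row-wise with softmax, and the loss is the negative log-probability on the diagonal, so each context must pick out its own response among the in-batch negatives. The pure-Python sketch below assumes dot-product scores over [CLS]-style vectors and averages over the batch (summing would only rescale the loss); the vectors are toy values, not model outputs.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def response_contrastive_loss(contexts: List[Vector], responses: List[Vector]) -> float:
    """Average in-batch contrastive loss; context i should match response i.

    contexts[i] / responses[i] stand in for the [CLS] vectors of the i-th
    dialogue context and its gold response. All other responses in the
    batch act as negatives for context i.
    """
    if len(contexts) != len(responses):
        raise ValueError("contexts and responses must have the same batch size")
    total = 0.0
    for i, c in enumerate(contexts):
        scores = [dot(c, r) for r in responses]  # row i of C R^T
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[i]  # equals -log softmax(scores)[i]
    return total / len(contexts)


ctx = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]  # response i points the same way as context i
swapped = [[0.0, 1.0], [1.0, 0.0]]  # gold responses mismatched
print(response_contrastive_loss(ctx, aligned))  # ≈ 0.313
print(response_contrastive_loss(ctx, swapped))  # ≈ 1.313
```

The aligned batch yields a lower loss than the swapped one, which is the signal that pushes matching context-response pairs together during pre-training.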

Experimental Results

Research Questions

  • RQ1: Can task-oriented dialogue pre-training on unified dialogue corpora improve language understanding over generic pre-trained models like BERT?
  • RQ2: Does incorporating user/system tokens and an explicit response selection objective yield better representations for dialogue tasks?
  • RQ3: How does TOD-BERT perform in low-resource (few-shot) settings across key task-oriented dialogue tasks?
  • RQ4: Is TOD-BERT beneficial across diverse downstream tasks such as intention recognition, DST, dialogue act prediction, and response selection?

Key Findings

  • TOD-BERT outperforms BERT and baselines like GPT-2 and DialoGPT on four downstream tasks: intent recognition, dialogue state tracking, dialogue act prediction, and response selection.
  • Joint MLM and response contrastive learning (TOD-BERT-jnt) yields stronger representations than MLM-only TOD-BERT (TOD-BERT-mlm).
  • TOD-BERT shows notable few-shot gains, with substantial accuracy improvements in low-resource settings such as 1-shot and 10-shot intent recognition, and on DST when only a small fraction of the training data is available.
  • In linear probing on fixed representations, TOD-BERT-jnt achieves the highest probing performance, suggesting its representations capture more task-relevant information.
  • TOD-BERT provides strong cross-dataset and cross-domain performance advantages, with clear benefits in few-shot scenarios.
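The 1-shot and 10-shot intent settings mean the classifier is fine-tuned on only k labeled utterances per intent. A minimal sketch of drawing such a subset, assuming per-class random sampling with a fixed seed; the function name, data layout, and intent labels are hypothetical, and the paper's exact protocol (for example, how many random draws are averaged) may differ.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (utterance, intent label)


def sample_k_shot(examples: List[Example], k: int, seed: int = 0) -> List[Example]:
    """Draw at most k labeled utterances per intent for a k-shot training set."""
    by_label: Dict[str, List[Example]] = defaultdict(list)
    for utterance, label in examples:
        by_label[label].append((utterance, label))
    rng = random.Random(seed)
    subset: List[Example] = []
    for label in sorted(by_label):  # fixed label order keeps runs reproducible
        pool = by_label[label]
        subset.extend(rng.sample(pool, min(k, len(pool))))
    return subset


data = [
    (f"utterance {i}", intent)
    for intent in ("book_hotel", "find_taxi", "play_music")
    for i in range(5)
]
one_shot = sample_k_shot(data, k=1)
ten_shot = sample_k_shot(data, k=10)
print(len(one_shot), len(ten_shot))  # 3 15
```

When a class has fewer than k examples, the sketch simply keeps all of them; a stricter setup might instead skip such classes.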
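Linear probing keeps the pre-trained encoder frozen and trains only a linear classifier on its output features, so probe accuracy indicates how linearly accessible a property is in the representation. A minimal binary logistic-regression probe in pure Python, with hypothetical names and toy vectors standing in for frozen encoder outputs; the paper's probes are multi-class and their exact training setup may differ.

```python
import math
from typing import List, Tuple

Vector = List[float]


def train_linear_probe(
    features: List[Vector], labels: List[int], lr: float = 0.5, epochs: int = 200
) -> Tuple[Vector, float]:
    """Fit a binary logistic-regression probe with full-batch gradient descent.

    The features are treated as fixed (frozen encoder outputs); only the
    probe's weights w and bias b are learned.
    """
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(label = 1)
            err = p - y  # gradient of the log loss w.r.t. z
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w: Vector, b: float, features: List[Vector], labels: List[int]) -> float:
    preds = [int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0) for x in features]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


feats = [[-2.0, 0.5], [-1.0, -0.3], [1.0, 0.2], [2.0, -0.4]]  # toy frozen features
labs = [0, 0, 1, 1]
w, b = train_linear_probe(feats, labs)
print(probe_accuracy(w, b, feats, labs))  # 1.0 on this separable toy data
```

Comparing encoders then amounts to running the same probe on each encoder's frozen features and comparing held-out accuracy.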


This review was generated by AI and reviewed by human editors.