QUICK REVIEW

[Paper Review] TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogue

Chien-Sheng Wu, Steven C. H. Hoi | arXiv (Cornell University) | Apr 15, 2020

Topic Modeling | Cited by 47

ONE-SENTENCE SUMMARY

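TOD-BERT is pre-trained on nine task-oriented dialogue corpora using user/system tokens and a response contrastive objective; it improves on four downstream tasks and, in few-shot settings, clearly outperforms BERT and other baselines.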

ABSTRACT

The underlying difference of linguistic patterns between general text and task-oriented dialogue makes existing pre-trained language models less useful in practice. In this work, we unify nine human-human and multi-turn task-oriented dialogue datasets for language modeling. To better model dialogue behavior during pre-training, we incorporate user and system tokens into the masked language modeling. We propose a contrastive objective function to simulate the response selection task. Our pre-trained task-oriented dialogue BERT (TOD-BERT) outperforms strong baselines like BERT on four downstream task-oriented dialogue applications, including intention recognition, dialogue state tracking, dialogue act prediction, and response selection. We also show that TOD-BERT has a stronger few-shot ability that can mitigate the data scarcity problem for task-oriented dialogue.

MOTIVATION AND OBJECTIVES

Improve language understanding for task-oriented dialogue by addressing the linguistic differences between conversational and general text.
Unify nine task-oriented dialogue datasets to pre-train a dialogue-focused BERT variant.
Incorporate user/system tokens and a response contrastive objective to capture dialogue structure.
Demonstrate TOD-BERT's improvements on core downstream tasks and its few-shot capabilities.

PROPOSED METHOD

Extend BERT with two special tokens [USR] and [SYS] to model user and system utterances in dialogue sequences.
Pre-train with a joint objective of masked language modeling (MLM) and a response contrastive loss (RCL) to simulate response selection.
Use a dual-encoder setup for RCL, treating other responses in the batch as negatives and maximizing the correct context-response similarity.
Train TOD-BERT on 100k dialogues (1.4M utterances) across 60 domains from nine datasets, and initialize from BERT-base uncased.
Fine-tune TOD-BERT on downstream tasks with the same architecture and comparable hyperparameters for fair comparison.
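
The speaker tokens and the in-batch contrastive objective described above can be illustrated with a small sketch. It is not the authors' implementation: the function names `flatten_dialogue` and `response_contrastive_loss` are hypothetical, the vectors stand in for encoder outputs, and the use of dot-product similarity and of a batch mean (rather than a sum) are assumptions made for illustration.

```python
import math

# Speaker tokens named in the summary above.
USR, SYS = "[USR]", "[SYS]"


def flatten_dialogue(turns):
    """Join (speaker, utterance) pairs into one string, prefixing each
    utterance with its speaker token. The "user"/"system" labels are an
    assumed input convention for this sketch."""
    parts = []
    for speaker, utterance in turns:
        token = USR if speaker == "user" else SYS
        parts.append(f"{token} {utterance}")
    return " ".join(parts)


def response_contrastive_loss(context_vecs, response_vecs):
    """In-batch contrastive loss for a dual encoder.

    context_vecs[i] and response_vecs[i] form the positive pair; every
    other response in the batch serves as a negative. Similarity is a
    dot product (assumption), and the loss is the mean cross-entropy
    over the rows of the similarity matrix.
    """
    n = len(context_vecs)
    total = 0.0
    for i in range(n):
        # Similarity of context i to every response in the batch.
        logits = [
            sum(c * r for c, r in zip(context_vecs[i], resp))
            for resp in response_vecs
        ]
        # Numerically stable log-sum-exp for the softmax normalizer.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the correct response.
        total += log_z - logits[i]
    return total / n


if __name__ == "__main__":
    dialogue = [("user", "book a table for two"), ("system", "which restaurant?")]
    print(flatten_dialogue(dialogue))

    contexts = [[5.0, 0.0], [0.0, 5.0]]
    responses = [[1.0, 0.0], [0.0, 1.0]]
    print(response_contrastive_loss(contexts, responses))
```

The loss is small when each context is most similar to its own response and grows when another response in the batch scores higher, which is the sense in which the objective simulates response selection.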

EXPERIMENTAL RESULTS

Research Questions

RQ1: Can task-oriented dialogue pre-training on unified dialogue corpora improve language understanding over generic pre-trained models like BERT?
RQ2: Does incorporating user/system tokens and an explicit response selection objective yield better representations for dialogue tasks?
RQ3: How does TOD-BERT perform in low-resource (few-shot) settings across key task-oriented dialogue tasks?
RQ4: Is TOD-BERT beneficial across diverse downstream tasks such as intention recognition, dialogue state tracking (DST), dialogue act prediction, and response selection?

Key Findings

TOD-BERT outperforms BERT and baselines like GPT-2 and DialoGPT on four downstream tasks: intent recognition, dialogue state tracking, dialogue act prediction, and response selection.
Joint MLM and response contrastive learning (TOD-BERT-jnt) yields stronger representations than MLM-only TOD-BERT (TOD-BERT-mlm).
TOD-BERT shows notable few-shot gains, with substantial accuracy improvements in 1-shot and 10-shot settings on intent recognition and DST.
In probing, TOD-BERT-jnt achieves the highest linear-probe performance, suggesting richer task-relevant representations.
TOD-BERT provides strong cross-dataset and cross-domain performance advantages, with clear benefits in few-shot scenarios.


This review was generated by AI and reviewed by a human editor.