QUICK REVIEW

[Paper Review] Language Models as Few-Shot Learner for Task-Oriented Dialogue Systems

Andrea Madotto, Zihan Liu|arXiv (Cornell University)|Aug 14, 2020

Topic Modeling|References 22|Cited by 36

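One-Sentence Summary

This paper studies priming large language models (GPT-2 variants) with a few examples so that, without any parameter updates, they can solve the NLU, DST, dialogue policy (DP), and NLG tasks of task-oriented dialogue. It compares this approach with fine-tuned baselines and outlines its limitations.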

ABSTRACT

Task-oriented dialogue systems use four connected modules, namely, Natural Language Understanding (NLU), Dialogue State Tracking (DST), Dialogue Policy (DP) and Natural Language Generation (NLG). A research challenge is to learn each module with the fewest samples (i.e., few-shots) given the high cost related to data collection. The most common and effective technique to solve this problem is transfer learning, where large language models, either pre-trained on text or task-specific data, are fine-tuned on the few samples. These methods require fine-tuning steps and a set of parameters for each task. Differently, language models, such as GPT-2 (Radford et al., 2019) and GPT-3 (Brown et al., 2020), allow few-shot learning by priming the model with few examples. In this paper, we evaluate the priming few-shot ability of language models in the NLU, DST, DP and NLG tasks. Importantly, we highlight the current limitations of this approach, and we discuss the possible implications for future work.

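Motivation and Objectives

Reduce the data collection needed to build modular task-oriented dialogue systems (NLU, DST, DP, NLG).
Evaluate language-model priming as a few-shot approach that requires no fine-tuning on these core tasks.
Compare the few-shot results of LM priming with fine-tuned baselines under limited data.
Identify practical limitations and outline directions for models with longer contexts and for future work.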

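Proposed Method

Prime the language model with three prefix styles (binary, value-based, and generative) to enable few-shot learning without parameter updates.
Use these prefixes to map inputs to outputs for NLU (slot filling and intent recognition), DST, ACT (the dialogue policy task, framed as dialogue act prediction), and NLG.
Evaluate on standard datasets: SNIPS for NLU slot filling and intent recognition, MultiWOZ for DST and ACT, and FewShotWOZ for NLG.
Compare the few-shot results of LM priming with selected fine-tuned baselines (e.g., TOD-BERT, BERT, SC-GPT, and their variants).
Experiment with different GPT-2 model sizes (SMALL, LARGE, XL) within the constraints of the context window.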
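The three prefix styles can be illustrated with a minimal Python sketch that only assembles prompt strings (no model is called). The `=>` separator, the `True`/`False` answers, the template wording, and the helper names `binary_prompts`, `value_prompts`, and `generative_prompt` are hypothetical illustrations, not the paper's exact prompt format.

```python
from typing import Dict, List, Tuple

# Hypothetical few-shot example: (input text, gold label or target text).
Shot = Tuple[str, str]


def binary_prompts(shots: List[Shot], query: str, labels: List[str]) -> List[str]:
    """Binary prefix (hypothetical template): one True/False prompt per label."""
    prompts = []
    for label in labels:
        lines = [f"{utt} => {label}? {'True' if gold == label else 'False'}"
                 for utt, gold in shots]
        lines.append(f"{query} => {label}?")  # the LM would continue with True/False
        prompts.append("\n".join(lines))
    return prompts


def value_prompts(shots: List[Tuple[str, Dict[str, str]]], query: str,
                  slots: List[str]) -> List[str]:
    """Value-based prefix (hypothetical template): one prompt per slot."""
    prompts = []
    for slot in slots:
        lines = [f"{utt} => {slot} = {values.get(slot, 'none')}"
                 for utt, values in shots]
        lines.append(f"{query} => {slot} =")  # the LM would complete the value
        prompts.append("\n".join(lines))
    return prompts


def generative_prompt(shots: List[Shot], query: str) -> str:
    """Generative prefix (hypothetical template): a single prompt for the whole output."""
    lines = [f"{src} => {tgt}" for src, tgt in shots]
    lines.append(f"{query} =>")  # the LM would generate the full target text
    return "\n".join(lines)
```

In this sketch, calling `binary_prompts` with two candidate intents returns two separate prompts, and `value_prompts` returns one prompt per slot, whereas `generative_prompt` always returns a single prompt. Which prefix style is paired with which task here is illustrative only.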

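Experimental Results

Research Questions

RQ1: Can LM priming achieve few-shot performance competitive with fine-tuned baselines on NLU, DST, ACT, and NLG?
RQ2: How does model size affect few-shot performance on each task?
RQ3: What are the practical limitations of LM priming with respect to prefix design, number of shots, and input length?
RQ4: What future improvements could strengthen the few-shot ability of task-oriented dialogue systems?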

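Key Findings

LM priming with larger GPT-2 models generally yields better performance on NLU and NLG.
For DST and ACT, the larger XL model does not consistently outperform the LARGE model, suggesting that prefix design or context effects play a key role.
On NLU, ACT, and NLG, LM priming with limited samples can match or exceed the weakest fine-tuned baseline.
Two main limitations were identified: (i) binary and value-based prefixes require a separate forward computation for each class or slot; (ii) GPT-2's 1024-token input limit caps the number of examples that fit in the prompt.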
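The two limitations can be made concrete with a small back-of-the-envelope sketch. It assumes whitespace splitting as a stand-in for GPT-2's byte-level BPE tokenizer (real token counts are usually higher), and the helpers `count_tokens`, `max_shots`, and `prompts_per_query` are hypothetical.

```python
from typing import List

GPT2_CONTEXT = 1024  # GPT-2's maximum input length in tokens


def count_tokens(text: str) -> int:
    # Hypothetical proxy: whitespace tokens. GPT-2 actually uses byte-level BPE,
    # which typically produces more tokens for the same text.
    return len(text.split())


def max_shots(shots: List[str], query: str,
              budget: int = GPT2_CONTEXT, reserve: int = 0) -> int:
    """Greedily count how many shots fit in the context next to the query.

    `reserve` leaves room for tokens the model still has to generate.
    """
    used = count_tokens(query) + reserve
    n = 0
    for shot in shots:
        cost = count_tokens(shot)
        if used + cost > budget:
            break  # the next shot would overflow the context window
        used += cost
        n += 1
    return n


def prompts_per_query(style: str, num_candidates: int) -> int:
    """Separate prompts (forward computations) needed to label one query.

    Binary and value-based prefixes need one prompt per class or slot;
    a generative prefix needs a single prompt (decoding still runs one
    step per generated token).
    """
    if style in ("binary", "value"):
        return num_candidates
    if style == "generative":
        return 1
    raise ValueError(f"unknown prefix style: {style}")
```

For instance, `max_shots(['w ' * 60] * 50, 'w ' * 20)` returns 16: with a 20-token query, only 16 shots of 60 tokens fit under the 1024-token budget, no matter how many labeled examples are available.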


This review was generated by AI and checked by human editors.