QUICK REVIEW

[Paper Review] DIET: Lightweight Language Understanding for Dialogue Systems

Tanja Bunk, Daksh Varshneya|arXiv (Cornell University)|Apr 21, 2020

Topic Modeling|References: 33|Cited by: 113

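One-Sentence Summary

DIET introduces a multi-task Dual Intent and Entity Transformer architecture that jointly performs intent classification and entity recognition in dialogue systems. It achieves strong results even without pre-trained embeddings and trains faster than large models such as BERT.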

ABSTRACT

Large-scale pre-trained language models have shown impressive results on language understanding benchmarks like GLUE and SuperGLUE, improving considerably over other pre-training methods like distributed representations (GloVe) and purely supervised approaches. We introduce the Dual Intent and Entity Transformer (DIET) architecture, and study the effectiveness of different pre-trained representations on intent and entity prediction, two common dialogue language understanding tasks. DIET advances the state of the art on a complex multi-domain NLU dataset and achieves similarly high performance on other simpler datasets. Surprisingly, we show that there is no clear benefit to using large pre-trained models for this task, and in fact DIET improves upon the current state of the art even in a purely supervised setup without any pre-trained embeddings. Our best performing model outperforms fine-tuning BERT and is about six times faster to train.

Motivation and Objectives

Motivate the need for fast, multilingual, and trainable NLU for dialogue systems in real-world software ecosystems.
Propose a modular multi-task architecture that jointly handles intent classification and entity recognition.
Explore the impact of sparse (one-hot, character n-grams) and dense (pre-trained embeddings) features in DIET.
Investigate the benefits of incorporating masked reconstruction objectives as regularizers.
Evaluate DIET on multi-domain NLU benchmarks and compare against state-of-the-art baselines.

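Proposed Method

DIET featurizes the input as a sequence of tokens, combining sparse features (token-level one-hot encodings and character n-grams of length up to 5) with optional dense features from pre-trained embeddings (ConveRT, BERT, GloVe).
A two-layer Transformer with relative position attention encodes context over the concatenated dense and sparse features.
A CRF layer on top of the Transformer output performs named entity recognition.
Intent classification embeds the CLS representation of the sequence and the intent labels into a shared semantic space and trains with a dot-product loss, using negative sampling to rank the correct intent above sampled incorrect ones.
A masked-token reconstruction objective is added on the Transformer output to act as a regularizer and encourage more general features.
The total loss is the sum of the intent loss, the entity (CRF) loss, and the mask reconstruction loss; each term can be switched off, which makes ablation studies straightforward.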
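The sparse featurization, the intent loss, and the combined loss described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`char_ngrams`, `multi_hot`, `dot_product_loss`, `total_loss`), the toy vocabulary in the usage note, and the per-term weights are assumptions made for illustration. The intent loss is written in one common softmax form over dot-product similarities with sampled negative labels.

```python
import math
from typing import Dict, List, Sequence


def char_ngrams(token: str, max_n: int = 5) -> List[str]:
    """Return all character n-grams of `token` with 1 <= n <= max_n."""
    return [
        token[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(token) - n + 1)
    ]


def multi_hot(token: str, vocab: Dict[str, int], max_n: int = 5) -> List[int]:
    """Mark which n-grams from a (hypothetical) vocabulary occur in `token`."""
    vec = [0] * len(vocab)
    for gram in char_ngrams(token, max_n):
        idx = vocab.get(gram)
        if idx is not None:
            vec[idx] = 1
    return vec


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dot_product_loss(
    cls_vec: Sequence[float],
    pos_label_vec: Sequence[float],
    neg_label_vecs: Sequence[Sequence[float]],
) -> float:
    """-log softmax of the true label's similarity against sampled negatives."""
    s_pos = dot(cls_vec, pos_label_vec)
    scores = [s_pos] + [dot(cls_vec, v) for v in neg_label_vecs]
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - s_pos


def total_loss(
    intent_loss: float,
    entity_loss: float,
    mask_loss: float,
    w_intent: float = 1.0,  # assumed weights; 0.0 switches a term off
    w_entity: float = 1.0,
    w_mask: float = 1.0,
) -> float:
    """Combine the three task losses into a single training objective."""
    return w_intent * intent_loss + w_entity * entity_loss + w_mask * mask_loss
```

For example, `multi_hot("ab", {"a": 0, "ab": 1, "zz": 2})` marks the first two vocabulary entries, and calling `total_loss(..., w_mask=0.0)` drops the reconstruction term, which is how an ablation without the mask loss could be expressed in this sketch.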

Experimental Results

Research Questions

RQ1: Can DIET jointly model intent classification and entity recognition effectively in a multi-domain setting?
RQ2: What is the impact of combining sparse features with various pre-trained dense embeddings on NLU performance?
RQ3: Does a masked reconstruction objective improve DIET’s generalization and accuracy?
RQ4: How does DIET compare to state-of-the-art approaches like HERMIT and fine-tuned BERT across standard NLU benchmarks?
RQ5: Is a purely supervised DIET model competitive with models leveraging large pre-trained language models, and how fast is training?

Key Findings

On the challenging NLU-Benchmark, DIET with sparse features plus ConveRT embeddings achieves strong intent and entity F1 scores, outperforming the HERMIT baseline on intents and achieving higher entity recall.
A model using sparse features with ConveRT (no mask loss) yields top performance for intents and competitive results for entities, surpassing the previous state of the art by about 3 percentage points in F1 on both tasks.
In ablations, using only sparse features with a mask loss improves both intents and entities by about 1 percentage point; GloVe embeddings with sparse features are competitive, and BERT embeddings without task-specific fine-tuning may underperform compared to ConveRT or GloVe in this setup.
DIET with frozen ConveRT embeddings and sparse features outperforms fine-tuned BERT on entity recognition while matching it on intent classification, and trains about six times faster (10 hours vs. 60 hours on NLU-Benchmark).
On ATIS and SNIPS, DIET with sparse features and ConveRT or GloVe achieves competitive results close to Joint BERT, even with no fine-tuning of embeddings.

This review was generated by AI and checked by a human editor.