
[Paper Review] DIET: Lightweight Language Understanding for Dialogue Systems

Tanja Bunk, Daksh Varshneya | arXiv (Cornell University) | Apr 21, 2020
Topic Modeling | References: 33 | Cited by: 113
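One-Sentence Summary

DIET introduces a multi-task Dual Intent and Entity Transformer architecture for joint intent classification and entity recognition in dialogue systems. It shows strong results even without pre-trained embeddings and trains faster than large models such as BERT.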

ABSTRACT

Large-scale pre-trained language models have shown impressive results on language understanding benchmarks like GLUE and SuperGLUE, improving considerably over other pre-training methods like distributed representations (GloVe) and purely supervised approaches. We introduce the Dual Intent and Entity Transformer (DIET) architecture, and study the effectiveness of different pre-trained representations on intent and entity prediction, two common dialogue language understanding tasks. DIET advances the state of the art on a complex multi-domain NLU dataset and achieves similarly high performance on other simpler datasets. Surprisingly, we show that there is no clear benefit to using large pre-trained models for this task, and in fact DIET improves upon the current state of the art even in a purely supervised setup without any pre-trained embeddings. Our best performing model outperforms fine-tuning BERT and is about six times faster to train.

Motivation and Objectives

  • Motivate the need for fast, multilingual, and trainable NLU for dialogue systems in real-world software ecosystems.
  • Propose a modular multi-task architecture that jointly handles intent classification and entity recognition.
  • Explore the impact of sparse (one-hot, character n-grams) and dense (pre-trained embeddings) features in DIET.
  • Investigate the benefits of incorporating masked reconstruction objectives as regularizers.
  • Evaluate DIET on multi-domain NLU benchmarks and compare against state-of-the-art baselines.

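Proposed Method

  • DIET featurizes the input as a sequence that combines sparse features (token-level one-hot encodings and character n-grams of length up to 5) with dense features from pre-trained embeddings (ConveRT, BERT, GloVe).
  • A two-layer Transformer with relative position attention encodes context over the concatenated dense and sparse features.
  • A CRF layer on top of the Transformer output performs named entity recognition.
  • Intent classification uses a dot-product loss in a shared semantic space between the sequence's CLS representation and the intent label embeddings, with negative sampling for ranking.
  • A masked-token reconstruction objective is added on the Transformer output to regularize training and learn general-purpose features.
  • The total loss is a weighted sum of the intent loss, the entity (CRF) loss, and the mask reconstruction loss, which allows flexible ablation analysis.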
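
Two of the steps in the list above, the character n-gram sparse features and the dot-product intent loss with its weighted combination, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`char_ngrams`, `intent_loss`, `total_loss`), the unpadded substring n-grams, the softmax cross-entropy form of the ranking loss, and the weight parameters are all assumptions made for illustration, and the entity and mask losses are treated as already-computed numbers.

```python
import math
import random


def char_ngrams(token, max_n=5):
    """Return the set of character n-grams of length 1..max_n in `token`.

    Assumption: plain substrings without boundary padding.
    """
    grams = set()
    for n in range(1, max_n + 1):
        for start in range(len(token) - n + 1):
            grams.add(token[start:start + n])
    return grams


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def intent_loss(cls_vec, label_vecs, target, num_neg, rng=None):
    """Cross-entropy over the target similarity and sampled negative labels.

    cls_vec: embedded CLS representation (hypothetical, already projected).
    label_vecs: one embedding per intent label in the same space.
    """
    rng = rng or random.Random(0)
    negatives = [i for i in range(len(label_vecs)) if i != target]
    sampled = rng.sample(negatives, min(num_neg, len(negatives)))
    s_pos = dot(cls_vec, label_vecs[target])
    scores = [s_pos] + [dot(cls_vec, label_vecs[i]) for i in sampled]
    # Log-sum-exp with max subtraction for numerical stability.
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_z - s_pos


def total_loss(l_intent, l_entity, l_mask,
               w_intent=1.0, w_entity=1.0, w_mask=1.0):
    """Weighted sum of the three losses; a zero weight ablates that term."""
    return w_intent * l_intent + w_entity * l_entity + w_mask * l_mask


if __name__ == "__main__":
    labels = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    loss = intent_loss([1.0, 0.0], labels, target=0, num_neg=2)
    print(sorted(char_ngrams("book")))
    # Entity and mask losses are placeholder values here.
    print(total_loss(loss, l_entity=0.5, l_mask=0.2, w_mask=0.0))
```

Setting a weight to zero, as with `w_mask=0.0` above, mirrors the kind of ablation the summary describes, where a loss term is switched off without changing the rest of the model.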

Experimental Results

Research Questions

  • RQ1: Can DIET jointly model intent classification and entity recognition effectively in a multi-domain setting?
  • RQ2: What is the impact of combining sparse features with various pre-trained dense embeddings on NLU performance?
  • RQ3: Does a masked reconstruction objective improve DIET’s generalization and accuracy?
  • RQ4: How does DIET compare to state-of-the-art approaches like HERMIT and fine-tuned BERT across standard NLU benchmarks?
  • RQ5: Is a purely supervised DIET model competitive with models leveraging large pre-trained language models, and how fast is training?

Main Findings

  • On the challenging NLU-Benchmark, DIET with sparse features plus ConveRT embeddings achieves strong intent and entity F1 scores, outperforming the HERMIT baseline on intents and achieving higher entity recall.
  • A model using sparse features with ConveRT (no mask loss) yields top performance for intents and competitive results for entities, surpassing state-of-the-art by about 3 percentage points in F1 on both tasks.
  • In ablations, using only sparse features with a mask loss improves both intents and entities by about 1 percentage point; GloVe embeddings with sparse features are competitive, and BERT embeddings without task-specific fine-tuning may underperform compared to ConveRT or GloVe in this setup.
  • DIET with frozen ConveRT embeddings and sparse features outperforms fine-tuned BERT in entity recognition while matching its intent accuracy, and is about six times faster to train (10 hours vs. 60 hours on NLU-Benchmark).
  • On ATIS and SNIPS, DIET with sparse features and ConveRT or GloVe achieves competitive results close to Joint BERT, even with no fine-tuning of embeddings.


This review was generated by AI and reviewed by human editors.