QUICK REVIEW

[Paper Review] DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks

Zehui Lin, Pengfei Liu | arXiv (Cornell University) | Jul 25, 2019

Domain Adaptation and Few-Shot Learning | References: 25 | Cited by: 34

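One-Sentence Summary

DropAttention introduces a dropout method for the fully-connected self-attention of Transformers: it drops attention weights to reduce co-adaptation and improve generalization across tasks.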

ABSTRACT

Variants of dropout have been designed for the fully-connected, convolutional, and recurrent layers in neural networks, and have been shown to be effective at avoiding overfitting. As an appealing alternative to recurrent and convolutional layers, the fully-connected self-attention layer surprisingly lacks a specific dropout method. This paper explores the possibility of regularizing the attention weights in Transformers to prevent different contextualized feature vectors from co-adaptation. Experiments on a wide range of tasks show that DropAttention can improve performance and reduce overfitting.

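Motivation and Objectives

Explain the need for a dropout variant designed specifically for Transformer self-attention.
Propose DropAttention, in two variants, DropAttention(c) and DropAttention(e), to regularize attention weights.
Investigate the benefits of dropping contiguous regions and of normalized rescaling in attention dropout.
Evaluate DropAttention on text classification, sequence labeling, textual entailment, and machine translation.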

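Proposed Method

Reformulate the self-attention output as H̃ = f(ΛV), where Λ = softmax(QK^T / sqrt(d_k)) and V is computed from H.
Introduce two DropAttention variants: DropAttention(c) drops whole columns of the attention matrix (vector level), while DropAttention(e) drops individual elements of Λ.
Combine these with contiguous-region dropping inspired by DropBlock, controlled by a window size w and a drop rate p.
Apply normalized rescaling so that the attention weights still sum to 1 after dropout, which improves training stability.
Provide pseudocode for DropAttention(e), with an analogous procedure for DropAttention(c).
Evaluate on multiple NLP tasks to assess the regularization effect and how well the method complements standard dropout.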

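Experimental Results

Research Questions

RQ1: Does DropAttention improve generalization and reduce overfitting in fully-connected self-attention networks?
RQ2: How do DropAttention(c) and DropAttention(e) compare in performance and robustness across tasks?
RQ3: How does dropping contiguous regions (window size w) affect the attention distribution and model behavior?
RQ4: Does normalized rescaling outperform traditional dropout rescaling when applied to attention dropout?
RQ5: How does DropAttention interact with standard dropout when the two are used together?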

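Key Findings

DropAttention improves performance on text classification, sequence labeling, textual entailment, and machine translation.
Within DropAttention, normalized rescaling generally outperforms traditional rescaling (dividing by 1-p).
On classification tasks, DropAttention(c) tends to perform better than DropAttention(e).
Dropping contiguous regions (larger w) and using higher drop rates tend to increase the entropy of the attention distribution and the diversity across heads, which strengthens robustness.
DropAttention complements standard dropout: combining the two (Dropout + DropAttention) yields additional gains.
On large-scale MT (WMT'16 En-De), DropAttention with p=0.2 and w=2 achieves a significant BLEU gain over the baseline.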
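
To make the rescaling and entropy findings concrete, the sketch below contrasts the two rescaling schemes on a toy attention matrix and computes a per-row entropy. It is an illustration only: the helper names (`row_entropy`, `traditional_rescale`, `normalized_rescale`) and the random toy data are assumptions made here, and the numbers it prints say nothing about the paper's experiments.

```python
import numpy as np


def row_entropy(A, eps=1e-12):
    """Shannon entropy (in nats) of each row of an attention matrix."""
    return -(A * np.log(A + eps)).sum(axis=-1)


def traditional_rescale(A, keep, p):
    """Standard dropout rescaling: divide surviving weights by (1 - p)."""
    return A * keep / (1.0 - p)


def normalized_rescale(A, keep, eps=1e-12):
    """Renormalize surviving weights so each row sums to 1."""
    kept = A * keep
    return kept / np.maximum(kept.sum(axis=-1, keepdims=True), eps)


# Toy attention matrix: 4 queries over 10 keys, each row sums to 1.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))
A = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

p = 0.2
keep = rng.random(A.shape) >= p  # True where a weight survives

trad = traditional_rescale(A, keep, p)
norm = normalized_rescale(A, keep)

print("row sums, traditional:", trad.sum(axis=-1))  # equal to 1 only in expectation
print("row sums, normalized: ", norm.sum(axis=-1))
print("row entropy before dropout:", row_entropy(A))
```

With traditional rescaling the row sums fluctuate around 1 from one sampled mask to the next, whereas normalized rescaling restores a proper distribution in every row that keeps at least one weight.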

This review was generated by AI and checked by a human editor.