QUICK REVIEW

[Paper Review] Dice Loss for Data-imbalanced NLP Tasks

Xiaoya Li, Xiaofei Sun | arXiv (Cornell University) | Nov 7, 2019

Topic Modeling | References: 58 | Citations: 32

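One-Sentence Summary

This paper introduces dice loss together with a dynamic weighting scheme (adaptive dice loss) to address data imbalance in NLP tasks. It reports substantial improvements on POS tagging, NER, MRC, and paraphrase identification, and achieves state-of-the-art results on several datasets.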

ABSTRACT

Many NLP tasks such as tagging and machine reading comprehension are faced with the severe data imbalance issue: negative examples significantly outnumber positive examples, and the huge number of background examples (or easy-negative examples) overwhelms the training. The most commonly used cross entropy (CE) criteria is actually an accuracy-oriented objective, and thus creates a discrepancy between training and test: at training time, each training instance contributes equally to the objective function, while at test time F1 score concerns more about positive examples. In this paper, we propose to use dice loss in replacement of the standard cross-entropy objective for data-imbalanced NLP tasks. Dice loss is based on the Sorensen-Dice coefficient or Tversky index, which attaches similar importance to false positives and false negatives, and is more immune to the data-imbalance issue. To further alleviate the dominating influence from easy-negative examples in training, we propose to associate training examples with dynamically adjusted weights to deemphasize easy-negative examples. Theoretical analysis shows that this strategy narrows down the gap between the F1 score in evaluation and the dice loss in training. With the proposed training objective, we observe significant performance boost on a wide range of data imbalanced NLP tasks. Notably, we are able to achieve SOTA results on CTB5, CTB6 and UD1.4 for the part of speech tagging task; SOTA results on CoNLL03, OntoNotes5.0, MSRA and OntoNotes4.0 for the named entity recognition task; along with competitive results on the tasks of machine reading comprehension and paraphrase identification.

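Motivation and Objectives

Addresses the severe data imbalance in NLP tasks, where negative examples far outnumber positive ones.
Proposes replacing CE with a dice-based loss during training so that the objective aligns with F1 evaluation.
Introduces a data-dependent dynamic weighting mechanism that suppresses the dominance of easy negatives during training.
Demonstrates broad empirical gains across POS tagging, NER, MRC, and PI datasets.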
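
The gap between an accuracy-oriented objective and F1 evaluation can be seen with simple arithmetic. The sketch below uses hypothetical toy labels (95 negatives, 5 positives, not taken from the paper) and a hypothetical helper `accuracy_and_f1` to show that a predictor that always outputs the negative class scores high accuracy while its F1 on the positive class is zero.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1 of the positive class) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    denom = 2 * tp + fp + fn
    # F1 is defined as 0 here when there are no positives and no positive predictions.
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical imbalanced toy data: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
always_negative = [0] * 100

acc, f1 = accuracy_and_f1(y_true, always_negative)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")
```

Every example counts equally toward accuracy, so the 95 easy negatives dominate the score, whereas F1 only rewards correctly recovered positives.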

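Proposed Method

Defines Dice Loss (DL), based on the Sørensen–Dice coefficient, as a replacement for CE loss.
Introduces a Dice Loss variant with squared terms in the denominator to speed up convergence (the form of Milletari et al.).
Extends the Dice coefficient to the Tversky index and the corresponding Tversky Loss to control the precision-recall trade-off.
Proposes a self-adjusting Dice Loss (adaptive Dice Loss) that multiplies the soft probability by a decaying factor (1−p)^α to suppress easy negatives.
Relates Dice Loss to focal loss and shows how it emphasizes hard negatives during training.
Evaluates on POS tagging, NER, MRC, and PI with various backbones such as BERT and XLNet, reporting improvements over CE/MLE baselines.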
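
The loss variants listed above can be sketched per example as follows. This is a minimal illustration, not the authors' implementation: `p` is the predicted probability of the positive class, `y` is the 0/1 gold label, and the smoothing term `gamma`, its default value, and the exact placement of the exponent `alpha` are assumptions made for this sketch and should be checked against the paper.

```python
def dice_loss(p, y, gamma=1.0):
    """Soft Dice loss for one example: 1 - (2*p*y + gamma) / (p + y + gamma)."""
    return 1.0 - (2.0 * p * y + gamma) / (p + y + gamma)


def squared_dice_loss(p, y, gamma=1.0):
    """Variant with squared terms in the denominator (Milletari et al. form)."""
    return 1.0 - (2.0 * p * y + gamma) / (p * p + y * y + gamma)


def tversky_loss(p, y, alpha=0.5, beta=0.5, gamma=1.0):
    """Soft Tversky loss: alpha weights false positives, beta false negatives."""
    overlap = p * y
    false_pos = p * (1.0 - y)
    false_neg = (1.0 - p) * y
    return 1.0 - (overlap + gamma) / (
        overlap + alpha * false_pos + beta * false_neg + gamma
    )


def adaptive_dice_loss(p, y, alpha=1.0, gamma=1.0):
    """Dice loss with p replaced by the down-weighted term (1 - p)**alpha * p."""
    weighted = (1.0 - p) ** alpha * p
    return 1.0 - (2.0 * weighted * y + gamma) / (weighted + y + gamma)


# Hypothetical (p, y) pairs, chosen only to compare the variants side by side.
for p, y in [(0.01, 0), (0.60, 0), (0.40, 1), (0.99, 1)]:
    print(
        f"p={p:.2f} y={y} "
        f"DL={dice_loss(p, y):.4f} "
        f"DL_sq={squared_dice_loss(p, y):.4f} "
        f"TL={tversky_loss(p, y):.4f} "
        f"adaptive={adaptive_dice_loss(p, y):.4f}"
    )
```

With `gamma=0` and `alpha=beta=0.5`, the Tversky form in this sketch reduces algebraically to the Dice form, which is the sense in which the Tversky index generalizes the Dice coefficient.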

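Experimental Results

Research Questions

RQ1: Can dice-based losses improve learning on imbalanced NLP datasets compared with CE?
RQ2: Can dynamic weighting of training examples mitigate the dominance of easy-negative samples?
RQ3: How do Dice Loss, Tversky Loss, and their adaptive variants perform on POS tagging, NER, MRC, and PI tasks?
RQ4: How do the TI hyperparameters (α, β) affect task performance?
RQ5: Can dice-based losses underperform CE on accuracy-oriented tasks such as SST-2 and SST-5?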

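Key Findings

Dice-based losses yield substantial performance gains over CE/MLE baselines on multiple NLP tasks.
Adaptive Dice Loss (with (1−p)^α weighting) reduces the influence of easy negatives and better aligns the training signal with F1.
On POS tagging, DSC achieves SOTA on the CTB5, CTB6, and UD1.4 datasets.
On NER, DSC achieves SOTA on the CoNLL2003, OntoNotes5.0, MSRA, and OntoNotes4.0 datasets.
On MRC (SQuAD v1.1/v2.0, Quoref) and PI (MRPC/QQP), DSC consistently improves EM/F1 scores over strong baselines (e.g., BERT and XLNet backbones).
The TI hyperparameters (α, β) strongly affect results, and the best α differs by dataset (e.g., α=0.6 for Chinese OntoNotes4.0 NER, α=0.4 for Quoref MRC).
The sentiment classification results on SST-2 and SST-5 indicate that dice loss is not accuracy-oriented and may underperform CE on these tasks.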
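
To illustrate why the Tversky hyperparameters matter, the sketch below computes the Tversky index on hard counts, TP / (TP + α·FP + β·FN), for several values of α. The counts are hypothetical, and tying β to 1 − α is an assumption made for this illustration. At α = β = 0.5 the index coincides with F1; moving α away from 0.5 shifts the penalty between false positives and false negatives, which is one way to see why the best α can differ by dataset.

```python
def tversky_index(tp, fp, fn, alpha, beta):
    """Tversky index on hard counts; alpha weights FP, beta weights FN."""
    denom = tp + alpha * fp + beta * fn
    return tp / denom if denom else 0.0


def f1_score(tp, fp, fn):
    """F1 on hard counts, equal to the Dice coefficient."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical confusion counts with more false negatives than false positives.
tp, fp, fn = 80, 10, 30

print(f"F1 = {f1_score(tp, fp, fn):.4f}")
for alpha in (0.1, 0.3, 0.5, 0.7, 0.9):
    beta = 1.0 - alpha  # assumption for this sketch: beta tied to alpha
    ti = tversky_index(tp, fp, fn, alpha, beta)
    print(f"alpha={alpha:.1f} beta={beta:.1f} TI={ti:.4f}")
```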

This review was generated by AI and checked by a human editor.