QUICK REVIEW

[Paper Review] Self-Explaining Structures Improve NLP Models

Zijun Sun, Chun Chieh Fan|arXiv (Cornell University)|Dec 3, 2020

Topic Modeling | References: 75 | Citations: 25

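One-Line Summary

This paper proposes a self-explaining neural network framework that improves both the interpretability and the performance of NLP models by adding an interpretation layer on top of any existing model. The layer assigns a learnable weight to every text span (e.g., phrases, sentences), so it yields direct, high-level importance scores without an external probing model. The authors report new SOTA results of 59.1 on SST-5 and 92.3 on SNLI.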

ABSTRACT

Existing approaches to explaining deep learning models in NLP usually suffer from two major drawbacks: (1) the main model and the explaining model are decoupled: an additional probing or surrogate model is used to interpret an existing model, and thus existing explaining tools are not self-explainable; (2) the probing model is only able to explain a model's predictions by operating on low-level features by computing saliency scores for individual words but are clumsy at high-level text units such as phrases, sentences, or paragraphs. To deal with these two issues, in this paper, we propose a simple yet general and effective self-explaining framework for deep learning models in NLP. The key point of the proposed framework is to put an additional layer, as is called by the interpretation layer, on top of any existing NLP model. This layer aggregates the information for each text span, which is then associated with a specific weight, and their weighted combination is fed to the softmax function for the final prediction. The proposed model comes with the following merits: (1) span weights make the model self-explainable and do not require an additional probing model for interpretation; (2) the proposed model is general and can be adapted to any existing deep learning structures in NLP; (3) the weight associated with each text span provides direct importance scores for higher-level text units such as phrases and sentences. We for the first time show that interpretability does not come at the cost of performance: a neural model of self-explaining features obtains better performances than its counterpart without the self-explaining nature, achieving a new SOTA performance of 59.1 on SST-5 and a new SOTA performance of 92.3 on SNLI.

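Motivation and Objectives

Address the lack of self-explainability in existing NLP models, whose interpretation depends on a separate probing or surrogate model.
Overcome the limitation of word-level saliency methods, which fail to capture semantic composition in higher-level text units such as phrases and sentences.
Build a general framework that improves model performance while providing accurate, interpretable explanations at the span level.
Show that interpretability and performance are not mutually exclusive and can be improved together through architectural design.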

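Proposed Method

Add an interpretation layer on top of any existing NLP model and compute attention weights over all possible text spans (O(n²) spans for an input of n tokens).
Associate each text span with a learnable weight that reflects its contribution to the final prediction, which makes the weights directly interpretable.
Feed the weighted sum of the span representations to a softmax layer for the final classification, so that interpretation is part of the prediction path.
Train the interpretation layer end-to-end together with the main model, which removes the need for a separate probing model.
Use the span-level attention weights to produce importance scores for phrases, sentences, or paragraphs, giving high-level interpretability.
Apply the framework to adversarial example generation: replacing the most important span with a paraphrase produces effective attacks.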
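The steps above can be sketched as a small forward pass. This is a minimal illustration under stated assumptions, not the authors' implementation: the mean-pooled span representation, the dot-product scoring vector `score_vec`, and the names `interpretation_layer` and `class_weights` are all hypothetical choices made for this sketch. The token states stand in for the output of whatever encoder sits underneath, and random values replace learned parameters.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def enumerate_spans(n):
    """All contiguous spans (i, j) with 0 <= i <= j < n, i.e. n(n+1)/2 spans."""
    return [(i, j) for i in range(n) for j in range(i, n)]


def mean_pool(token_states, i, j):
    """Assumed span representation: mean of the token vectors in [i, j]."""
    width = j - i + 1
    dim = len(token_states[0])
    return [sum(token_states[t][k] for t in range(i, j + 1)) / width
            for k in range(dim)]


def interpretation_layer(token_states, score_vec, class_weights):
    """Weight every span, pool the spans, and classify the pooled vector."""
    spans = enumerate_spans(len(token_states))
    span_reprs = [mean_pool(token_states, i, j) for i, j in spans]
    # One scalar weight per span, normalized over all spans.
    alphas = softmax([dot(score_vec, h) for h in span_reprs])
    # Weighted combination of span representations.
    dim = len(score_vec)
    pooled = [sum(a * h[k] for a, h in zip(alphas, span_reprs))
              for k in range(dim)]
    # Final prediction: softmax over class logits.
    probs = softmax([dot(w, pooled) for w in class_weights])
    return spans, alphas, probs


# Toy inputs: random vectors stand in for encoder outputs and parameters.
random.seed(0)
n_tokens, dim, n_classes = 4, 6, 5
token_states = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
score_vec = [random.gauss(0, 1) for _ in range(dim)]
class_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_classes)]

spans, alphas, probs = interpretation_layer(token_states, score_vec, class_weights)
```

Because the span weights `alphas` sit directly on the prediction path, reading them off gives the explanation with no second model. The quadratic number of spans is the main cost of this design.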

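Experimental Results

Research Questions

RQ1: Can a self-explaining NLP model be designed without relying on an external probing model?
RQ2: Can phrase-level and sentence-level interpretation be achieved more effectively than with word-level saliency methods?
RQ3: Does incorporating a self-explaining mechanism lower or raise model performance?
RQ4: Can the span-level attention mechanism be used to generate more effective adversarial examples in NLP?
RQ5: How does the self-explaining model expose prediction failure patterns such as attending to irrelevant clauses, missing sentiment shifts, and misreading metaphor or sarcasm?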

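Key Findings

The proposed self-explaining framework achieved a new SOTA score of 59.1 on the SST-5 sentiment classification benchmark.
On the SNLI natural language inference dataset it achieved a new SOTA result of 92.3, showing that the gain carries over to a second task.
The interpretation layer provides direct, high-level importance scores for phrases and sentences, which allows clearer error analysis than word-level methods.
Replacing the most important span with a paraphrase produced effective adversarial examples, lowering accuracy by 84% on IMDB and by 48.86% on Yahoo! Answers.
Error analysis showed that the model often attends to the irrelevant clause in contrastive constructions, fails to detect sentiment shifts, and tends to misread sarcasm and metaphor.
The self-explaining mechanism did not hurt performance and in fact improved it, which shows that interpretability and accuracy can coexist in NLP models.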
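To make the span-level error analysis and the span-replacement attack concrete, here is a small sketch of how span weights might be consumed once they are available. The helper names `top_spans` and `replace_top_span`, the toy sentence, the weights, and the paraphrase are all hypothetical; in particular, how the paper obtains paraphrases is not described in this review, so the replacement text is simply passed in by hand.

```python
def top_spans(tokens, spans, weights, k=3):
    """Return the k highest-weighted spans as (text, weight) pairs."""
    ranked = sorted(zip(spans, weights), key=lambda pair: pair[1], reverse=True)
    return [(" ".join(tokens[i:j + 1]), w) for (i, j), w in ranked[:k]]


def replace_top_span(tokens, spans, weights, paraphrase):
    """Swap the single highest-weighted span for a caller-supplied paraphrase."""
    (i, j), _ = max(zip(spans, weights), key=lambda pair: pair[1])
    return tokens[:i] + paraphrase + tokens[j + 1:]


# Toy example: the weights are made up, not model output.
tokens = ["the", "plot", "is", "painfully", "dull"]
spans = [(0, 1), (3, 4), (2, 2)]
weights = [0.1, 0.7, 0.2]

ranking = top_spans(tokens, spans, weights, k=2)
attacked = replace_top_span(tokens, spans, weights, ["rather", "slow"])
```

Inspecting `ranking` for a misclassified input is one way to see which clause the model relied on; `attacked` is the candidate input that would then be fed back to the classifier.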


This review was generated by AI and checked by a human editor.