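[Paper Summary] Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs
Contextual Decomposition (CD) explains individual LSTM predictions by decomposing the output into a phrase-specific contribution and a context-dependent contribution. This captures interactions between words that simple word-level importance scores miss. The paper validates CD on sentiment analysis tasks, where it reveals negation and compositional effects.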
The driving force behind the recent success of LSTMs has been their ability to learn complex and non-linear relationships. Consequently, our inability to describe these relationships has led to LSTMs being characterized as black boxes. To this end, we introduce contextual decomposition (CD), an interpretation algorithm for analysing individual predictions made by standard LSTMs, without any changes to the underlying model. By decomposing the output of an LSTM, CD captures the contributions of combinations of words or variables to the final prediction of an LSTM. On the task of sentiment analysis with the Yelp and SST data sets, we show that CD is able to reliably identify words and phrases of contrasting sentiment, and how they are combined to yield the LSTM's final prediction. Using the phrase-level labels in SST, we also demonstrate that CD is able to successfully extract positive and negative negations from an LSTM, something which has not previously been done.
Research Motivation and Goals
- Motivate the need to interpret LSTMs beyond unigram importance in NLP.
- Propose Contextual Decomposition (CD) to decompose LSTM outputs into phrase-specific and context-driven contributions.
- Demonstrate that CD reveals interactions and negation effects in sentiment analysis tasks.
- Compare CD against existing interpretation baselines and show improvements in capturing compositional sentiment.
Proposed Method
- Introduce Contextual Decomposition (CD) to decompose h_t and c_t into phrase-only (beta) and context-involving (gamma) contributions (Equations 8–9).
- Linearize the activations of the input gate i_t, the forget gate f_t, and the cell candidate g_t so that each splits into additive phrase, context, and bias terms; cross-terms between phrase and context contributions are treated as interactions and assigned to gamma (Equations 11–18).
- Decompose the softmax input as W h_T = W beta_T + W gamma_T, so that W beta_T quantifies the phrase's contribution to the final prediction (Equation 10).
- Provide a general recursion for updating beta_t and gamma_t at each time step, covering words both inside and outside the phrase (Appendix 6.2).
- Linearize the activation functions into L_sigma and L_tanh by averaging each input's marginal contribution over all orderings of the inputs (Section 3.2.2, Equations 25–28).
- Compare CD to baselines (cell decomposition, integrated gradients, leave-one-out, gradient × input) on SST and Yelp datasets.
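For reference, the relations summarized in the method list can be written out as below. This is a restatement in this summary's own notation, not a copy of the paper's equations: the symbols beta^c_t and gamma^c_t for the two parts of the cell state, and the permutation-average form of L_tanh (S_N is the set of orderings of N inputs y_1, ..., y_N and M_N = N! its size), are our reading of the description above and may differ cosmetically from the paper's Equations 8–10 and 25–28.

```latex
\begin{aligned}
c_t &= f_t \odot c_{t-1} + i_t \odot g_t, \qquad h_t = o_t \odot \tanh(c_t) \\
h_t &= \beta_t + \gamma_t, \qquad c_t = \beta^c_t + \gamma^c_t \\
p &= \mathrm{SoftMax}(W h_T) = \mathrm{SoftMax}(W \beta_T + W \gamma_T) \\
L_{\tanh}(y_k) &= \frac{1}{M_N} \sum_{\pi \in S_N} \Big[ \tanh\Big( \sum_{j=1}^{\pi^{-1}(k)} y_{\pi(j)} \Big) - \tanh\Big( \sum_{j=1}^{\pi^{-1}(k)-1} y_{\pi(j)} \Big) \Big] \\
\sum_{k=1}^{N} L_{\tanh}(y_k) &= \tanh\Big( \sum_{k=1}^{N} y_k \Big)
\end{aligned}
```

The last line holds because, within each ordering, the differences telescope and tanh(0) = 0. The same construction applied to the sigmoid sums to sigma(sum_k y_k) - sigma(0) instead, so an exact decomposition of a sigmoid gate needs some baseline convention for the constant.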
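The recursion described in the method list can be illustrated with a minimal numpy sketch for a single-layer LSTM. It is not the authors' implementation, and several choices are assumptions of the sketch: random weights stand in for a trained model; the helper names `linearize`, `lstm_forward`, and `contextual_decomposition` are hypothetical; each gate pre-activation is split into a phrase part, a context part, and the bias, with the current word folded into the phrase or context part depending on its position; the bias is always applied first when averaging over orderings; bias-only products go to beta inside the phrase and to gamma outside it; and the output gate o_t is used without decomposition.

```python
import itertools
import numpy as np

GATES = ("i", "f", "g", "o")


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def linearize(act, terms, bias):
    """Split act(bias + sum(terms)) into one piece per term plus act(bias).

    A term's piece is its marginal effect averaged over all orderings of the
    terms, with the bias always applied first (an assumption of this sketch),
    so the pieces plus act(bias) sum exactly to act(bias + sum(terms)).
    """
    orders = list(itertools.permutations(range(len(terms))))
    contribs = [np.zeros_like(bias) for _ in terms]
    for order in orders:
        partial = bias.copy()
        for k in order:
            before = act(partial)
            partial = partial + terms[k]
            contribs[k] += act(partial) - before
    return [c / len(orders) for c in contribs], act(bias)


def lstm_forward(xs, p):
    """Plain LSTM forward pass; returns the final hidden state h_T."""
    hidden = p["b"]["i"].shape[0]
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in xs:
        pre = {k: p["W"][k] @ x + p["V"][k] @ h + p["b"][k] for k in GATES}
        i, f, o = sigmoid(pre["i"]), sigmoid(pre["f"]), sigmoid(pre["o"])
        c = f * c + i * np.tanh(pre["g"])
        h = o * np.tanh(c)
    return h


def contextual_decomposition(xs, p, start, stop):
    """Return (beta_T, gamma_T) for the phrase xs[start..stop] (inclusive)."""
    hidden = p["b"]["i"].shape[0]
    beta, gamma = np.zeros(hidden), np.zeros(hidden)      # split of h_{t-1}
    beta_c, gamma_c = np.zeros(hidden), np.zeros(hidden)  # split of c_{t-1}
    for t, x in enumerate(xs):
        in_phrase = start <= t <= stop
        parts = {}
        for k, act in (("i", sigmoid), ("f", sigmoid), ("g", np.tanh)):
            rel, irrel = p["V"][k] @ beta, p["V"][k] @ gamma
            if in_phrase:  # the current word counts as part of the phrase
                rel = rel + p["W"][k] @ x
            else:          # otherwise it counts as context
                irrel = irrel + p["W"][k] @ x
            (r, s), b = linearize(act, [rel, irrel], p["b"][k])
            parts[k] = (r, s, b)
        ri, si, bi = parts["i"]
        rf, sf, bf = parts["f"]
        rg, sg, bg = parts["g"]
        # f_t * c_{t-1}: only phrase/bias gate pieces times beta_c stay in beta.
        new_beta_c = (rf + bf) * beta_c
        new_gamma_c = (rf + sf + bf) * gamma_c + sf * beta_c
        # i_t * g_t: every product touching a context piece goes to gamma.
        new_beta_c += ri * (rg + bg) + bi * rg
        new_gamma_c += si * (rg + sg + bg) + (ri + bi) * sg
        if in_phrase:  # bias-only product: allocation is an assumption
            new_beta_c += bi * bg
        else:
            new_gamma_c += bi * bg
        # Output gate used as-is; tanh(c_t) is linearized with a zero bias.
        o = sigmoid(p["W"]["o"] @ x + p["V"]["o"] @ (beta + gamma) + p["b"]["o"])
        (tb, tg), _ = linearize(np.tanh, [new_beta_c, new_gamma_c], np.zeros(hidden))
        beta, gamma = o * tb, o * tg
        beta_c, gamma_c = new_beta_c, new_gamma_c
    return beta, gamma


# Usage: random weights stand in for a trained model (hypothetical sizes).
rng = np.random.default_rng(0)
D, H = 4, 3
params = {
    "W": {k: rng.normal(size=(H, D)) for k in GATES},
    "V": {k: rng.normal(size=(H, H)) for k in GATES},
    "b": {k: rng.normal(size=H) for k in GATES},
}
xs = rng.normal(size=(6, D))      # six "word vectors"
W_out = rng.normal(size=(2, H))   # hypothetical 2-class output layer
beta_T, gamma_T = contextual_decomposition(xs, params, start=2, stop=3)
h_T = lstm_forward(xs, params)
phrase_logits = W_out @ beta_T    # the phrase's contribution to the logits
```

Because every product term is assigned to exactly one of beta or gamma, `beta_T + gamma_T` reproduces the LSTM's final hidden state, and `W_out @ beta_T + W_out @ gamma_T` equals `W_out @ h_T`; only the split between the two parts depends on the allocation choices listed above.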
Experimental Results
Research Questions
- RQ1: Can CD produce reliable phrase- and interaction-level contributions for LSTM predictions?
- RQ2: Do CD-derived scores reveal compositional sentiment, including subphrase interactions and negation?
- RQ3: How does CD compare with existing interpretation methods in word-level and phrase-level explanations?
- RQ4: Can CD extract meaningful embeddings for phrases/interactions that align with semantic similarity?
Key Findings
- CD yields word-level scores that correlate well with logistic-regression coefficients, outperforming several baselines on the SST and Yelp datasets.
- CD identifies dissenting subphrases within positive/negative phrases (e.g., positive phrases containing negative subphrases) where prior methods fail.
- CD captures compositional sentiment across phrases, distinguishing the sentiment of large portions of a review better than alternative methods.
- CD uncovers clear negation interactions, separating positive and negative negations in SST data.
- CD provides dense phrase/interaction embeddings (beta_T) whose nearest neighbors align with semantic intuition about negation and modification.
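The negation findings rest on comparing the score of a whole phrase with the scores of its parts. The sketch below assumes the interaction is measured as the score of the full negation phrase minus the scores of the negation word and of the negated phrase. `toy_phrase_score` is a hypothetical, hand-made scorer with made-up weights that stands in for a real CD phrase score such as W beta_T; it is not an LSTM and its numbers are not results from the paper.

```python
# Made-up word weights for illustration only (not learned from data).
WEIGHTS = {"good": 1.0, "bad": -1.0, "movie": 0.0}


def toy_phrase_score(tokens):
    """Hypothetical stand-in for a CD phrase score such as W beta_T.

    Words after "not" have their weight flipped, so the score is not additive.
    """
    score, negated = 0.0, False
    for tok in tokens:
        if tok == "not":
            negated = True
            continue
        weight = WEIGHTS.get(tok, 0.0)
        score += -weight if negated else weight
    return score


def interaction_score(score_fn, tokens, split):
    """Score of the whole span minus the scores of its two halves."""
    return score_fn(tokens) - score_fn(tokens[:split]) - score_fn(tokens[split:])


print(interaction_score(toy_phrase_score, ["not", "good"], 1))  # -2.0
print(interaction_score(toy_phrase_score, ["not", "bad"], 1))   # 2.0
```

A purely additive scorer would give an interaction of zero for every split; the non-zero values here come only from the flip, and their sign separates negations that push a phrase toward negative sentiment from those that push it toward positive.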
This summary was generated by AI and reviewed by human editors.