QUICK REVIEW

[Paper Review] Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

Atticus Geiger, Zhengxuan Wu|arXiv (Cornell University)|Mar 5, 2023

Explainable Artificial Intelligence (XAI)|Citations: 9

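One-sentence summary

The paper introduces Distributed Alignment Search (DAS), which uses distributed interchange interventions to align distributed neural representations with a high-level causal model by gradient descent, and it reports perfect or near-perfect interchange intervention accuracy (IIA) on a hierarchical equality task and an NLI task.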

ABSTRACT

Causal abstraction is a promising theoretical framework for explainable artificial intelligence that defines when an interpretable high-level causal model is a faithful simplification of a low-level deep learning system. However, existing causal abstraction methods have two major limitations: they require a brute-force search over alignments between the high-level model and the low-level one, and they presuppose that variables in the high-level model will align with disjoint sets of neurons in the low-level one. In this paper, we present distributed alignment search (DAS), which overcomes these limitations. In DAS, we find the alignment between high-level and low-level models using gradient descent rather than conducting a brute-force search, and we allow individual neurons to play multiple distinct roles by analyzing representations in non-standard bases (distributed representations). Our experiments show that DAS can discover internal structure that prior approaches miss. Overall, DAS removes previous obstacles to conducting causal abstraction analyses and allows us to find conceptual structure in trained neural nets.

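Motivation and objectives

Motivate and formalize causal abstraction as a way to explain neural networks.
Replace the brute-force search over alignments with gradient-based optimization.
Relax the assumption of localist, non-overlapping neuron-to-variable mappings so that distributed representations are allowed.
Demonstrate DAS on tasks with a clear high-level solution (hierarchical equality and NLI).
Compare DAS with brute-force localist search and analyze how the learned distributed representations decompose.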

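Proposed method

Define constructive causal abstraction and interchange interventions (II) between the high-level model and the low-level network.
Introduce distributed interchange interventions (DII): rotate the representation into a non-standard basis, keep the base input's values on the untouched coordinates, patch the selected rotated coordinates with values from a source input, and rotate back.
Parameterize the orthogonal rotation as a differentiable matrix and optimize it by gradient descent to maximize IIA.
Formalize Distributed Alignment Search (DAS), which learns an alignment between high-level variables and subspaces of the rotated neural representation.
Use as the training objective the cross-entropy between the high-level model's counterfactual output and the low-level network's output under the rotated intervention.
Compare with brute-force localist alignments and analyze whether the learned distributed representations decompose into representations of input identity.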
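
The intervention step listed above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the two-dimensional hidden vector, the fixed rotation angle, and the helper names (`rotation`, `matvec`, `distributed_interchange`) are assumptions chosen for illustration. In DAS the rotation would be learned by gradient descent rather than fixed by hand.

```python
import math


def rotation(theta):
    """Return a 2x2 rotation matrix (orthogonal) for an angle in radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def distributed_interchange(base, source, rot, patched_dims):
    """Hypothetical helper: patch rotated coordinates of base with source.

    Both hidden vectors are rotated into the new basis, the coordinates in
    patched_dims are copied from source into base, and the result is rotated
    back to the neuron basis. For an orthogonal matrix the inverse is the
    transpose.
    """
    rb = matvec(rot, base)
    rs = matvec(rot, source)
    mixed = [rs[i] if i in patched_dims else rb[i] for i in range(len(rb))]
    return matvec(transpose(rot), mixed)


if __name__ == "__main__":
    rot = rotation(math.pi / 4)
    # Patching one rotated coordinate changes both neurons of the base vector.
    print(distributed_interchange([1.0, 0.0], [0.0, 0.0], rot, {0}))
```

With a rotation angle of zero the function reduces to an ordinary (localist) interchange intervention that copies individual neurons. With the 45-degree rotation used here, patching a single rotated coordinate moves both neurons at once, giving approximately `[0.5, 0.5]`, which is the sense in which one neuron can take part in several roles.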

Figure 5: Rotation measured in degree(s) of eigenvectors of the learned rotation matrix for each task.

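Experimental results

Research questions

RQ1: Can a high-level causal model be faithfully aligned with a low-level neural network under a learned distributed representation?
RQ2: Does allowing distributed (non-localist) neuron-to-variable mappings improve the accuracy of the abstraction over localist alignments?
RQ3: Do the learned distributed representations reflect abstract relations, or do they decompose into representations of input identity?
RQ4: How does DAS perform on a task with a clear symbolic solution, such as hierarchical equality, and on a semantic task such as NLI?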

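Key findings

DAS discovers internal structure that localist approaches miss and achieves higher interchange intervention accuracy than brute-force localist search.
On hierarchical equality, DAS obtains perfect or near-perfect alignment (high IIA) in several settings, outperforming the brute-force localist baselines.
On the natural language inference task, DAS finds a perfect alignment with a causal model that contains an entailment relation, and it reveals cases where the representation encodes the structure of the data rather than the pure relation.
In the NLI case, the entailment representation decomposes into representations of the identities of the two words, separable from a discrete entailment component, so the abstraction behaves in a case-specific way.
Experiments on random networks show that large hidden representations can produce misleading alignments, underscoring the need for a principled approach to distributed alignment.
DAS shows that distributed representations in trained networks can faithfully implement symbolic, tree-structured algorithms.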
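
To make the IIA figures in these findings concrete, the sketch below shows how such a score could be computed for the hierarchical equality task. It is an illustrative assumption rather than the paper's evaluation code: the function names, the tuple encoding of the inputs, and the choice to intervene on the first equality variable are all hypothetical. The high-level model is taken to output whether the two pairs agree in being equal or unequal.

```python
def high_level_counterfactual(base, source):
    """High-level output after an interchange intervention (illustrative).

    Inputs are 4-tuples (a, b, c, d). The first equality variable is taken
    from the source input, the second from the base input, and the output
    says whether the two equality values agree.
    """
    first_equal = source[0] == source[1]   # intervened variable
    second_equal = base[2] == base[3]      # left at its base value
    return first_equal == second_equal


def interchange_intervention_accuracy(low_level_predictions, pairs):
    """Fraction of (base, source) pairs on which the low-level network's
    intervened prediction matches the high-level counterfactual output."""
    if len(low_level_predictions) != len(pairs):
        raise ValueError("need one prediction per (base, source) pair")
    targets = [high_level_counterfactual(b, s) for b, s in pairs]
    hits = sum(p == t for p, t in zip(low_level_predictions, targets))
    return hits / len(pairs)


if __name__ == "__main__":
    pairs = [
        ((1, 1, 2, 2), (3, 4, 5, 5)),
        ((1, 2, 3, 3), (6, 6, 7, 8)),
    ]
    # Stand-in predictions; in practice these come from running the network
    # with the intervened hidden representation.
    print(interchange_intervention_accuracy([False, True], pairs))
```

An IIA of 1.0 means the network's behavior under every tested intervention agrees with the high-level model, which is what "perfect alignment" refers to in this review.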

Figure 6: Accuracy over training epochs of the high-level model abstracting both equality relations for hierarchical equality experiment.

This review was generated by AI and checked by a human editor.