QUICK REVIEW

[Paper Review] Task structure and nonlinearity jointly determine learned representational geometry

Matteo Alleman, Jack Lindsey|arXiv (Cornell University)|Jan 24, 2024

Neural Networks and Applications | Cited by 7

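One-line summary

This paper shows how the activation function determines whether a network's hidden representation aligns with the geometry of the inputs or with that of the target outputs: tanh promotes disentangled, target-aligned representations, while ReLU preserves the input geometry. The result holds for both simple and complex tasks.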

ABSTRACT

The utility of a learned neural representation depends on how well its geometry supports performance in downstream tasks. This geometry depends on the structure of the inputs, the structure of the target outputs, and the architecture of the network. By studying the learning dynamics of networks with one hidden layer, we discovered that the network's activation function has an unexpectedly strong impact on the representational geometry: Tanh networks tend to learn representations that reflect the structure of the target outputs, while ReLU networks retain more information about the structure of the raw inputs. This difference is consistently observed across a broad class of parameterized tasks in which we modulated the degree of alignment between the geometry of the task inputs and that of the task labels. We analyzed the learning dynamics in weight space and show how the differences between the networks with Tanh and ReLU nonlinearities arise from the asymmetric asymptotic behavior of ReLU, which leads feature neurons to specialize for different regions of input space. By contrast, feature neurons in Tanh networks tend to inherit the task label structure. Consequently, when the target outputs are low dimensional, Tanh networks generate neural representations that are more disentangled than those obtained with a ReLU nonlinearity. Our findings shed light on the interplay between input-output geometry, nonlinearity, and learned representations in neural networks.

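Motivation and objectives

Investigate how input geometry, label geometry, and network architecture shape learned representations.
Examine how networks with one hidden layer learn representations under different nonlinearities.
Quantify representational geometry across tasks using measures of alignment, disentanglement, and generalization.
Assess how asymmetry in the activation function affects learning dynamics and representational structure.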

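Proposed method

Use a family of parameterized classification tasks built on binary latent variables to control the input-output geometry.
Study networks with a frozen second layer, training only the first-layer weights in order to analyze representation learning.
Compare tanh and ReLU nonlinearities as input-output alignment and noise level are varied.
Analyze learning dynamics in weight space by projecting gradients onto between-class and within-class axes.
Apply several metrics to characterize the representations: target alignment, input alignment, kernel alignment, parallelism score, and cross-condition generalization performance (CCGP).
Extend the analysis to multilayer networks and convolutional architectures to test generality.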
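
The setup described above (a binary-latent task, a frozen second layer, and a trained first layer) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the `align` knob (a hypothetical stand-in for the paper's alignment parameter), the noise model, the squared-error loss, and all hyperparameters are choices made here for illustration.

```python
import numpy as np

def make_task(n_per_cluster=50, dim=8, align=0.5, noise=0.1, seed=0):
    """Toy task with two binary latents (z1, z2); the label is z1.

    `align` is a hypothetical knob: it scales how strongly the inputs
    separate the label classes relative to the label-irrelevant latent z2.
    """
    rng = np.random.default_rng(seed)
    latents = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
    z = np.repeat(latents, n_per_cluster, axis=0)        # (N, 2)
    basis = np.linalg.qr(rng.normal(size=(dim, 2)))[0]   # (dim, 2), orthonormal columns
    scale = np.array([align, 1.0 - align])
    x = (z * scale) @ basis.T + noise * rng.normal(size=(len(z), dim))
    y = z[:, :1]                                         # (N, 1) target
    return x, y, z

# Each entry maps a name to (activation, derivative of activation).
ACTIVATIONS = {
    "tanh": (np.tanh, lambda h: 1.0 - np.tanh(h) ** 2),
    "relu": (lambda h: np.maximum(h, 0.0), lambda h: (h > 0).astype(float)),
}

def train_first_layer(x, y, activation="tanh", hidden=32, lr=0.05, steps=500, seed=1):
    """Full-batch gradient descent on the first layer only; the readout stays fixed."""
    rng = np.random.default_rng(seed)
    phi, dphi = ACTIVATIONS[activation]
    w = rng.normal(size=(x.shape[1], hidden)) / np.sqrt(x.shape[1])
    b = np.zeros(hidden)
    readout = rng.normal(size=(hidden, y.shape[1])) / np.sqrt(hidden)  # frozen second layer
    losses = []
    for _ in range(steps):
        pre = x @ w + b                          # (N, hidden) pre-activations
        err = phi(pre) @ readout - y             # (N, out) prediction error
        losses.append(0.5 * np.mean(np.sum(err ** 2, axis=1)))
        delta = (err @ readout.T) * dphi(pre)    # error backpropagated to pre-activations
        w -= lr * x.T @ delta / len(x)           # update first-layer weights only
        b -= lr * delta.mean(axis=0)
    return w, b, readout, losses

x, y, z = make_task()
for name in ("tanh", "relu"):
    w, b, readout, losses = train_first_layer(x, y, activation=name)
    print(name, "loss before/after:", round(losses[0], 3), round(losses[-1], 3))
```

Because the readout is never updated, any change in the hidden representation comes from the first-layer weights, which is what makes the two nonlinearities comparable in this kind of setup.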

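Experimental results

Research questions

RQ1: How does the activation function (tanh vs. ReLU) affect the alignment of learned representations with the task's input geometry and output geometry?
RQ2: How do input-output alignment, noise, and task complexity affect representational geometry under different nonlinearities?
RQ3: Do the effects observed in shallow networks persist in deeper networks and convolutional architectures?
RQ4: What learning dynamics drive the emergence of target-aligned versus input-preserving representations?
RQ5: How do metrics such as kernel alignment, parallelism score, and CCGP reflect these geometric changes?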
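
Two of the metrics named in RQ5 can be sketched as follows. This is an illustrative version under assumptions, not necessarily the definitions used in the paper: kernel alignment is computed here as centered alignment between linear Gram matrices, and CCGP uses a least-squares linear decoder as a stand-in for whatever classifier the authors used. The function names are hypothetical, and the parallelism score is omitted.

```python
import numpy as np

def centered_kernel_alignment(a, b):
    """Alignment between linear Gram matrices of a (N, p) and b (N, q)."""
    a = a - a.mean(axis=0, keepdims=True)   # centering features centers the Gram matrix
    b = b - b.mean(axis=0, keepdims=True)
    ka, kb = a @ a.T, b @ b.T
    return float(np.sum(ka * kb) / (np.linalg.norm(ka) * np.linalg.norm(kb)))

def ccgp(h, labels, context):
    """Train a linear decoder for `labels` (+1/-1) in one context, test in the others."""
    scores = []
    for train_ctx in np.unique(context):
        train, test = context == train_ctx, context != train_ctx
        # Least-squares readout with a bias column (assumed decoder, not the paper's).
        design = np.column_stack([h[train], np.ones(train.sum())])
        coef, *_ = np.linalg.lstsq(design, labels[train], rcond=None)
        pred = np.column_stack([h[test], np.ones(test.sum())]) @ coef
        scores.append(np.mean(np.sign(pred) == labels[test]))
    return float(np.mean(scores))

# Four conditions defined by two binary latents.
z = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
factorized = z                      # each latent on its own axis
xor_code = z[:, :1] * z[:, 1:]      # a single unit coding z1 * z2

print("alignment of factorized code with z1:", centered_kernel_alignment(factorized, z[:, :1]))
print("CCGP for z1, factorized code:", ccgp(factorized, z[:, 0], z[:, 1]))
print("CCGP for z1, XOR-style code:", ccgp(xor_code, z[:, 0], z[:, 1]))
```

In this toy case the factorized code lets a decoder trained in one context transfer to the other, while the XOR-style code does not. That contrast is the property CCGP is meant to capture.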

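Key findings

Tanh networks tend to learn representations aligned with the structure of the target outputs, showing higher target alignment, parallelism, and CCGP.
ReLU networks retain much of the input geometry, maintaining high input alignment and decodability even for labelings they were not trained on.
Gradients under tanh promote alignment with the between-class axis and reduce within-class selectivity, whereas gradients under ReLU push weights in directions that amplify existing within-class selectivity.
Increasing input-output alignment (larger delta) raises target alignment more strongly in tanh networks than in ReLU networks.
In XOR-like tasks, tanh representations become abstract even when the inputs are entangled, while ReLU retains input-driven structure across the range of difficulty.
The symmetric saturating behavior of the activation function (not only its behavior at the origin) strongly biases learning toward target-aligned representations. Symmetry around the origin modulates the effect but is not decisive.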
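
The saturation contrast behind the last finding can be made concrete with the derivatives of the two activations. These are standard identities rather than results from the paper; how they feed into the learning dynamics is as summarized above.

```latex
\[
\frac{d}{dx}\tanh(x) = 1 - \tanh^{2}(x) \;\longrightarrow\; 0
\quad \text{as } x \to \pm\infty,
\qquad
\frac{d}{dx}\,\mathrm{ReLU}(x) =
\begin{cases}
1, & x > 0, \\
0, & x < 0.
\end{cases}
\]
```

Tanh therefore passes vanishing gradient for large pre-activations of either sign, whereas ReLU passes a constant gradient on the positive side and none on the negative side, which is the asymmetric asymptotic behavior the abstract refers to.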

This review was generated by AI and checked by a human editor.