QUICK REVIEW

[Paper Review] Task structure and nonlinearity jointly determine learned representational geometry

Matteo Alleman, Jack Lindsey|arXiv (Cornell University)|Jan 24, 2024

Neural Networks and Applications|Cited by 7

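One-Sentence Summary

This paper shows that the shape of the activation function affects how a network's hidden representations align with the geometry of the inputs and of the target outputs: tanh tends to promote target-aligned, disentangled representations, while ReLU preserves input geometry, and this holds in both simple and complex tasks.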

ABSTRACT

The utility of a learned neural representation depends on how well its geometry supports performance in downstream tasks. This geometry depends on the structure of the inputs, the structure of the target outputs, and the architecture of the network. By studying the learning dynamics of networks with one hidden layer, we discovered that the network's activation function has an unexpectedly strong impact on the representational geometry: Tanh networks tend to learn representations that reflect the structure of the target outputs, while ReLU networks retain more information about the structure of the raw inputs. This difference is consistently observed across a broad class of parameterized tasks in which we modulated the degree of alignment between the geometry of the task inputs and that of the task labels. We analyzed the learning dynamics in weight space and show how the differences between the networks with Tanh and ReLU nonlinearities arise from the asymmetric asymptotic behavior of ReLU, which leads feature neurons to specialize for different regions of input space. By contrast, feature neurons in Tanh networks tend to inherit the task label structure. Consequently, when the target outputs are low dimensional, Tanh networks generate neural representations that are more disentangled than those obtained with a ReLU nonlinearity. Our findings shed light on the interplay between input-output geometry, nonlinearity, and learned representations in neural networks.

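Motivation and Objectives

Investigate how input geometry, label geometry, and network architecture shape learned representations.
Examine how one-hidden-layer networks learn representations under different nonlinearities.
Quantify representational geometry across tasks using alignment, disentanglement, and generalization metrics.
Assess how asymmetry in the activation function affects learning dynamics and representational structure.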

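Proposed Method

Use a parameterized family of classification tasks with binary latent variables to control input-output geometry.
Freeze the second layer and train only the first-layer weights to isolate representation learning.
Compare tanh and ReLU nonlinearities across levels of input-output alignment and noise.
Analyze learning dynamics in weight space by projecting gradients onto between-class and within-class axes.
Characterize the representations with several metrics: target alignment, input alignment, kernel alignment, parallelism score, and cross-condition generalization performance (CCGP).
Extend the analysis to multilayer networks and convolutional architectures to test generality.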
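
To make the alignment metrics above more concrete, here is a minimal sketch of centered kernel alignment in plain Python. It is an illustration under stated assumptions, not the paper's code: the function names (`gram`, `center`, `kernel_alignment`), the use of a linear kernel, and the four-condition toy data (two binary latents coded as -1/+1, with the label depending on the first latent only) are all hypothetical choices made for this example.

```python
import math

def gram(X):
    """Linear Gram matrix K[i][j] = <x_i, x_j> for a list of row vectors."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def center(K):
    """Double-center a Gram matrix: subtract row and column means, add the grand mean."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)] for i in range(n)]

def kernel_alignment(X, Y):
    """Centered kernel alignment between two sets of row vectors.

    Returns a value in [0, 1]. Not defined (division by zero) if either
    set of vectors is constant across conditions.
    """
    Kx, Ky = center(gram(X)), center(gram(Y))
    dot = sum(a * b for rx, ry in zip(Kx, Ky) for a, b in zip(rx, ry))
    nx = math.sqrt(sum(a * a for r in Kx for a in r))
    ny = math.sqrt(sum(a * a for r in Ky for a in r))
    return dot / (nx * ny)

# Hypothetical toy task: four conditions from two binary latents (a, b).
latents = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
inputs = [[float(a), float(b)] for a, b in latents]   # input geometry: both latents
targets = [[float(a)] for a, _ in latents]            # label depends on `a` only

# Two hypothetical hidden representations of the same four conditions.
input_like = [[float(a), float(b)] for a, b in latents]  # copies the input geometry
target_like = [[2.0 * a] for a, _ in latents]            # keeps only the labeled latent

for name, H in (("input_like", input_like), ("target_like", target_like)):
    print(name,
          "input alignment:", round(kernel_alignment(H, inputs), 4),
          "target alignment:", round(kernel_alignment(H, targets), 4))
```

In this toy setup, a representation that copies the inputs scores perfectly on input alignment but only partially on target alignment, and the reverse holds for a representation that keeps only the labeled latent. The metric is invariant to rescaling the representation, which is why the factor of 2 in `target_like` does not matter.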

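Experimental Results

Research Questions

RQ1: How does the activation function (tanh vs. ReLU) affect the alignment of learned representations with the input geometry and the output geometry of the task?
RQ2: Under different nonlinearities, how do input-output alignment, noise, and task complexity affect representational geometry?
RQ3: Do the effects observed in shallow networks still hold in deeper networks and convolutional architectures?
RQ4: Which learning dynamics drive the emergence of target-aligned versus input-preserving representations?
RQ5: How do metrics such as kernel alignment, parallelism score, and CCGP reflect these geometric changes?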
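
As a rough illustration of what a metric like the parallelism score in RQ5 measures, the sketch below uses a simplified four-condition formulation: it compares the coding direction of one binary latent at the two values of the other latent. This is an assumption made for illustration and may differ from the exact definition used in the paper. The function names and the toy centroids are hypothetical.

```python
import math

def sub(u, v):
    """Elementwise difference u - v."""
    return [a - b for a, b in zip(u, v)]

def cosine(u, v):
    """Cosine similarity; not defined if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def parallelism_score(centroids):
    """Simplified parallelism score for latent `a` with two binary latents (a, b).

    `centroids` maps (a, b) in {0, 1}^2 to the mean hidden vector of that condition.
    The score is the cosine between the a=0 -> a=1 direction at b=0 and at b=1.
    """
    v_b0 = sub(centroids[(1, 0)], centroids[(0, 0)])
    v_b1 = sub(centroids[(1, 1)], centroids[(0, 1)])
    return cosine(v_b0, v_b1)

# Factorized ("disentangled") geometry: conditions sit on the corners of a square.
factorized = {(0, 0): [0.0, 0.0], (1, 0): [1.0, 0.0],
              (0, 1): [0.0, 1.0], (1, 1): [1.0, 1.0]}

# Entangled geometry: each condition gets its own orthogonal axis (one-hot code).
one_hot = {(0, 0): [1.0, 0.0, 0.0, 0.0], (1, 0): [0.0, 1.0, 0.0, 0.0],
           (0, 1): [0.0, 0.0, 1.0, 0.0], (1, 1): [0.0, 0.0, 0.0, 1.0]}

print("factorized:", parallelism_score(factorized))
print("one-hot:", parallelism_score(one_hot))
```

A score near 1 means the latent is encoded along the same direction regardless of the other latent, which is the sense in which a representation is called abstract or disentangled. A one-hot code still separates all four conditions, but its coding directions are orthogonal, so the score is 0.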

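Key Findings

Tanh networks tend to learn representations aligned with the structure of the target outputs, showing higher target alignment, parallelism, and CCGP.
ReLU networks preserve more of the input geometry, maintaining higher input alignment and better decodability of untrained labels.
Gradients under tanh promote alignment with the between-class axis and reduce within-class selectivity, whereas ReLU gradients push the weights to amplify within-class selectivity that is already present.
Increasing input-output alignment (higher delta) raises target alignment more in tanh networks than in ReLU networks.
For XOR-style tasks, tanh representations become abstract even when the inputs are entangled, whereas ReLU retains input-driven structure across a range of difficulties.
Symmetric saturation of the activation function (not merely its behavior at the origin) strongly biases networks toward target-aligned representations; symmetry about the origin plays a modulating role but is not the decisive factor.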
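
The saturation asymmetry behind these findings can be seen directly in the activation derivatives. In a one-hidden-layer network, the gradient of the loss with respect to a hidden unit's input weights is, per sample, the backpropagated error times f'(z) times the input, where z is the unit's pre-activation. The sketch below only tabulates that gating factor f'(z) for tanh and ReLU. It is a minimal illustration, not the paper's weight-space analysis, and the function names are hypothetical.

```python
import math

def tanh_grad(z):
    """Derivative of tanh: 1 - tanh(z)^2, which vanishes for large |z| on both sides."""
    return 1.0 - math.tanh(z) ** 2

def relu_grad(z):
    """Derivative of ReLU: 0 for z < 0, 1 for z > 0 (taken as 0 at z == 0 here)."""
    return 1.0 if z > 0 else 0.0

# Gating factor applied to each sample's contribution to the first-layer gradient.
for z in (-4.0, -1.0, 0.5, 1.0, 4.0):
    print(f"z={z:+.1f}  tanh'={tanh_grad(z):.4f}  relu'={relu_grad(z):.1f}")
```

With tanh, samples that drive a unit strongly in either direction contribute almost nothing to its weight update. With ReLU, samples on the negative side are ignored entirely, while samples on the positive side contribute at full strength no matter how large the pre-activation is. This one-sided gating is consistent with the paper's account that ReLU units specialize for particular regions of input space.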

This review was generated by AI and reviewed by a human editor.