QUICK REVIEW

[Paper Review] How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks

Keyulu Xu, Mozhi Zhang|arXiv (Cornell University)|Sep 24, 2020

Domain Adaptation and Few-Shot Learning | References: 90 | Cited by: 108

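One-Sentence Summary

This paper analyzes how neural networks trained by gradient descent extrapolate outside the training data, showing that ReLU MLPs converge to linear functions along any direction from the origin and that GNNs can extrapolate when task-specific nonlinearities are encoded in the architecture or features. It gives NTK-based theoretical results and validates them empirically on several dynamic programming (DP) tasks.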

ABSTRACT

We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution. Previous works report mixed empirical results when extrapolating with neural networks: while feedforward neural networks, a.k.a. multilayer perceptrons (MLPs), do not extrapolate well in certain simple tasks, Graph Neural Networks (GNNs) -- structured networks with MLP modules -- have shown some success in more complex tasks. Working towards a theoretical explanation, we identify conditions under which MLPs and GNNs extrapolate well. First, we quantify the observation that ReLU MLPs quickly converge to linear functions along any direction from the origin, which implies that ReLU MLPs do not extrapolate most nonlinear functions. But, they can provably learn a linear target function when the training distribution is sufficiently "diverse". Second, in connection to analyzing the successes and limitations of GNNs, these results suggest a hypothesis for which we provide theoretical and empirical evidence: the success of GNNs in extrapolating algorithmic tasks to new data (e.g., larger graphs or edge weights) relies on encoding task-specific non-linearities in the architecture or features. Our theoretical analysis builds on a connection of over-parameterized networks to the neural tangent kernel. Empirically, our theory holds across different training settings.

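Research Motivation and Goals

Quantify how well neural networks trained by gradient descent extrapolate outside the training distribution.
Explain why MLPs extrapolate nonlinear functions poorly, while GNNs can succeed on DP-style tasks.
Identify the conditions under which MLPs and GNNs extrapolate well.
Connect the insights on feedforward extrapolation to the structure and representations of GNNs.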

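Proposed Method

Analyze over-parameterized networks in the neural tangent kernel (NTK) regime, which links training dynamics to kernel regression.
Prove that a two-layer ReLU MLP extrapolates linearly along any direction from the origin, converging at rate O(1/t), where t is the distance from the origin along that direction.
Show that MLPs extrapolate linear target functions well when the training distribution covers enough directions (diverse geometry).
Propose and test a hypothesis: GNNs extrapolate well when task-specific nonlinearities are encoded in the architecture or input representation (Theorem 3 and related experiments).
Use the Graph NTK to analyze extrapolation in a simplified GNN setting, and validate on max degree, shortest path, and n-body tasks.
Discuss the role of architecture (e.g., max/min readout) and input representation in achieving extrapolation.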
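
The linear-extrapolation behavior of ReLU MLPs can be illustrated with a minimal sketch. It assumes a randomly initialized, finite-width two-layer ReLU MLP, not a network trained in the NTK regime, so it shows only the piecewise-linear mechanism behind the result and not the O(1/t) rate itself. All function names (`init_params`, `mlp`, `last_kink`, `asymptotic_slope`) and the chosen sizes are hypothetical and do not come from the paper.

```python
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def init_params(dim, width, seed=0):
    """Random two-layer ReLU MLP: f(x) = sum_j a_j * relu(w_j . x + b_j)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(width)]
    b = [rng.gauss(0.0, 1.0) for _ in range(width)]
    a = [rng.gauss(0.0, 1.0) / width for _ in range(width)]
    return W, b, a


def mlp(params, x):
    W, b, a = params
    return sum(a_j * max(dot(w_j, x) + b_j, 0.0) for w_j, b_j, a_j in zip(W, b, a))


def slope_along_ray(params, v, t, h=1.0):
    """Finite-difference slope (f(tv + hv) - f(tv)) / h along direction v."""
    x0 = [t * vi for vi in v]
    x1 = [(t + h) * vi for vi in v]
    return (mlp(params, x1) - mlp(params, x0)) / h


def last_kink(params, v):
    """Largest t > 0 at which a hidden unit switches on or off along the ray t*v."""
    W, b, _ = params
    kinks = [-b_j / dot(w_j, v) for w_j, b_j in zip(W, b) if dot(w_j, v) != 0.0]
    return max([k for k in kinks if k > 0.0], default=0.0)


def asymptotic_slope(params, v):
    """Slope far from the origin: only units with w_j . v > 0 stay active."""
    W, _, a = params
    return sum(a_j * max(dot(w_j, v), 0.0) for w_j, a_j in zip(W, a))


params = init_params(dim=3, width=64)
v = [1.0, -2.0, 0.5]
t_far = last_kink(params, v) + 1.0  # beyond every activation switch on this ray
beta_v = asymptotic_slope(params, v)

for t in (0.0, 1.0, t_far, 10.0 * t_far):
    print(f"t={t:12.3f}  slope={slope_along_ray(params, v, t):+.6f}")
print(f"asymptotic slope beta_v = {beta_v:+.6f}")
```

Past the last activation switch, the slope along the ray stops changing, so the network cannot follow a target that keeps curving (for example, a quadratic) in that direction.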

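Experimental Results

Research Questions

RQ1: When do ReLU MLPs trained by gradient descent extrapolate well outside the training distribution?
RQ2: Under what conditions can GNNs extrapolate nonlinear tasks, and how do architecture and input representation affect this?
RQ3: Does encoding task-specific nonlinearities into the GNN or its input representation enable extrapolation to unseen graph sizes, structures, or edge weights?
RQ4: How does the geometry of the training data affect extrapolation of linear targets in MLPs, and extrapolation of GNNs on DP-related tasks?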

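Key Findings

ReLU MLPs extrapolate as linear functions along any direction from the origin, converging at rate O(1/t), where t is the distance from the origin.
MLPs can extrapolate linear target functions when the training distribution covers enough directions (geometric diversity).
GNNs extrapolate well on DP-style tasks when suitable nonlinearities are encoded in the architecture or features; both theory (Graph NTK) and experiments support this.
Replacing sum aggregation with architectures that mirror the DP update (e.g., max/min readout) yields better extrapolation on tasks such as max degree and shortest path.
Improving the input representation, so that the nonlinearity moves into the representation instead of the MLP modules, helps GNNs extrapolate nonlinear dynamics such as the n-body problem.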
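
The point about mirroring the DP update can be made concrete with the shortest-path case. The sketch below writes the Bellman-Ford update as one round of message passing with min aggregation. The per-edge message `dist[v] + w` is hand-coded here and stands in for the learned MLP module: under the hypothesis summarized above, once `min` is built into the aggregation, that module only has to represent a linear function. The function names and the toy graph are hypothetical illustrations, not the paper's code.

```python
import math


def min_aggregation_step(dist, edges):
    """One round: new_d[u] = min(d[u], min over edges (v, u, w) of d[v] + w)."""
    new_dist = list(dist)
    for v, u, w in edges:
        # The message d[v] + w is linear in its inputs; min supplies the nonlinearity.
        new_dist[u] = min(new_dist[u], dist[v] + w)
    return new_dist


def shortest_paths(num_nodes, edges, source):
    """Run num_nodes - 1 rounds of min-aggregation message passing (Bellman-Ford)."""
    dist = [math.inf] * num_nodes
    dist[source] = 0.0
    for _ in range(num_nodes - 1):
        dist = min_aggregation_step(dist, edges)
    return dist


# Directed toy graph given as (from, to, weight) triples.
edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 5.0)]
dist = shortest_paths(4, edges, source=0)
print(dist)

# Same graph with edge weights 1000x larger, i.e. far outside a "training" range.
scaled_edges = [(v, u, 1000.0 * w) for v, u, w in edges]
scaled_dist = shortest_paths(4, scaled_edges, source=0)
print(scaled_dist)
```

Because each message is linear in the distances and edge weights, scaling the weights simply scales the output. With sum aggregation, the MLP modules would instead have to learn the `min` nonlinearity themselves, which is the kind of function ReLU MLPs do not extrapolate.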


This review was generated by AI and checked by a human editor.