QUICK REVIEW

[Paper Review] Tensor Programs II: Neural Tangent Kernel for Any Architecture

Greg Yang|arXiv (Cornell University)|Jun 25, 2020

Stochastic Gradient Optimization Techniques | References: 59 | Cited by: 53

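One-Sentence Summary

This work proves that the neural tangent kernel (NTK) of any neural network architecture converges to a deterministic limit as the widths tend to infinity, clarifies when the gradient independence assumption (GIA) yields the correct limit, and provides a practical method based on tensor programs together with reference implementations.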

ABSTRACT

We prove that a randomly initialized neural network of *any architecture* has its Tangent Kernel (NTK) converge to a deterministic limit, as the network widths tend to infinity. We demonstrate how to calculate this limit. In prior literature, the heuristic study of neural network gradients often assumes every weight matrix used in forward propagation is independent from its transpose used in backpropagation (Schoenholz et al. 2017). This is known as the *gradient independence assumption (GIA)*. We identify a commonly satisfied condition, which we call *Simple GIA Check*, such that the NTK limit calculation based on GIA is correct. Conversely, when Simple GIA Check fails, we show GIA can result in wrong answers. Our material here presents the NTK results of Yang (2019a) in a friendly manner and showcases the *tensor programs* technique for understanding wide neural networks. We provide reference implementations of infinite-width NTKs of recurrent neural network, transformer, and batch normalization at https://github.com/thegregyang/NTK4A.

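Motivation and Objectives

Prove that a randomly initialized network of any architecture has a deterministic NTK limit as its widths grow without bound.
Identify the condition under which the gradient independence assumption gives the correct NTK limit (Simple GIA Check).
Show how gradient independence fails when Simple GIA Check does not hold, and give guidance for computing the NTK correctly in that case.
Give an accessible exposition of the NTK limit of wide networks using tensor programs.
Provide reference implementations of the infinite-width NTK for recurrent neural networks, Transformers, and batch normalization.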
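
For reference, the object these objectives concern can be written as follows. The notation is chosen here for illustration and may differ from the paper's: $f(x;\theta)$ is the scalar network output, $\theta$ collects all trainable parameters, and $\mathring{\Theta}$ denotes the limiting kernel.

```latex
% Finite-width tangent kernel: inner product of parameter gradients at two inputs.
\Theta_\theta(x, x') = \left\langle \nabla_\theta f(x;\theta),\; \nabla_\theta f(x';\theta) \right\rangle

% Claim summarized above: for random initialization \theta, as all widths tend to infinity,
\Theta_\theta(x, x') \;\longrightarrow\; \mathring{\Theta}(x, x'),
\qquad \text{where } \mathring{\Theta} \text{ is deterministic (it does not depend on the random draw of } \theta).
```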

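Proposed Method

Apply the tensor programs framework to analyze wide neural networks and derive their NTK limits.
Formalize the gradient independence assumption (GIA) and establish Simple GIA Check as a condition under which the GIA-based NTK calculation is correct.
Prove that the NTK of any architecture converges to a deterministic limit as the widths tend to infinity.
Derive the NTK limits for recurrent neural networks, Transformers, and batch normalization.
Provide practical guidance and reference implementations for computing infinite-width NTKs.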
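
The convergence claim can be checked numerically in the simplest possible setting. The sketch below is not taken from the paper or from the NTK4A repository: it assumes a one-hidden-layer ReLU network f(x) = v · relu(W x / sqrt(d)) / sqrt(width) with standard Gaussian W and v (an NTK-style parametrization chosen for this illustration), and it compares the empirical kernel at finite width with the closed form obtained from the well-known arc-cosine kernel expectations. The function names `relu_ntk_limit` and `empirical_ntk` are hypothetical.

```python
import numpy as np


def relu_ntk_limit(x1, x2):
    """Closed-form infinite-width NTK of the one-hidden-layer ReLU network described above."""
    d = x1.shape[0]
    q1, q2, c = x1 @ x1 / d, x2 @ x2 / d, x1 @ x2 / d
    cos_t = np.clip(c / np.sqrt(q1 * q2), -1.0, 1.0)
    theta = np.arccos(cos_t)
    # E[relu(u) relu(u')] for jointly Gaussian (u, u'): contribution of the readout weights v.
    k_readout = np.sqrt(q1 * q2) * (np.sin(theta) + (np.pi - theta) * cos_t) / (2 * np.pi)
    # E[relu'(u) relu'(u')] times the input inner product: contribution of the hidden weights W.
    k_hidden = c * (np.pi - theta) / (2 * np.pi)
    return k_readout + k_hidden


def empirical_ntk(x1, x2, width, rng):
    """NTK of one randomly initialized finite-width network, from analytic gradients."""
    d = x1.shape[0]
    W = rng.standard_normal((width, d))
    v = rng.standard_normal(width)
    h1, h2 = W @ x1 / np.sqrt(d), W @ x2 / np.sqrt(d)
    # Inner product of the gradients with respect to v.
    k_readout = np.maximum(h1, 0.0) @ np.maximum(h2, 0.0) / width
    # Inner product of the gradients with respect to W.
    k_hidden = np.sum(v**2 * (h1 > 0) * (h2 > 0)) / width * (x1 @ x2 / d)
    return k_readout + k_hidden


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x1, x2 = rng.standard_normal(8), rng.standard_normal(8)
    print("limit:", relu_ntk_limit(x1, x2))
    for width in (100, 10_000, 200_000):
        print("width", width, "empirical:", empirical_ntk(x1, x2, width, rng))
```

As the width grows, the empirical values should fluctuate less and settle near the closed-form value. Architectures such as RNNs, Transformers, and batch normalization need the paper's tensor-programs machinery; this toy covers only the one-hidden-layer case.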

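Results

Research Questions

RQ1: Does the NTK of any neural network architecture converge to a deterministic limit in the infinite-width limit?
RQ2: Under what condition is the gradient independence assumption valid for NTK calculation (Simple GIA Check)?
RQ3: How can the NTK limit be computed for specific architectures such as RNNs, Transformers, and batch normalization?
RQ4: What are the failure modes of GIA, and how do they affect NTK results?
RQ5: Can tensor programs provide an accessible, actionable framework for understanding NTK limits of wide networks?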

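Key Findings

The NTK of any architecture converges to a deterministic limit as the network widths tend to infinity.
Simple GIA Check identifies a condition under which the GIA-based NTK calculation is correct.
When Simple GIA Check fails, GIA can produce incorrect NTK results.
The paper presents the NTK results in an accessible way using tensor programs.
Reference implementations of the infinite-width NTK are provided for RNNs, Transformers, and batch normalization.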
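
To see why treating a weight matrix as independent of its transpose is a real assumption rather than a harmless convenience, consider the following toy experiment. It is an illustration constructed here, not the paper's counterexample and not a statement of Simple GIA Check: it measures the overlap between an input x and a vector backpropagated through W^T, once with the true transpose and once with an independent copy (which is what GIA would substitute). The function name `forward_backward_overlap` and its parameters are hypothetical.

```python
import numpy as np


def forward_backward_overlap(n, rng, tie_weights=True):
    """Return (1/n) * x . (B^T relu(W x)), with B = W if tie_weights else an independent copy."""
    x = rng.standard_normal(n)
    x *= np.sqrt(n) / np.linalg.norm(x)  # rescale so each coordinate of W x is N(0, 1)
    W = rng.standard_normal((n, n)) / np.sqrt(n)
    # GIA-style substitution: use a fresh Gaussian matrix in place of W in the backward step.
    B = W if tie_weights else rng.standard_normal((n, n)) / np.sqrt(n)
    h = W @ x                            # forward pass through W
    back = B.T @ np.maximum(h, 0.0)      # backward-style multiplication by the transpose
    return x @ back / n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print("true transpose  :", forward_backward_overlap(2000, rng, tie_weights=True))
    print("independent copy:", forward_backward_overlap(2000, rng, tie_weights=False))
```

With the true transpose the quantity equals h · relu(h) / n, which concentrates near E[z relu(z)] = 1/2 for standard Gaussian z, while the independent copy gives a value near 0. Quantities of this kind are where a calculation based on gradient independence can go wrong; the paper's Simple GIA Check describes a setting in which the NTK limit is unaffected.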

This review was generated by AI and checked by human editors.