QUICK REVIEW

[Paper Review] On Exact Computation with an Infinitely Wide Neural Net

Sanjeev Arora, Simon S. Du | arXiv (Cornell University) | Apr 26, 2019

Gaussian Processes and Bayesian Inference | References: 36 | Citations: 61

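One-line summary

This paper presents a GPU-friendly algorithm that computes the Convolutional Neural Tangent Kernel (CNTK) of a CNN exactly. It proves that a fully trained, sufficiently wide net is equivalent to kernel regression with the neural tangent kernel. It also shows that the CNTK achieves strong accuracy on CIFAR-10 (77.43% for the best configuration).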

ABSTRACT

How well does a classic deep net architecture like AlexNet or VGG19 classify on a standard dataset such as CIFAR-10 when its width --- namely, number of channels in convolutional layers, and number of nodes in fully-connected internal layers --- is allowed to increase to infinity? Such questions have come to the forefront in the quest to theoretically understand deep learning and its mysteries about optimization and generalization. They also connect deep learning to notions such as Gaussian processes and kernels. A recent paper [Jacot et al., 2018] introduced the Neural Tangent Kernel (NTK) which captures the behavior of fully-connected deep nets in the infinite width limit trained by gradient descent; this object was implicit in some other recent papers. An attraction of such ideas is that a pure kernel-based method is used to capture the power of a fully-trained deep net of infinite width. The current paper gives the first efficient exact algorithm for computing the extension of NTK to convolutional neural nets, which we call Convolutional NTK (CNTK), as well as an efficient GPU implementation of this algorithm. This results in a significant new benchmark for the performance of a pure kernel-based method on CIFAR-10, being $10\%$ higher than the methods reported in [Novak et al., 2019], and only $6\%$ lower than the performance of the corresponding finite deep net architecture (once batch normalization, etc. are turned off). Theoretically, we also give the first non-asymptotic proof showing that a fully-trained sufficiently wide net is indeed equivalent to the kernel regression predictor using NTK.
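The abstract's final claim is that a fully trained, sufficiently wide net predicts like kernel regression with the NTK. In that predictor, $f(x) = k(x, X)^\top H^{-1} y$, where $H$ is the kernel Gram matrix on the training inputs $X$.

Below is a minimal sketch of that predictor, assuming the kernel is supplied as a Python callable. It solves the linear system with plain Gaussian elimination. The `rbf` kernel is a hypothetical stand-in used only for illustration; it is not the NTK.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]
Kernel = Callable[[Point, Point], float]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("kernel matrix is numerically singular")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def kernel_regression(kernel: Kernel, x_train: List[Point], y_train: List[float],
                      x_test: List[Point]) -> List[float]:
    """Predict f(x) = k(x, X)^T H^{-1} y, where H[i][j] = kernel(x_i, x_j)."""
    h = [[kernel(xi, xj) for xj in x_train] for xi in x_train]  # Gram matrix H
    alpha = solve(h, list(y_train))  # alpha = H^{-1} y
    return [sum(kernel(x, xi) * a for xi, a in zip(x_train, alpha)) for x in x_test]


def rbf(u: Point, v: Point) -> float:
    """Hypothetical toy Gaussian kernel standing in for the NTK (illustration only)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)))
```

With a positive-definite kernel the predictor interpolates its training labels. For example, `kernel_regression(rbf, [[0.0], [1.0], [2.0]], [1.0, -1.0, 2.0], [[0.0], [1.0], [2.0]])` returns the labels up to rounding. In practice a library solver, such as a Cholesky factorization, would replace the hand-written elimination.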

Motivation and Goals

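Motivation: understand how infinitely wide CNNs perform on standard datasets such as CIFAR-10.
Develop an algorithm that computes the Convolutional Neural Tangent Kernel (CNTK) for CNNs exactly and efficiently.
Show that a fully trained wide net is equivalent to kernel regression with the NTK.
Provide non-asymptotic convergence results and compare CNTK performance against finite-width nets.
Provide a practical GPU implementation and benchmarks, advancing a kernel-based understanding of deep learning.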

Proposed Method

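Define neural network architectures that admit an infinite-width limit. Describe the CNTK as the kernel induced by the gradients of the network output with respect to its parameters, $\Theta(x, x') = \left\langle \frac{\partial f(\theta, x)}{\partial \theta}, \frac{\partial f(\theta, x')}{\partial \theta} \right\rangle$, taken in the infinite-width limit.
Derive explicit CNTK formulas, covering the convolution and pooling steps, for vanilla CNNs and for CNNs with global average pooling (GAP).
Prove non-asymptotic convergence: a minimum layer width of $\Omega\left(\frac{L^6}{\epsilon^4}\log\frac{L}{\delta}\right)$, where $L$ is the depth, suffices to guarantee with probability at least $1-\delta$ that the NTK at initialization is close to its infinite-width limit, with the error controlled by $\epsilon$.
Prove that a fully trained wide net is equivalent to kernel regression with the NTK, using perturbation bounds for finite width (Theorem 3.2).
Present an exact dynamic-programming algorithm for computing the CNTK and optimize its implementation for GPUs.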
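The dynamic programming over layers can be illustrated on the simplest case the CNTK builds on: a fully-connected ReLU net. Each step does three things:

- computes the covariance $\Sigma^{(h)}$;
- computes its derivative counterpart $\dot\Sigma^{(h)}$;
- updates $\Theta^{(h)} = \Theta^{(h-1)}\dot\Sigma^{(h)} + \Sigma^{(h)}$.

This is a minimal sketch under stated assumptions. It assumes ReLU activation, the normalizing constant $c_\sigma = 2$, non-zero inputs, and the closed-form arc-cosine expectations for ReLU. The CNTK runs the same kind of recursion on patch-indexed covariance tensors, which this sketch does not implement. The name `relu_ntk` is hypothetical.

```python
import math
from typing import Sequence


def relu_ntk(x: Sequence[float], xp: Sequence[float], depth: int) -> float:
    """NTK of a fully-connected ReLU net with `depth` hidden layers (c_sigma = 2).

    Assumes x and xp are non-zero vectors of equal length.
    """
    def dot(u: Sequence[float], v: Sequence[float]) -> float:
        return sum(a * b for a, b in zip(u, v))

    # Layer 0: Sigma^(0)(x, x') = x^T x'
    sxx, spp, sxp = dot(x, x), dot(xp, xp), dot(x, xp)
    theta = sxp  # Theta^(0) = Sigma^(0)
    for _ in range(depth):
        norm = math.sqrt(sxx * spp)
        cos_t = max(-1.0, min(1.0, sxp / norm))  # clamp rounding error before acos
        t = math.acos(cos_t)
        # Sigma^(h)(x, x') = c_sigma * E[relu(u) relu(v)], closed form for ReLU
        sxp_new = norm * (math.sin(t) + (math.pi - t) * cos_t) / math.pi
        # dotSigma^(h)(x, x') = c_sigma * E[relu'(u) relu'(v)]
        sdot = (math.pi - t) / math.pi
        # Theta^(h) = Theta^(h-1) * dotSigma^(h) + Sigma^(h)
        theta = theta * sdot + sxp_new
        sxp = sxp_new
        # With c_sigma = 2 the diagonal Sigma^(h)(x, x) equals Sigma^(h-1)(x, x),
        # so sxx and spp stay unchanged across layers.
    return theta
```

Under this normalization a unit vector gives $\Theta(x, x) = L + 1$ for $L$ hidden layers. Each layer contributes $\Sigma^{(h)}(x, x) = 1$, with derivative factor 1. For example, `relu_ntk([1.0, 0.0], [1.0, 0.0], 3)` returns `4.0`.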

Experimental Results

Research Questions

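RQ1: Can the CNTK be computed exactly for convolutional networks, including those with pooling?
RQ2: Does a fully trained, infinitely wide CNN correspond to kernel regression under the NTK?
RQ3: How close does CNTK-based kernel performance come to finite-width CNNs on CIFAR-10?
RQ4: What finite width is required to guarantee NTK convergence, and hence kernel-regression-like behavior?
RQ5: Do depth and global average pooling materially affect CNTK performance on image classification?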

Key Findings

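CIFAR-10 classification accuracy. CNN-* columns are finite networks and CNTK-* columns the corresponding kernels. V: vanilla network without pooling; GAP: with global average pooling.
Depth	CNN-V	CNTK-V	CNTK-V-2K	CNN-GAP	CNTK-GAP	CNTK-GAP-2K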
3	59.97%	64.47%	40.94%	63.81%	70.47%	49.71%
4	60.20%	65.52%	42.54%	80.93%	75.93%	51.06%
6	64.11%	66.03%	43.43%	83.75%	76.73%	51.73%
11	69.48%	65.90%	43.42%	82.92%	77.43%	51.92%
21	75.57%	64.09%	42.53%	83.30%	77.08%	52.22%
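The gap between each finite CNN and its CNTK can be read off the table by subtraction. This small sketch hard-codes the table's CNN-GAP and CNTK-GAP columns; the values are copied from the table and nothing is new. It computes the per-depth differences in percentage points. `GAP_ROWS`, `gaps`, and `best_vs_best` are hypothetical names.

```python
# (depth, CNN-GAP %, CNTK-GAP %) copied from the table above
GAP_ROWS = [
    (3, 63.81, 70.47),
    (4, 80.93, 75.93),
    (6, 83.75, 76.73),
    (11, 82.92, 77.43),
    (21, 83.30, 77.08),
]


def gaps(rows):
    """Finite-CNN accuracy minus CNTK accuracy per depth, in percentage points."""
    return {depth: round(cnn - cntk, 2) for depth, cnn, cntk in rows}


def best_vs_best(rows):
    """Best finite-CNN accuracy minus best CNTK accuracy over all depths."""
    return round(max(r[1] for r in rows) - max(r[2] for r in rows), 2)


if __name__ == "__main__":
    for depth, gap in gaps(GAP_ROWS).items():
        print(f"depth {depth:>2}: CNN-GAP - CNTK-GAP = {gap:+.2f} points")
    print(f"best vs best: {best_vs_best(GAP_ROWS):.2f} points")
```

Running it gives three results:

- a negative gap at depth 3 (-6.66), where the CNTK wins;
- gaps between 5.00 and 7.02 points for depths 4 to 21;
- 6.32 points between the best finite CNN and the best CNTK.

The last figure matches the abstract's "only 6% lower".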

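The 11-layer CNTK-GAP reaches 77.43% accuracy on CIFAR-10. This is about 10% higher than the GP-based kernels reported in [Novak et al., 2019] and sets a new benchmark for pure kernel methods.
With batch normalization and data augmentation turned off, the 11-layer CNTK-GAP is about 5.5 points below the corresponding finite network (82.92%).
Global average pooling has a large effect: CNTK-GAP beats CNTK-V by 6 to 13 points at every depth. Depth matters mainly with GAP (70.47% at depth 3 vs. 77.43% at depth 11), while CNTK-V stays between 64% and 66%.
For GAP networks of depth 4 or more, the finite CNN still beats the CNTK by about 5 to 7 points (best vs. best: 83.75% vs. 77.43%), suggesting that finite width retains an advantage. At depth 3, however, the CNTK outperforms the finite CNN in both the vanilla and the GAP setting.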
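The table shows GAP helping the CNTK at every depth. This is a simplified sketch of where the two variants differ at the output. It assumes the last-layer kernel has already been computed as a matrix `theta[p][q]` between pixel position p of x and position q of x', with positions flattened to one index.

- The vanilla readout keeps only matching positions (a trace).
- The GAP readout averages over all pairs of positions.

The function names are hypothetical, and the sketch is modeled on the paper's description rather than taken from its code. It omits the convolutional recursion that produces `theta`.

```python
from typing import List

# Hypothetical layout: theta[p][q] is the last-layer kernel between
# position p of input x and position q of input x' (positions flattened).
Matrix = List[List[float]]


def vanilla_readout(theta: Matrix) -> float:
    """Vanilla CNTK output: sum over matching positions only (the trace)."""
    return sum(theta[p][p] for p in range(len(theta)))


def gap_readout(theta: Matrix) -> float:
    """CNTK-GAP output: average over all position pairs, mirroring global average pooling."""
    n = len(theta)
    return sum(sum(row) for row in theta) / (n * n)
```

For `theta = [[1.0, 0.5], [0.5, 1.0]]`, the vanilla readout is `2.0` and the GAP readout is `0.75`. Only the GAP readout lets similarities between different positions contribute at the output. That is one plausible, unverified explanation for its advantage in the table.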


This review was generated by AI and checked by a human editor.