QUICK REVIEW

[Paper Review] Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations

Yiping Lu, Aoxiao Zhong|arXiv (Cornell University)|Oct 27, 2017

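Model Reduction and Neural Networks | References: 48 | Citations: 153

One-Sentence Summary

This paper links popular deep network architectures to numerical discretizations of ODEs, introduces the linear multi-step (LM) architecture and applies it to ResNet/ResNeXt, and reports accuracy gains along with the potential for parameter compression. It also interprets stochastic training as a stochastic dynamic system.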

ABSTRACT

In our work, we bridge deep neural network design with numerical differential equations. We show that many effective networks, such as ResNet, PolyNet, FractalNet and RevNet, can be interpreted as different numerical discretizations of differential equations. This finding brings us a brand new perspective on the design of effective deep architectures. We can take advantage of the rich knowledge in numerical analysis to guide us in designing new and potentially more effective deep networks. As an example, we propose a linear multi-step architecture (LM-architecture) which is inspired by the linear multi-step method solving ordinary differential equations. The LM-architecture is an effective structure that can be used on any ResNet-like networks. In particular, we demonstrate that LM-ResNet and LM-ResNeXt (i.e. the networks obtained by applying the LM-architecture on ResNet and ResNeXt respectively) can achieve noticeably higher accuracy than ResNet and ResNeXt on both CIFAR and ImageNet with comparable numbers of trainable parameters. In particular, on both CIFAR and ImageNet, LM-ResNet/LM-ResNeXt can significantly compress ($>50$\%) the original networks while maintaining a similar performance. This can be explained mathematically using the concept of modified equation from numerical analysis. Last but not least, we also establish a connection between stochastic control and noise injection in the training process which helps to improve generalization of the networks. Furthermore, by relating stochastic training strategy with stochastic dynamic system, we can easily apply stochastic training to the networks with the LM-architecture. As an example, we introduced stochastic depth to LM-ResNet and achieve significant improvement over the original LM-ResNet on CIFAR10.

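Motivation and Objectives

Motivate design principles for deep networks by linking architectures to discretizations of differential equations.
Introduce the LM-architecture and apply it to ResNet/ResNeXt to improve accuracy while reducing parameters.
Use the modified equation from numerical analysis to explain the performance gains and stability.
Explore stochastic training as stochastic dynamics in order to improve generalization.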

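Proposed Method

Map ResNet/ResNeXt and related networks to numerical schemes (forward Euler, backward Euler, Runge-Kutta) for solving u_t = f(u).
Propose the LM-architecture: u_{n+1} = (1 - k_n) u_n + k_n u_{n-1} + f(u_n), with trainable k_n.
Apply the LM-architecture to ResNet/ResNeXt to form LM-ResNet/LM-ResNeXt, and evaluate on CIFAR and ImageNet.
Analyze the modified equation to explain the performance gains and stability.
Interpret stochastic training strategies (noise injection) as approximations of stochastic dynamics, and extend them to the LM-architecture (stochastic depth).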
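
To make the LM update rule in the list above concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the names `resnet_forward`, `lm_forward`, and `Block` are hypothetical, the state is a scalar rather than a feature map, the residual branches are toy functions, and the choice u_{-1} = u_0 for the first step is an assumption made only so the recursion can start.

```python
from typing import Callable, Sequence

Block = Callable[[float], float]  # hypothetical type for a residual branch f


def resnet_forward(u0: float, blocks: Sequence[Block]) -> float:
    """Plain residual stack: u_{n+1} = u_n + f_n(u_n), i.e. forward Euler with step size 1."""
    u = u0
    for f in blocks:
        u = u + f(u)
    return u


def lm_forward(u0: float, blocks: Sequence[Block], k: Sequence[float]) -> float:
    """LM-style stack: u_{n+1} = (1 - k_n) u_n + k_n u_{n-1} + f_n(u_n)."""
    # Assumption: the first block has no earlier state, so u_{-1} is set to u_0,
    # which makes the first step an ordinary residual step.
    u_prev, u = u0, u0
    for f, k_n in zip(blocks, k):
        u_next = (1.0 - k_n) * u + k_n * u_prev + f(u)
        u_prev, u = u, u_next
    return u


blocks = [lambda u: 0.1 * u] * 3  # toy residual branches, not trained layers
print(resnet_forward(1.0, blocks))
print(lm_forward(1.0, blocks, [0.5, 0.5, 0.5]))
```

Setting every k_n to 0 reduces `lm_forward` to `resnet_forward`, so in this sketch the LM stack is a strict generalization of the residual stack. In a real network k_n would be a trainable parameter per block.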

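Experimental Results

Research Questions

RQ1: Can deep network architectures be interpreted as discretizations of differential equations, and does this interpretation guide architecture design?
RQ2: Does the LM-architecture improve the performance and/or parameter efficiency of ResNet/ResNeXt on CIFAR and ImageNet?
RQ3: How does the modified equation explain the gains observed with the LM-architecture?
RQ4: Can stochastic training strategies be understood as stochastic dynamic systems, and do they benefit the LM-architecture?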

Key Findings

Model	Layers	Error (%)	Params	Dataset
ResNet (He et al. 2015b)	20	8.75	0.27M	CIFAR10
ResNet (He et al. 2015b)	32	7.51	0.46M	CIFAR10
ResNet (He et al. 2015b)	44	7.17	0.66M	CIFAR10
ResNet (He et al. 2015b)	56	6.97	0.85M	CIFAR10
ResNet (He et al. 2016)	110, pre-act	6.37	1.7M	CIFAR10
LM-ResNet (Ours)	20, pre-act	8.33	0.27M	CIFAR10
LM-ResNet (Ours)	32, pre-act	7.18	0.46M	CIFAR10
LM-ResNet (Ours)	44, pre-act	6.66	0.66M	CIFAR10
LM-ResNet (Ours)	56, pre-act	6.31	0.85M	CIFAR10
ResNet (He et al. 2015b)	110, pre-act	27.76	1.7M	CIFAR100
ResNet (He et al. 2015b)	164, pre-act	24.33	2.55M	CIFAR100
ResNet (He et al. 2015b)	1001, pre-act	22.71	18.88M	CIFAR100
FractalNet (Larsson et al. 2016)	20	23.30	38.6M	CIFAR100
FractalNet (Larsson et al. 2016)	40	22.49	22.9M	CIFAR100
DenseNet (Huang et al. 2016a)	100	19.25	27.2M	CIFAR100
DenseNet-BC (Huang et al. 2016a)	190	17.18	25.6M	CIFAR100
ResNeXt (Xie et al. 2017)	29(8×64d)	17.77	34.4M	CIFAR100
ResNeXt (Xie et al. 2017)	29(16×64d)	17.31	68.1M	CIFAR100
ResNeXt (Our Implement)	29(16×64d), pre-act	17.65	68.1M	CIFAR100
LM-ResNet (Ours)	110, pre-act	25.87	1.7M	CIFAR100
LM-ResNet (Ours)	164, pre-act	22.90	2.55M	CIFAR100
LM-ResNeXt (Ours)	29(8×64d), pre-act	17.49	35.1M	CIFAR100
LM-ResNeXt (Ours)	29(16×64d), pre-act	16.79	68.8M	CIFAR100
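
As a rough check on the compression claim, the arithmetic below compares two pairs of rows from the table. The pairings are this review's own reading of the table (a smaller LM-ResNet against a deeper ResNet with similar error), and the helper `param_reduction` is a hypothetical name used only for illustration.

```python
def param_reduction(baseline_params: float, compact_params: float) -> float:
    """Fraction of parameters removed relative to the baseline."""
    return 1.0 - compact_params / baseline_params


# (params in millions, test error in %) copied from the table above
pairs = {
    "CIFAR10: ResNet-110 vs LM-ResNet-56": ((1.7, 6.37), (0.85, 6.31)),
    "CIFAR100: ResNet-1001 vs LM-ResNet-164": ((18.88, 22.71), (2.55, 22.90)),
}

for name, ((base_p, base_err), (lm_p, lm_err)) in pairs.items():
    reduction = param_reduction(base_p, lm_p)
    print(f"{name}: {reduction:.1%} fewer params, error change {lm_err - base_err:+.2f}")
```

Under these pairings the LM variants use roughly 50% and 86% fewer parameters, with the error moving by about -0.06 and +0.19 percentage points respectively.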

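LM-ResNet/LM-ResNeXt achieve higher accuracy than ResNet/ResNeXt with a comparable number of parameters on CIFAR and ImageNet.
On CIFAR, LM-ResNet/LM-ResNeXt can compress the original networks substantially (>50%) while largely maintaining performance.
On CIFAR-10/CIFAR-100, LM-ResNet/LM-ResNeXt improve noticeably on the baselines across depths.
On ImageNet, LM-ResNet50 and LM-ResNet101 improve on ResNet50 and ResNet101 in top-1/top-5 results at a comparable parameter budget (e.g., top-1 error 23.8 for LM-ResNet50 vs. 24.7 for ResNet50; 22.6 for LM-ResNet101 vs. 23.6 for ResNet101).
Stochastic depth and other forms of noise injection may further improve performance, and can be incorporated naturally into the LM-architecture through the stochastic dynamic system interpretation.
The modified-equation analysis explains how the LM coefficients (k_n) affect the acceleration and stability of the learned dynamics.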
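
The modified-equation argument can be sketched with a short Taylor expansion. This is a reconstruction under stated assumptions, not a quotation from the paper: a step size h (a symbol that does not appear in the summary above) is assumed to scale the residual branch, and u_n is treated as a sample u(t) of a smooth trajectory.

```latex
\begin{aligned}
u_{n+1} &= (1-k_n)\,u_n + k_n\,u_{n-1} + h\,f(u_n),\\
u(t\pm h) &= u \pm h\,\dot u + \tfrac{h^2}{2}\,\ddot u + O(h^3),\\
u(t+h) - (1-k_n)\,u - k_n\,u(t-h) &= (1+k_n)\,h\,\dot u + (1-k_n)\,\tfrac{h^2}{2}\,\ddot u + O(h^3),\\
\Rightarrow\quad (1+k_n)\,\dot u + (1-k_n)\,\tfrac{h}{2}\,\ddot u &\approx f(u).
\end{aligned}
```

Setting k_n = 0 recovers the forward Euler (ResNet) case, \dot u + (h/2) \ddot u ≈ f(u). In this sketch k_n rescales both the first-order and the second-order term, which is the sense in which the coefficients shape the effective dynamics.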

This review was generated by AI and checked by a human editor.