QUICK REVIEW

[Paper Review] Mean-Field Langevin Dynamics and Energy Landscape of Neural Networks

Kaitong Hu, Zhenjie Ren|arXiv (Cornell University)|May 19, 2019

Markov Chains and Monte Carlo Methods | References: 58 | Citations: 30

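Summary in brief

This paper introduces Mean-Field Langevin Dynamics (MFLD) as a continuous-time gradient flow on the 2-Wasserstein space of probability measures. It shows that the law of the process converges to a unique stationary distribution, which is the minimiser of the energy functional, and that this convergence is exponential under conditions satisfied by highly regularised learning tasks. The proof combines a generalisation of LaSalle's invariance principle with the HWI inequality and assumes neither a symmetric nor a convolution-type interaction potential. The paper also establishes that the error between the finite-dimensional optimisation problem and its infinite-dimensional limit is O(1/N), where N is the number of parameters.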

ABSTRACT

Our work is motivated by a desire to study the theoretical underpinning for the convergence of stochastic gradient type algorithms widely used for non-convex learning tasks such as training of neural networks. The key insight, already observed in the works of Mei, Montanari and Nguyen (2018), Chizat and Bach (2018) as well as Rotskoff and Vanden-Eijnden (2018), is that a certain class of the finite-dimensional non-convex problems becomes convex when lifted to infinite-dimensional space of measures. We leverage this observation and show that the corresponding energy functional defined on the space of probability measures has a unique minimiser which can be characterised by a first-order condition using the notion of linear functional derivative. Next, we study the corresponding gradient flow structure in 2-Wasserstein metric, which we call Mean-Field Langevin Dynamics (MFLD), and show that the flow of marginal laws induced by the gradient flow converges to a stationary distribution, which is exactly the minimiser of the energy functional. We observe that this convergence is exponential under conditions that are satisfied for highly regularised learning tasks. Our proof of convergence to stationary probability measure is novel and it relies on a generalisation of LaSalle's invariance principle combined with HWI inequality. Importantly, we assume neither that interaction potential of MFLD is of convolution type nor that it has any particular symmetric structure. Furthermore, we allow for the general convex objective function, unlike most papers in the literature that focus on quadratic loss. Finally, we show that the error between finite-dimensional optimisation problem and its infinite-dimensional limit is of order one over the number of parameters.

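Motivation and objectives

Provide a theoretical foundation for the convergence of stochastic gradient type algorithms on non-convex learning tasks, in particular the training of neural networks.
Analyse the energy landscape of neural networks by lifting the finite-dimensional non-convex problem to the infinite-dimensional space of probability measures.
Establish existence and uniqueness of the minimiser of the energy functional using the linear functional derivative.
Prove that the marginal laws of the MFLD process converge to the stationary distribution corresponding to the global minimiser, and that the convergence is exponential under conditions satisfied by highly regularised learning tasks.
Quantify the approximation error between the finite-dimensional optimisation problem and its mean-field limit as O(1/N) in the number of parameters N.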
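
To make the lifting concrete, one common way to write it is sketched below. The notation is an assumption introduced for illustration and is not quoted from the review: \varphi is a single-neuron map, \Phi a convex loss, (Y, Z) a data pair, and m a probability measure over neuron parameters.

```latex
% Finite-dimensional problem: non-convex in the parameters x_1, ..., x_N
\min_{x_1,\dots,x_N \in \mathbb{R}^d}\;
  \mathbb{E}\Big[\Phi\Big(Y - \frac{1}{N}\sum_{i=1}^{N}\varphi(x_i, Z)\Big)\Big]

% Lifted problem: replace the empirical measure of the parameters
% by an arbitrary probability measure m
\min_{m \in \mathcal{P}(\mathbb{R}^d)} F(m),
\qquad
F(m) := \mathbb{E}\Big[\Phi\Big(Y - \int_{\mathbb{R}^d}\varphi(x, Z)\,m(\mathrm{d}x)\Big)\Big]

% The map m \mapsto \int \varphi \, dm is linear, so convexity of \Phi yields
F\big(\lambda m + (1-\lambda) m'\big) \le \lambda F(m) + (1-\lambda) F(m'),
\qquad \lambda \in [0,1]
```

The last line is the point of the lifting: the network output depends linearly on m, so a convex loss gives a convex functional of m even though the same loss is non-convex in the individual parameters x_i.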

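Proposed method

Lift the finite-dimensional non-convex optimisation problem arising in neural network training to an infinite-dimensional problem on the space of probability measures.
Define an energy functional on the space of probability measures and characterise its minimiser through a first-order condition expressed with the linear functional derivative.
Define Mean-Field Langevin Dynamics (MFLD) as the gradient flow in the 2-Wasserstein metric, which models the time evolution of the law of the system.
Apply a generalisation of LaSalle's invariance principle, combined with the HWI inequality, to prove that the marginal laws converge to the stationary distribution.
Show that the convergence is exponential under conditions that are satisfied for highly regularised learning tasks.
Derive an O(1/N) bound on the error between the finite-dimensional optimisation problem and its mean-field limit. The analysis covers general convex objective functions, not only the quadratic loss.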
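
The objects named in the steps above can be written out as follows. This is a sketch in a standard notation that is assumed here rather than taken from the review: F is the lifted convex objective, U a confining potential defining a Gibbs measure g proportional to e^{-U}, H the relative entropy, \sigma > 0 the noise level, and W a Brownian motion.

```latex
% Linear functional derivative of F (defining relation)
F(m') - F(m) = \int_0^1 \int_{\mathbb{R}^d}
  \frac{\delta F}{\delta m}\big((1-\lambda) m + \lambda m', x\big)\,(m' - m)(\mathrm{d}x)\,\mathrm{d}\lambda

% Regularised energy functional
V^{\sigma}(m) := F(m) + \frac{\sigma^2}{2}\, H(m \mid g),
\qquad g(x) \propto e^{-U(x)}

% First-order condition for the minimiser m^* (for Lebesgue-a.e. x)
\frac{\delta F}{\delta m}(m^*, x) + \frac{\sigma^2}{2}\log m^*(x) + \frac{\sigma^2}{2}\,U(x) = \text{const}

% Mean-Field Langevin Dynamics, with m_t the law of X_t
\mathrm{d}X_t = -\Big(\nabla_x \frac{\delta F}{\delta m}(m_t, X_t)
  + \frac{\sigma^2}{2}\,\nabla U(X_t)\Big)\,\mathrm{d}t + \sigma\,\mathrm{d}W_t,
\qquad m_t = \mathrm{Law}(X_t)
```

Setting the time derivative in the associated Fokker-Planck equation to zero gives a vanishing gradient of the left-hand side of the first-order condition, which is why a stationary law of the dynamics and the minimiser of the energy coincide.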

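Results

Research questions

RQ1: Can the convergence of stochastic gradient descent in over-parameterised neural networks be rigorously justified through the mean-field limit?
RQ2: Does the energy functional on the space of probability measures have a unique minimiser, and can that minimiser be characterised using the functional derivative?
RQ3: Under what conditions does MFLD converge exponentially fast to the global minimiser?
RQ4: How does the approximation error between finite-dimensional training and the mean-field limit depend on the number of parameters?
RQ5: Can the convergence proof be established without assuming that the interaction potential is symmetric or of convolution type?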

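Key findings

The energy functional defined on the space of probability measures has a unique minimiser, characterised by a first-order condition involving the linear functional derivative.
The marginal laws of Mean-Field Langevin Dynamics (MFLD) converge to a stationary distribution that is the global minimiser of the energy functional; the convergence is exponential under conditions satisfied by highly regularised learning tasks.
The convergence proof is new and relies on a generalisation of LaSalle's invariance principle combined with the HWI inequality.
The error between the finite-dimensional optimisation problem and its infinite-dimensional mean-field limit is O(1/N) in the number of parameters N.
The results hold for general convex objective functions, not only the quadratic loss, and do not assume that the interaction potential is of convolution type or symmetric.
The stationary distribution of MFLD coincides exactly with the minimiser of the energy functional, which links the long-time behaviour of the dynamics directly to optimality of the solution.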
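
The dynamics behind these findings can be illustrated numerically. The following is a minimal sketch, not the authors' code: it discretises MFLD with an Euler-Maruyama step for N interacting particles, which amounts to noisy gradient descent with a quadratic regulariser. Everything specific is an assumption made for illustration: the 1-D regression target sin(z), the tanh neuron, the potential U(x) = |x|^2 / 2, the step size, the noise level, and the hypothetical function names `phi`, `grad_phi` and `simulate_mfld`.

```python
import numpy as np


def phi(x, z):
    """Single-neuron output a * tanh(w * z + b); columns of x are (a, w, b)."""
    a, w, b = x[:, 0:1], x[:, 1:2], x[:, 2:3]
    return a * np.tanh(w * z[None, :] + b)  # shape (N, M)


def grad_phi(x, z):
    """Gradient of phi with respect to (a, w, b); shape (N, M, 3)."""
    a, w, b = x[:, 0:1], x[:, 1:2], x[:, 2:3]
    t = np.tanh(w * z[None, :] + b)
    s = a * (1.0 - t**2)
    return np.stack([t, s * z[None, :], s], axis=2)


def simulate_mfld(n_particles=200, n_steps=2000, dt=0.02, sigma=0.2, seed=0):
    """Euler-Maruyama particle approximation of MFLD on a toy regression task.

    Assumed setup (illustrative only): quadratic loss, target sin(z),
    and U(x) = |x|^2 / 2 so that grad U(x) = x.
    """
    rng = np.random.default_rng(seed)
    z = np.linspace(-2.0, 2.0, 40)           # toy inputs
    y = np.sin(z)                            # toy targets
    x = rng.normal(size=(n_particles, 3))    # initial particles (a, w, b)
    losses = np.empty(n_steps)
    for k in range(n_steps):
        # Network output is the average over particles (mean-field scaling).
        residual = y - phi(x, z).mean(axis=0)             # shape (M,)
        losses[k] = 0.5 * np.mean(residual**2)
        # Gradient in x of the functional derivative of the quadratic loss,
        # evaluated at each particle against the empirical measure.
        drift_f = -np.einsum("j,ijk->ik", residual, grad_phi(x, z)) / z.size
        drift = drift_f + 0.5 * sigma**2 * x              # add (sigma^2/2) grad U
        noise = rng.normal(size=x.shape)
        x = x - dt * drift + sigma * np.sqrt(dt) * noise
    return x, losses


if __name__ == "__main__":
    particles, losses = simulate_mfld()
    print("initial loss:", losses[0])
    print("mean loss over last 100 steps:", losses[-100:].mean())
```

The recorded quantity is only the data-fitting part of the energy; the entropy term is not tracked, so the printed values indicate the trend of the dynamics rather than the full energy functional. Rerunning with different values of `n_particles` is one informal way to see how the particle system behaves as N grows, though this sketch does not measure the O(1/N) error bound.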

This review was generated by AI and checked by a human editor.