QUICK REVIEW

[Paper Review] Exploring Generalization in Deep Learning

Behnam Neyshabur, Srinadh Bhojanapalli|arXiv (Cornell University)|Jun 27, 2017

Adversarial Robustness in Machine Learning | References: 24 | Citations: 295

One-line summary

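To understand and explain generalization in deep neural networks, this paper evaluates several proposed complexity measures (norms, margins, sharpness, and PAC-Bayes) and highlights scale normalization and the interplay between sharpness and norms.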

ABSTRACT

With a goal of understanding what drives generalization in deep networks, we consider several recently suggested explanations, including norm-based control, sharpness and robustness. We study how these measures can ensure generalization, highlighting the importance of scale normalization, and making a connection between sharpness and PAC-Bayes theory. We then investigate how well the measures explain different observed phenomena.

Motivation and objectives

Analyze what drives generalization in deep neural networks, beyond training error alone.
Assess whether proposed complexity measures can guarantee generalization and explain observed phenomena.
Explore the role of scale, norm, and margin in measuring network capacity and generalization.
Connect sharpness with PAC-Bayes theory to form a balanced complexity measure.

Proposed method

Review and formalize several complexity measures (norms, margins, sharpness, and PAC-Bayes bounds) for deep networks with ReLU activations.
Derive capacity bounds from norm-based measures, such as layer norms ||W_i|| and path norms, incorporating a margin term (gamma_margin).
Analyze Lipschitz/robustness implications and show limitations of using Lipschitz constants alone for capacity control.
Use PAC-Bayes bounds to relate expected sharpness and KL divergence to generalization guarantees.
Conduct empirical investigations on networks trained with true vs random labels, varying network size, and multiple optimization settings to test whether measures correlate with generalization.
Provide bi-criteria plots of sharpness vs. KL divergence to assess joint capacity control.
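
The norm and margin measures listed above can be illustrated with a minimal NumPy sketch. It assumes a bias-free ReLU network stored as a list of weight matrices, and it omits the constants and width factors that appear in the paper's actual bounds. The function names, the 5% quantile used for gamma_margin, and the squared ratio in `margin_normalized` are illustrative assumptions, not the authors' code.

```python
import numpy as np

def forward(weights, x):
    """Bias-free ReLU MLP: ReLU after every layer except the last."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(h @ W, 0.0)
    return h @ weights[-1]

def spectral_product(weights):
    """Product of the largest singular values of the layers."""
    return float(np.prod([np.linalg.norm(W, 2) for W in weights]))

def frobenius_product(weights):
    """Product of the Frobenius norms of the layers."""
    return float(np.prod([np.linalg.norm(W, "fro") for W in weights]))

def l2_path_norm(weights):
    """sqrt of the sum, over all input-output paths, of the product of squared weights."""
    v = np.ones(weights[0].shape[0])
    for W in weights:
        v = v @ (W ** 2)  # propagate an all-ones input through squared weights
    return float(np.sqrt(v.sum()))

def margins(weights, x, y):
    """Score of the true class minus the best score among the other classes."""
    scores = forward(weights, x)
    idx = np.arange(len(y))
    true_score = scores[idx, y]
    others = scores.copy()
    others[idx, y] = -np.inf
    return true_score - others.max(axis=1)

def gamma_margin(weights, x, y, eps=0.05):
    """Assumption: a low quantile of the margins, as a robust stand-in for the minimum."""
    return float(np.quantile(margins(weights, x, y), eps))

def margin_normalized(measure, gamma):
    """Scale-normalized measure: (norm / margin)^2, constants omitted."""
    return (measure / gamma) ** 2
```

Dividing by the margin is what makes such a measure insensitive to output scale: multiplying the last layer by a constant multiplies both the norm product and the margins by that constant, so the ratio is unchanged. The path norm is additionally unchanged when a hidden unit's incoming weights are scaled up and its outgoing weights are scaled down by the same factor.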

Experimental results

Research questions

RQ1: Do norm-based and margin-based capacity measures sufficiently explain generalization in deep networks?
RQ2: How does sharpness interact with norm and margin within a PAC-Bayes framework to predict generalization?
RQ3: Can these measures distinguish between models trained on true versus random labels, and across different network sizes or optimization schemes?
RQ4: What is the impact of scale (output magnitude) on complexity measures and generalization?
RQ5: Are there empirical phenomena (e.g., more hidden units improving generalization) that these measures can or cannot explain?

Key findings

Norm-based or path-norm measures combined with a margin can explain differences in generalization between models trained on true vs random labels.
Pure sharpness is insufficient alone to predict generalization and is scale-dependent; its utility improves when balanced with norm under a PAC-Bayes view.
Joint PAC-Bayes analysis, combining expected sharpness with KL divergence to a prior, better predicts generalization than either term alone.
Empirical results show that capacity, as measured by these norms and path norms, does not necessarily increase when parameters are added; the optimizer's implicit bias (implicit regularization) and margin scaling play crucial roles.
Bi-criteria plots (sharpness vs. KL divergence) reveal that models trained on true labels tend to achieve preferable trade-offs, especially as training set size grows.
Observations indicate that some measures fail to explain all generalization phenomena (e.g., large networks beyond certain sizes), highlighting the limitations of single-measure explanations.
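
The findings on sharpness and its scale dependence can be made concrete with a small NumPy sketch. It estimates expected sharpness by Monte Carlo as the average increase in training loss under Gaussian weight noise, and computes the KL term for a Gaussian posterior centered at the weights against a zero-mean Gaussian prior with the same variance, which is ||w||^2 / (2 sigma^2). The bias-free ReLU network, the 0-1 loss, the choice of prior, and all function names are assumptions made for illustration; the paper's experimental setup may differ.

```python
import numpy as np

def forward(weights, x):
    """Bias-free ReLU MLP: ReLU after every layer except the last."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(h @ W, 0.0)
    return h @ weights[-1]

def zero_one_loss(weights, x, y):
    """Fraction of misclassified examples (assumed loss, bounded in [0, 1])."""
    return float(np.mean(forward(weights, x).argmax(axis=1) != y))

def expected_sharpness(weights, x, y, sigma, n_samples=200, seed=0):
    """Monte Carlo estimate of E_nu[loss(w + nu)] - loss(w), with nu ~ N(0, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    base = zero_one_loss(weights, x, y)
    perturbed = []
    for _ in range(n_samples):
        noisy = [W + sigma * rng.normal(size=W.shape) for W in weights]
        perturbed.append(zero_one_loss(noisy, x, y))
    return float(np.mean(perturbed)) - base

def kl_term(weights, sigma):
    """KL(N(w, sigma^2 I) || N(0, sigma^2 I)) = ||w||^2 / (2 sigma^2)."""
    sq_norm = sum(float(np.sum(W ** 2)) for W in weights)
    return sq_norm / (2.0 * sigma ** 2)

def rescale(weights, alpha):
    """Two-layer reparametrization (alpha > 0) that leaves the network function unchanged."""
    W1, W2 = weights
    return [alpha * W1, W2 / alpha]
```

Because ReLU is positively homogeneous, `rescale` returns a network that computes exactly the same function, yet at a fixed sigma both `expected_sharpness` and `kl_term` generally change. That is one way to see why sharpness alone is scale-dependent, and why the PAC-Bayes view considers the pair (sharpness, KL) jointly across choices of sigma, as in the bi-criteria plots.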


This review was generated by AI and checked by a human editor.