QUICK REVIEW

[Paper Review] Exploring Generalization in Deep Learning

Behnam Neyshabur, Srinadh Bhojanapalli|arXiv (Cornell University)|Jun 27, 2017
Adversarial Robustness in Machine Learning|References: 24|Citations: 295
One-Line Summary

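To understand and explain generalization in deep neural networks, this paper evaluates several proposed complexity measures (norms, margins, sharpness, and PAC-Bayes) and highlights the importance of scale normalization and of the interplay between sharpness and norm.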

ABSTRACT

With a goal of understanding what drives generalization in deep networks, we consider several recently suggested explanations, including norm-based control, sharpness and robustness. We study how these measures can ensure generalization, highlighting the importance of scale normalization, and making a connection between sharpness and PAC-Bayes theory. We then investigate how well the measures explain different observed phenomena.

Motivation and Objectives

  • Analyze what drives generalization in deep neural networks, given that low training error alone does not explain it.
  • Assess whether proposed complexity measures can guarantee generalization and explain observed phenomena.
  • Explore the role of scale, norm, and margin in measuring network capacity and generalization.
  • Connect sharpness with PAC-Bayes theory to form a balanced complexity measure.

Proposed Method

  • Review and formalize several complexity measures (norms, margins, sharpness, and PAC-Bayes bounds) for deep networks with ReLU activations.
  • Derive capacity bounds from norm-based measures, such as products of layer norms ||W_i|| and path norms, each normalized by a margin term (gamma_margin).
  • Analyze Lipschitz/robustness implications and show limitations of using Lipschitz constants alone for capacity control.
  • Use PAC-Bayes bounds to relate expected sharpness and KL divergence to generalization guarantees.
  • Conduct empirical investigations on networks trained with true vs random labels, varying network size, and multiple optimization settings to test whether measures correlate with generalization.
  • Provide bi-criteria plots of sharpness vs. KL divergence to assess joint capacity control.
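
The norm-based measures and the margin normalization listed above can be sketched in a few lines of NumPy. This is a simplified illustration, not the paper's exact definitions: it omits the constant and width-dependent factors of the formal bounds, uses the minimum training margin as a stand-in for gamma_margin, and assumes a bias-free ReLU network. All function names and the tiny random network are hypothetical.

```python
import numpy as np

def forward(weights, x):
    """Bias-free ReLU MLP; weights[i] has shape (out_i, in_i)."""
    h = x
    for w in weights[:-1]:
        h = np.maximum(h @ w.T, 0.0)
    return h @ weights[-1].T

def frobenius_product(weights):
    """Product of the layers' Frobenius norms."""
    return float(np.prod([np.linalg.norm(w, "fro") for w in weights]))

def spectral_product(weights):
    """Product of the layers' spectral norms (largest singular values)."""
    return float(np.prod([np.linalg.norm(w, 2) for w in weights]))

def l2_path_norm(weights):
    """sqrt of the sum, over all input-to-output paths, of the product
    of squared weights along the path."""
    v = np.ones(weights[0].shape[1])
    for w in weights:
        v = (w ** 2) @ v
    return float(np.sqrt(v.sum()))

def margins(logits, y):
    """Per-example margin: correct-class score minus best other score."""
    idx = np.arange(len(y))
    correct = logits[idx, y]
    others = logits.copy()
    others[idx, y] = -np.inf
    return correct - others.max(axis=1)

# Hypothetical toy setup: a random network and the labels it already fits.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((16, 8)), rng.standard_normal((3, 16))]
x = rng.standard_normal((32, 8))
y = forward(weights, x).argmax(axis=1)

# Assumption: the minimum margin stands in for gamma_margin.
gamma = margins(forward(weights, x), y).min()

# Scaling the output layer changes the norms and the margin by the same
# factor, so the margin-normalized measures are unchanged.
scaled = [weights[0], 5.0 * weights[1]]
gamma_scaled = margins(forward(scaled, x), y).min()

print("Frobenius product / margin:", frobenius_product(weights) / gamma)
print("after output scaling:      ", frobenius_product(scaled) / gamma_scaled)
```

Dividing by the margin is what makes these quantities comparable across networks whose outputs differ only in magnitude; the unnormalized products grow with the scaling factor even though the predictions are identical.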

Experimental Results

Research Questions

  • RQ1: Do norm-based and margin-based capacity measures sufficiently explain generalization in deep networks?
  • RQ2: How does sharpness interact with norm and margin within a PAC-Bayes framework to predict generalization?
  • RQ3: Can these measures distinguish between models trained on true versus random labels, and across different network sizes or optimization schemes?
  • RQ4: What is the impact of scale (output magnitude) on complexity measures and generalization?
  • RQ5: Are there empirical phenomena (e.g., more hidden units improving generalization) that these measures can or cannot explain?
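
For RQ2, the relation between sharpness and norm can be written in a generic McAllester-style PAC-Bayes form. This is a sketch under stated assumptions rather than the paper's exact statement: $c$ is an absolute constant and the logarithmic term varies between versions of the bound. Here $w$ denotes the learned weights, $\nu$ a random perturbation, $P$ a prior chosen before seeing the data, $m$ the training set size, $\delta$ the failure probability, and $L$, $\hat{L}$ the expected and empirical loss.

```latex
\mathbb{E}_{\nu}\big[L(f_{w+\nu})\big]
\;\le\;
\hat{L}(f_w)
+ \underbrace{\mathbb{E}_{\nu}\big[\hat{L}(f_{w+\nu})\big] - \hat{L}(f_w)}_{\text{expected sharpness}}
+ c\,\sqrt{\frac{\mathrm{KL}(w+\nu \,\|\, P) + \ln\frac{2m}{\delta}}{m}}

% Assumption for illustration: P = N(0, \sigma^2 I) and \nu \sim N(0, \sigma^2 I), which gives
\mathrm{KL}(w+\nu \,\|\, P) = \frac{\|w\|_2^2}{2\sigma^2}
```

Under the Gaussian assumption the two terms pull in opposite directions: a larger $\sigma$ shrinks the KL term but usually increases the expected sharpness, which is why neither quantity is informative without the other.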

Key Findings

  • Norm-based or path-norm measures combined with a margin can explain differences in generalization between models trained on true vs random labels.
  • Pure sharpness is insufficient alone to predict generalization and is scale-dependent; its utility improves when balanced with norm under a PAC-Bayes view.
  • Joint PAC-Bayes analysis, combining expected sharpness with KL divergence to a prior, better predicts generalization than either term alone.
  • Empirical results show that capacity, as measured by these norms and path norms, does not necessarily increase when parameters are added; the implicit bias of the optimization procedure (implicit regularization) and margin scaling play crucial roles.
  • Bi-criteria plots (sharpness vs. KL divergence) reveal that models trained on true labels tend to achieve preferable trade-offs, especially as training set size grows.
  • Observations indicate that some measures fail to explain all generalization phenomena (e.g., large networks beyond certain sizes), highlighting the limitations of single-measure explanations.
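
The scale dependence of sharpness and its pairing with a KL term, as described in the findings above, can be illustrated with a small Monte Carlo sketch. It assumes cross-entropy as the training loss, Gaussian weight perturbations with standard deviation sigma, and a zero-mean Gaussian prior with the same variance, so that the KL term is ||w||^2 / (2 sigma^2). The function names and the random two-layer network are hypothetical and do not reproduce the paper's experimental setup.

```python
import numpy as np

def forward(weights, x):
    """Bias-free ReLU MLP; weights[i] has shape (out_i, in_i)."""
    h = x
    for w in weights[:-1]:
        h = np.maximum(h @ w.T, 0.0)
    return h @ weights[-1].T

def cross_entropy(logits, y):
    """Mean softmax cross-entropy, computed in a numerically stable way."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(y)), y].mean())

def expected_sharpness(weights, x, y, sigma, n_samples=200, seed=0):
    """Monte Carlo estimate of E_nu[loss(w + nu)] - loss(w),
    with nu drawn from N(0, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    base = cross_entropy(forward(weights, x), y)
    perturbed = []
    for _ in range(n_samples):
        noisy = [w + sigma * rng.standard_normal(w.shape) for w in weights]
        perturbed.append(cross_entropy(forward(noisy, x), y))
    return float(np.mean(perturbed)) - base

def kl_term(weights, sigma):
    """KL(N(w, sigma^2 I) || N(0, sigma^2 I)) = ||w||^2 / (2 sigma^2)."""
    sq_norm = sum(float((w ** 2).sum()) for w in weights)
    return sq_norm / (2.0 * sigma ** 2)

# Hypothetical toy setup: a random network and the labels it already fits.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((16, 8)), rng.standard_normal((3, 16))]
x = rng.standard_normal((64, 8))
y = forward(weights, x).argmax(axis=1)

# ReLU is positively homogeneous, so scaling one layer up and the next
# layer down by the same factor leaves the network function unchanged.
c = 10.0
rescaled = [c * weights[0], weights[1] / c]

sigma = 0.1
sharp_base = expected_sharpness(weights, x, y, sigma)
sharp_rescaled = expected_sharpness(rescaled, x, y, sigma)
kl_base = kl_term(weights, sigma)
kl_rescaled = kl_term(rescaled, sigma)

print("same function:", np.allclose(forward(weights, x), forward(rescaled, x)))
print("sharpness:", sharp_base, "vs", sharp_rescaled)
print("KL term:  ", kl_base, "vs", kl_rescaled)
```

Both parameterizations compute the same function, yet the estimated sharpness and the KL term generally differ between them. Reporting the two together, at a common sigma, is one way to read the bi-criteria plots mentioned above.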

This review was generated by AI and checked by a human editor.