QUICK REVIEW
[Paper Review] Exploring Generalization in Deep Learning
Behnam Neyshabur, Srinadh Bhojanapalli | arXiv (Cornell University) | Jun 27, 2017
Topic: Adversarial Robustness in Machine Learning | References: 24 | Cited by: 295
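One-Line Summary
This paper evaluates several proposed complexity measures (norms, margins, sharpness, and PAC-Bayes) in order to understand and explain generalization in deep neural networks. It emphasizes scale normalization and the interplay between sharpness and norms.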
ABSTRACT
With a goal of understanding what drives generalization in deep networks, we consider several recently suggested explanations, including norm-based control, sharpness and robustness. We study how these measures can ensure generalization, highlighting the importance of scale normalization, and making a connection between sharpness and PAC-Bayes theory. We then investigate how well the measures explain different observed phenomena.
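The link between sharpness and PAC-Bayes theory mentioned in the abstract can be sketched with one standard McAllester-style PAC-Bayes bound. Constants and the logarithmic term vary between statements, so treat them as illustrative. Here L is the expected loss, \hat{L} is the training loss on m samples, u is a random perturbation of the weights w, and P is a prior fixed before training. The bound holds with probability at least 1 − δ. Adding and subtracting \hat{L}(f_w) splits the bound into three parts: the training loss, an expected-sharpness term, and a KL complexity term. The Gaussian choice of perturbation and prior in the second line is an assumption made for illustration.

```latex
\mathbb{E}_{u}\left[L(f_{w+u})\right]
  \le \hat{L}(f_w)
  + \underbrace{\mathbb{E}_{u}\left[\hat{L}(f_{w+u})\right] - \hat{L}(f_w)}_{\text{expected sharpness}}
  + 4\sqrt{\frac{1}{m}\left(\mathrm{KL}\left(w+u \,\|\, P\right) + \ln\frac{2m}{\delta}\right)}

% Assumed Gaussian choice: u \sim \mathcal{N}(0, \sigma^2 I), P = \mathcal{N}(0, \sigma^2 I)
\mathrm{KL}\left(\mathcal{N}(w, \sigma^2 I) \,\|\, \mathcal{N}(0, \sigma^2 I)\right) = \frac{\|w\|_2^2}{2\sigma^2}
```

A larger σ shrinks the KL term but typically increases expected sharpness. This is why the two terms have to be controlled jointly rather than one at a time.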
Motivation and Objectives
- Understand what drives generalization in deep neural networks, beyond what training error alone reveals.
- Assess whether proposed complexity measures can guarantee generalization and explain observed phenomena.
- Explore the role of scale, norm, and margin in measuring network capacity and generalization.
- Connect sharpness with PAC-Bayes theory to form a balanced complexity measure.
Proposed Approach
- Review and formalize several complexity measures (norms, margins, sharpness, and PAC-Bayes bounds) for deep networks with ReLU activations.
- Derive capacity bounds from norm-based measures, such as products of layer norms ||W_i|| and path norms, scaled by a margin term (γ_margin).
- Analyze Lipschitz/robustness implications and show limitations of using Lipschitz constants alone for capacity control.
- Use PAC-Bayes bounds to relate expected sharpness and KL divergence to generalization guarantees.
- Run experiments on networks trained with true vs. random labels, with varying network sizes, and under multiple optimization settings, to test whether each measure correlates with generalization.
- Provide bi-criteria plots of sharpness vs. KL divergence to assess joint capacity control.
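The margin-normalized norm measures listed above can be made concrete on a toy network. The sketch below makes several assumptions: a hypothetical bias-free 2-layer ReLU network, the minimum training margin used as γ_margin, and all constants and depth-dependent factors dropped. None of the names are from the paper. It computes ∏‖W_i‖²_F / γ² and the squared ℓ2 path norm divided by γ².

```python
import math

# Hypothetical toy 2-layer ReLU network without biases, f(x) = W2 relu(W1 x).
# Weights are nested lists (one row per output unit); all names are illustrative.

def relu(v):
    return [max(0.0, a) for a in v]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(weights, x):
    """Apply each layer in turn, with ReLU on every layer except the last."""
    h = x
    for i, W in enumerate(weights):
        h = matvec(W, h)
        if i < len(weights) - 1:
            h = relu(h)
    return h

def margin(weights, data):
    """Smallest gap between the true-class score and the best other score.

    Taking the minimum over the sample is an assumption; a percentile of the
    margins would be a more outlier-robust alternative.
    """
    gaps = []
    for x, y in data:
        out = forward(weights, x)
        best_other = max(o for j, o in enumerate(out) if j != y)
        gaps.append(out[y] - best_other)
    gamma = min(gaps)
    if gamma <= 0:
        raise ValueError("a training point is misclassified; margin is not positive")
    return gamma

def frobenius_capacity(weights, gamma):
    """prod_i ||W_i||_F^2 / gamma^2, with constants and depth factors dropped."""
    return math.prod(sum(w * w for row in W for w in row) for W in weights) / gamma ** 2

def path_capacity(weights, gamma):
    """Squared l2 path norm / gamma^2.

    The squared path norm sums, over every input-to-output path, the product
    of the squared weights along it. Propagating squared weights layer by
    layer computes that sum without enumerating the paths.
    """
    acc = [1.0] * len(weights[0][0])  # one accumulated path weight per input unit
    for W in weights:
        acc = [sum(w * w * a for w, a in zip(row, acc)) for row in W]
    return sum(acc) / gamma ** 2

W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[2.0, 0.0], [0.0, 2.0]]
data = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]

for scale in (1.0, 10.0):  # rescale the output layer only
    weights = [W1, [[scale * w for w in row] for row in W2]]
    gamma = margin(weights, data)
    print(scale, gamma, frobenius_capacity(weights, gamma), path_capacity(weights, gamma))
```

Scaling the output layer by 10 has two effects. It multiplies the margin by 10 (from 2.0 to 20.0), and it multiplies ∏‖W_i‖²_F by 100. The normalized values therefore stay at 4.0 for the Frobenius version and 2.0 for the path-norm version in both runs. Without the division by γ², the same classifier would be assigned very different capacities. This is the scale-normalization point the review highlights.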
Experimental Results
Research Questions
- RQ1: Do norm-based and margin-based capacity measures sufficiently explain generalization in deep networks?
- RQ2: How does sharpness interact with norm and margin within a PAC-Bayes framework to predict generalization?
- RQ3: Can these measures distinguish between models trained on true versus random labels, and across different network sizes or optimization schemes?
- RQ4: What is the impact of scale (output magnitude) on complexity measures and generalization?
- RQ5: Are there empirical phenomena (e.g., more hidden units improving generalization) that these measures can or cannot explain?
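A related form of scale dependence affects sharpness, which bears on RQ4. ReLU is positively homogeneous, so multiplying one layer by c and the next by 1/c leaves the function, and hence the training loss, unchanged. A fixed-radius sharpness value can still change a great deal under this rescaling. The sketch below is minimal and rests on several assumptions: a hypothetical one-hidden-unit model f(x) = w2 · relu(w1 · x), squared loss on a single training point, and worst-case sharpness approximated by checking only the corners of an ε-box. None of these details come from the paper.

```python
import itertools

# Hypothetical toy model f(x) = w2 * relu(w1 * x) with squared loss.
DATA = [(1.0, 1.0)]  # (x, y) pairs; a single point keeps the arithmetic easy

def relu(z):
    return max(0.0, z)

def loss(w1, w2):
    return sum((w2 * relu(w1 * x) - y) ** 2 for x, y in DATA) / len(DATA)

def sharpness(w1, w2, eps=0.1):
    """Worst-case loss increase over the corners of an eps-box around (w1, w2).

    Checking only the four corners is a cheap simplification of
    max_{|u|_inf <= eps} L(w + u) - L(w), not the paper's exact procedure.
    """
    base = loss(w1, w2)
    return max(loss(w1 + a, w2 + b) - base
               for a, b in itertools.product((-eps, eps), repeat=2))

for c in (1.0, 10.0):
    w1, w2 = 1.0 * c, 1.0 / c  # same function for every c > 0
    print(f"c={c:>4}: loss={loss(w1, w2):.4f}, sharpness={sharpness(w1, w2):.4f}")
```

Both settings fit the point exactly (loss 0.0000). However, sharpness rises from 0.0441 at c = 1 to 1.0404 at c = 10. Over the same change, ‖w‖² grows from 2 to about 100, so a norm term penalizes the rescaled point. This is consistent with the finding that sharpness is informative only when balanced against a norm.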
Key Findings
- Norm-based or path-norm measures combined with a margin can explain differences in generalization between models trained on true vs random labels.
- Pure sharpness is insufficient alone to predict generalization and is scale-dependent; its utility improves when balanced with norm under a PAC-Bayes view.
- Joint PAC-Bayes analysis, combining expected sharpness with KL divergence to a prior, better predicts generalization than either term alone.
- Empirical results show that capacity, as measured by these norms and path norms, does not necessarily grow simply by adding parameters. The implicit bias of the optimizer (implicit regularization) and margin scaling play crucial roles.
- Bi-criteria plots (sharpness vs. KL divergence) reveal that models trained on true labels tend to achieve preferable trade-offs, especially as training set size grows.
- Some measures fail to explain every observed phenomenon (e.g., how generalization behaves as networks grow beyond a certain size), highlighting the limitations of single-measure explanations.
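Each point on a sharpness vs. KL bi-criteria plot pairs an expected-sharpness value with a KL value. The sketch below makes several assumptions: a hypothetical two-parameter ReLU model with made-up data, Gaussian perturbations u ~ N(0, σ²I), and a prior N(0, σ²I), so the KL term reduces to ‖w‖² / (2σ²). It traces the trade-off by sweeping σ. The sweep is an illustrative choice and does not reproduce the paper's experiments.

```python
import random

# Hypothetical toy model f(x) = w2 * relu(w1 * x) with squared loss; the data
# and weights are illustrative, not taken from the paper.
DATA = [(0.5, 0.5), (1.0, 1.0), (2.0, 2.0)]

def loss(w):
    w1, w2 = w
    return sum((w2 * max(0.0, w1 * x) - y) ** 2 for x, y in DATA) / len(DATA)

def expected_sharpness(w, sigma, n_samples=2000, seed=0):
    """Monte Carlo estimate of E_u[L(w + u)] - L(w) with u ~ N(0, sigma^2 I)."""
    rng = random.Random(seed)  # fixed seed: every sigma reuses the same noise directions
    total = 0.0
    for _ in range(n_samples):
        u = [rng.gauss(0.0, sigma) for _ in w]
        total += loss([wi + ui for wi, ui in zip(w, u)])
    return total / n_samples - loss(w)

def kl_to_prior(w, sigma):
    """KL(N(w, sigma^2 I) || N(0, sigma^2 I)) = ||w||^2 / (2 sigma^2)."""
    return sum(wi * wi for wi in w) / (2.0 * sigma ** 2)

w = [1.0, 1.0]
for sigma in (0.02, 0.05, 0.1, 0.2):
    es = expected_sharpness(w, sigma)
    kl = kl_to_prior(w, sigma)
    print(f"sigma={sigma:<5} expected_sharpness={es:.5f} KL={kl:.1f}")
```

As σ increases, expected sharpness rises while KL falls as 1/σ², so the sweep traces a trade-off curve. The bi-criteria plots compare curves like this across models, for example models trained on true vs. random labels.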
This review was generated by AI and checked by a human editor.