QUICK REVIEW

[Paper Review] Generalization in Deep Learning

Kenji Kawaguchi, Leslie Pack Kaelbling | arXiv (Cornell University) | Oct 16, 2017

Computability, Logic, AI Algorithms | Cited by 83

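One-Sentence Summary

Provides theoretical insights into why deep learning can still generalize despite its very large capacity, and presents non-vacuous generalization guarantees together with open problems that remain to be solved.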

ABSTRACT

This paper provides theoretical insights into why and how deep learning can generalize well, despite its large capacity, complexity, possible algorithmic instability, nonrobustness, and sharp minima, responding to an open question in the literature. We also discuss approaches to provide non-vacuous generalization guarantees for deep learning. Based on theoretical observations, we propose new open problems and discuss the limitations of our results.

Research Motivation and Objectives

Explain why deep learning generalizes despite over-parameterization and potential instability.
Bridge empirical observations (e.g., memorization of random labels) with theoretical guarantees.
Develop generalization bounds applicable to deep networks, including validation-based guarantees.
Analyze generalization errors in specific neural network settings (ReLU, max-pooling, DAGs).
Propose open problems that distinguish natural data from adversarial or random-label scenarios.

Proposed Method

Review and synthesize existing generalization theories (capacity, stability, robustness, flat minima) and their limitations for deep learning.
Introduce a formal analysis framework for deep networks as DAGs with ReLU and max pooling to derive generalization insights.
Present Theorem 7, which analyzes the generalization gap for neural networks with squared loss based on the learned weights and the pair (P(X,Y), S).
Derive results for layered networks without skip connections and for DAGs by expressing outputs as sums over paths.
Propose practical roles for generalization theory, including validation-based guarantees (Proposition 5) and insights for model class selection.
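
The path-sum view mentioned above can be illustrated on a tiny case. The following is a minimal sketch, assuming a single hidden layer of ReLU units with no biases and no max-pooling (simplifications made here for brevity, not taken from the paper); the names relu_forward and path_sum_forward are hypothetical. It checks that the ordinary forward pass equals a sum over input-to-output paths, where each path contributes the input coordinate, times the weights along the path, times a 0/1 gate recording whether the hidden unit on that path is active.

```python
import random

def relu_forward(x, W1, w2):
    """Standard forward pass: w2 . relu(W1 x)."""
    hidden = [max(0.0, sum(wji * xi for wji, xi in zip(row, x))) for row in W1]
    return sum(vj * hj for vj, hj in zip(w2, hidden))

def path_sum_forward(x, W1, w2):
    """Same output written as a sum over paths (input i -> hidden j -> output).

    Each path contributes x[i] * gate_j * W1[j][i] * w2[j], where gate_j is 1
    if hidden unit j is active for this input and 0 otherwise.
    """
    gates = [1.0 if sum(wji * xi for wji, xi in zip(row, x)) > 0 else 0.0
             for row in W1]
    total = 0.0
    for j, row in enumerate(W1):
        for i, xi in enumerate(x):
            total += xi * gates[j] * row[i] * w2[j]
    return total

rng = random.Random(0)  # fixed seed so the run is repeatable
d_in, d_hidden = 4, 6
x = [rng.gauss(0, 1) for _ in range(d_in)]
W1 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
w2 = [rng.gauss(0, 1) for _ in range(d_hidden)]

# The two computations should agree up to floating-point rounding.
print(abs(relu_forward(x, W1, w2) - path_sum_forward(x, W1, w2)) < 1e-9)
```

Once the gates are fixed, the output is linear in the products of weights along each path, which is what makes a direct, weight-dependent analysis of the generalization gap tractable in this representation.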

Experimental Results

Research Questions

RQ1: What governs the generalization gap for over-parameterized deep networks on a fixed problem instance, i.e., a distribution-dataset pair (P(X,Y), S)?
RQ2: Can generalization be tightly characterized for neural networks using only the learned weights and the distribution-dataset pair (P(X,Y), S)?
RQ3: How do validation datasets and practical model-search procedures influence non-vacuous generalization guarantees?
RQ4: Do conventional complexity-based explanations (capacity, stability, flat minima) fully explain generalization in practice, or are there instance-specific effects?
RQ5: What open problems distinguish natural data from random-label scenarios in the context of deep learning generalization?
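
For RQ3, one standard way to see how validation-set size and the number of candidate models interact is Hoeffding's inequality combined with a union bound. The sketch below is that textbook bound, not necessarily the exact form of the paper's Proposition 5. It assumes a loss bounded in [0, 1], an i.i.d. validation set, and a finite set of candidate models fixed without looking at the validation data. The function name validation_gap_bound is hypothetical.

```python
import math

def validation_gap_bound(num_candidates, m_val, delta=0.05):
    """Hoeffding + union bound for a loss taking values in [0, 1].

    With probability at least 1 - delta over an i.i.d. validation set of
    size m_val, every one of the num_candidates models satisfies
        expected_loss <= validation_loss + returned_value.
    """
    return math.sqrt(math.log(num_candidates / delta) / (2.0 * m_val))

# How the bound grows with the number of models tried on 10,000 validation points.
for k in (1, 100, 10**6):
    print(k, round(validation_gap_bound(k, m_val=10_000), 4))
```

Because the number of candidates enters only through a logarithm, even a very large model search leaves the bound small when the validation set holds thousands of examples. The bound does not depend on the number of parameters at all.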

Key Findings

Over-parameterized linear models can memorize arbitrary training data and achieve near-zero training and test error under certain rank conditions.
Conventional norms or flat minima alone do not fully explain generalization, even in linear or simple settings.
Validation-based generalization guarantees can be non-vacuous and practically meaningful if the validation set is appropriately used (Proposition 5).
Generalization gaps can be analyzed directly for neural networks via the learned weights and the data pair (P(X,Y), S) without relying solely on capacity-based bounds (Theorem 7).
The paper clarifies the consistency of theory with empirical observations and highlights open problems that tie practical performance to theoretical guarantees.
Different problem settings (specifically the distinction between pointwise analysis and worst-case distributions) can reconcile apparent paradoxes in generalization theory.
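
The memorization finding for over-parameterized linear models can be reproduced in a few lines. This is a minimal sketch, assuming Gaussian random features, random ±1 labels, and the minimum-norm interpolating solution w = Φᵀ(ΦΦᵀ)⁻¹y; these choices and the helper name solve are illustrative assumptions, not the paper's construction. It demonstrates only that training error can be driven to zero on arbitrary labels when there are more parameters than samples, and it says nothing about test error.

```python
import random

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z

rng = random.Random(0)
n, d = 8, 40  # fewer samples than parameters
Phi = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [rng.choice([-1.0, 1.0]) for _ in range(n)]  # random labels

# Minimum-norm interpolant: w = Phi^T (Phi Phi^T)^{-1} y.
gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in Phi] for ri in Phi]
alpha = solve(gram, y)
w = [sum(alpha[i] * Phi[i][k] for i in range(n)) for k in range(d)]

preds = [sum(wk * xk for wk, xk in zip(w, row)) for row in Phi]
train_mse = sum((p - t) ** 2 for p, t in zip(preds, y)) / n
print(train_mse < 1e-10)  # the random labels are fit essentially exactly
```

Since the labels here carry no signal, zero training error coexists with chance-level test performance, which is why capacity alone cannot separate the random-label case from natural data.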


This review was generated by AI and reviewed by human editors.