QUICK REVIEW

[Paper Review] Stronger generalization bounds for deep nets via a compression approach

Sanjeev Arora, Rong Ge | arXiv (Cornell University) | Feb 14, 2018

Adversarial Robustness in Machine Learning | References: 28 | Cited by: 95

One-Sentence Summary

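The paper derives substantially tighter generalization bounds for deep networks, including convolutional networks, through a compression-based framework: it compresses the trained network and analyzes the network's noise-stability properties.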

ABSTRACT

Deep nets generalize well despite having more parameters than the number of training samples. Recent works try to give an explanation using PAC-Bayes and Margin-based analyses, but do not as yet result in sample complexity bounds better than naive parameter counting. The current paper shows generalization bounds that're orders of magnitude better in practice. These rely upon new succinct reparametrizations of the trained net, a compression that is explicit and efficient. These yield generalization bounds via a simple compression-based framework introduced here. Our results also provide some theoretical justification for widespread empirical success in compressing deep nets. Analysis of correctness of our compression relies upon some newly identified "noise stability" properties of trained deep nets, which are also experimentally verified. The study of these properties and resulting generalization bounds are also extended to convolutional nets, which had eluded earlier attempts on proving generalization.

Research Motivation and Goals

Explain why deep nets generalize well despite overparameterization.
Propose a simple compression-based framework to bound generalization error.
Identify and empirically validate noise-stability properties that enable compression.
Extend the analysis to convolutional networks and connect theory with practice.

Proposed Method

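Define (γ, S)-compressibility, and (γ, S)-compressibility with a helper string, which require a compressed model's outputs to stay within γ of the original net's outputs on the training set S.
Prove Theorem 2.1, which links compressibility to generalization: it bounds the classification error L0(g_A) of the compressed classifier g_A by the original net's empirical margin loss plus a term that grows with the number of parameters in the compressed description.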
Prove a generalization bound (Theorem 2.2) for deep nets using layer-wise compression and stable ranks.
Introduce noise-stability concepts (layer cushion, interlayer cushion, activation contraction, interlayer smoothness) to justify stronger compression.
Propose Algorithm 1 (Matrix-Project) to compress layers and bound output perturbations, leading to smaller effective parameter counts.
Extend the compression framework to convolutional nets, incorporating p-wise independence for shared filters.
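For reference, the central definitions and Theorem 2.1 can be restated informally as below. The notation follows the paper as best it can be reconstructed here: f is the trained net, g_A is a compressed classifier with parameter setting A, q is the number of compressed parameters, each taking at most r discrete values, m is the number of training samples, and L_γ is the margin loss (with \hat{L}_γ its empirical value on S). Constants and exact conditions should be checked against the paper itself.

```latex
% (\gamma, S)-compressibility of f via the class \mathcal{G}_{\mathcal{A}} = \{ g_A : A \in \mathcal{A} \}
\exists A \in \mathcal{A} \;\; \forall x \in S,\ \forall y: \quad |f(x)[y] - g_A(x)[y]| \le \gamma

% Margin loss (L_0 is the ordinary classification error)
L_\gamma(f) = \Pr_{(x,y) \sim \mathcal{D}}\Big[ f(x)[y] \le \gamma + \max_{j \ne y} f(x)[j] \Big]

% Theorem 2.1 (informal): with high probability over the draw of S,
L_0(g_A) \le \hat{L}_\gamma(f) + O\!\left( \sqrt{\frac{q \log r}{m}} \right)
```

Note that the left-hand side concerns the compressed classifier g_A, not f: the bound certifies the small model that the compression produces.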
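Algorithm 1 (Matrix-Project) is, as we understand it, a random-projection compressor: sample k ≈ log(1/η)/ε² random matrices M_i with i.i.d. ±1 entries, store only the k scalars Z_i = ⟨A, M_i⟩, and reconstruct Â = (1/k) Σ Z_i M_i. Because E[⟨A, M⟩ M] = A for Rademacher M, Â is an unbiased estimate of A, and for fixed vectors u, v the error |uᵀÂv − uᵀAv| is at most ε‖A‖_F‖u‖‖v‖ except with probability about η. The sketch below assumes NumPy; the function names, the `seed` argument standing in for the helper string, and the demo sizes are illustrative choices, not the paper's code.

```python
import numpy as np

def matrix_project(A, eps, eta, seed=0):
    """Sketch of Matrix-Project: compress A into k scalars plus a random seed.

    Returns (A_hat, coeffs). Only `coeffs` (k numbers) and `seed` need to be
    stored; the +/-1 matrices M_i are regenerated from the seed.
    """
    k = int(np.ceil(np.log(1.0 / eta) / eps**2))
    rng = np.random.default_rng(seed)
    coeffs = np.empty(k)
    A_hat = np.zeros(A.shape)
    for i in range(k):
        M = rng.choice([-1.0, 1.0], size=A.shape)  # i.i.d. Rademacher entries
        coeffs[i] = np.sum(A * M)                   # Z_i = <A, M_i>
        A_hat += coeffs[i] * M
    return A_hat / k, coeffs

def reconstruct(coeffs, shape, seed=0):
    """Rebuild A_hat from the stored scalars and the seed (the helper string)."""
    rng = np.random.default_rng(seed)
    A_hat = np.zeros(shape)
    for z in coeffs:
        A_hat += z * rng.choice([-1.0, 1.0], size=shape)  # same draws, same order
    return A_hat / len(coeffs)

# Usage: compress a random 20x30 matrix and measure the bilinear-form error
# relative to ||A||_F for one fixed pair of unit vectors.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 30))
A_hat, coeffs = matrix_project(A, eps=0.2, eta=0.01, seed=42)
u = rng.standard_normal(20)
u /= np.linalg.norm(u)
v = rng.standard_normal(30)
v /= np.linalg.norm(v)
err = abs(u @ A_hat @ v - u @ A @ v) / np.linalg.norm(A, "fro")
print(f"k = {len(coeffs)}, relative error = {err:.3f}")
```

Here A has 600 entries while Â is described by k scalars plus a seed, so the saving only appears when k is much smaller than the matrix size. The guarantee is relative to ‖A‖_F‖u‖‖v‖, which is why the paper's layer-cushion condition (that ‖Ax‖ is not much smaller than ‖A‖_F‖x‖ on training inputs) matters for turning it into a useful output-perturbation bound.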
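The review only states that the convolutional extension uses p-wise independence for shared filters, without giving the construction. As background, one standard way to generate p-wise independent values (not necessarily the one the paper uses) is to evaluate a random polynomial of degree p − 1 over a prime field Z_q at distinct points. A useful property is that the p-th moment of a sum of p-wise independent variables matches the fully independent case, which suffices for moment-based concentration arguments. All function names below are hypothetical, and the exhaustive check is only feasible for tiny q and p.

```python
import random
from collections import Counter
from itertools import product

def poly_hash(coeffs, q):
    """h(x) = c_0 + c_1 x + ... + c_{p-1} x^{p-1} mod q, via Horner's rule."""
    def h(x):
        acc = 0
        for c in reversed(coeffs):  # leading coefficient first
            acc = (acc * x + c) % q
        return acc
    return h

def random_pwise_hash(p, q, seed=None):
    """Draw p uniform coefficients mod a prime q.

    The values at any p distinct points are then independent and uniform on Z_q.
    """
    rng = random.Random(seed)
    return poly_hash([rng.randrange(q) for _ in range(p)], q)

def is_uniform_over_points(p, q, points):
    """Check exhaustively that (h(x) for x in points) is uniform over all q**p coefficient vectors."""
    counts = Counter(
        tuple(poly_hash(c, q)(x) for x in points)
        for c in product(range(q), repeat=p)
    )
    expected = q ** (p - len(points))
    return len(counts) == q ** len(points) and set(counts.values()) == {expected}

# Usage: a 3-wise independent sequence of values in Z_7 from a seed of 3 field elements.
h = random_pwise_hash(p=3, q=7, seed=0)
values = [h(x) for x in range(7)]
print(values)
```

Uniformity holds because the Vandermonde system for distinct evaluation points is invertible modulo a prime, so each tuple of outputs corresponds to exactly one coefficient vector when the number of points equals p.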

Experimental Results

Research Questions

RQ1: Can a trained deep net be compressed into a simpler model with similar training performance, enabling better generalization bounds?
RQ2: Do the noise-stability properties of layers permit aggressive compression without a large generalization penalty?
RQ3: Can the compression framework be extended to convolutional architectures while preserving provable guarantees?
RQ4: Do the proposed bounds align with empirical generalization behavior on real networks?

Key Findings

A compression-based framework yields generalization bounds that are tighter than naive parameter counting.
Layer-wise compression error can be controlled so overall output perturbation remains small under certain cushions and smoothness properties.
For fully connected nets, bounds depend on layer cushions, interlayer cushions, activation contraction, and interlayer smoothness, plus the stable rank of layers.
The approach extends to convolutional nets using p-wise independent filter compression and generalized interlayer cushions.
Empirical evaluation of VGG-19 and AlexNet on CIFAR-10 suggests that the proposed stability properties and the compression-based bounds correlate with generalization.
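Two of the quantities named in the findings are easy to measure on a given layer. The stable rank of a matrix A is ‖A‖_F² / ‖A‖_2². The layer cushion, following the paper's definition as we read it, is the largest μ such that μ‖A‖_F‖h‖ ≤ ‖Ah‖ for every training input h to the layer, i.e. the minimum of ‖Ah‖ / (‖A‖_F‖h‖) over the training set. The sketch assumes NumPy, with activations stored one sample per row; the function names are hypothetical.

```python
import numpy as np

def stable_rank(A):
    """||A||_F^2 / ||A||_2^2, a soft count of the directions A actually uses."""
    fro = np.linalg.norm(A, "fro")
    spec = np.linalg.norm(A, 2)  # largest singular value
    return (fro / spec) ** 2

def layer_cushion(A, H):
    """Empirical layer cushion: min over rows h of H of ||A h|| / (||A||_F ||h||).

    A has shape (d_out, d_in); H has shape (n_samples, d_in) and holds the
    layer's input activations on the training set.
    """
    out_norms = np.linalg.norm(H @ A.T, axis=1)  # row i is ||A h_i||
    in_norms = np.linalg.norm(H, axis=1)
    return float(np.min(out_norms / (np.linalg.norm(A, "fro") * in_norms)))

# Usage on a random layer and random inputs (real measurements would use a
# trained net's weights and activations).
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 128))
H = rng.standard_normal((256, 128))
print(f"stable rank = {stable_rank(A):.1f}, cushion = {layer_cushion(A, H):.3f}")
```

The cushion is at most 1, since ‖Ah‖ ≤ ‖A‖_2‖h‖ ≤ ‖A‖_F‖h‖. For a noise vector η with i.i.d. entries, ‖Aη‖ ≈ ‖A‖_F‖η‖/√n with n the input dimension, so a layer whose cushion is well above 1/√n shrinks injected noise more than it shrinks the signal. The n×n identity sits exactly at 1/√n, which is why it offers no such attenuation.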
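The first finding, that compression beats naive parameter counting, comes down to the complexity term √(q log r / m) in Theorem 2.1: q counts the parameters of the compressed model rather than the original one. The arithmetic below drops the unspecified O(·) constant. Apart from m = 50,000 (the size of the CIFAR-10 training set), every number is hypothetical and chosen only to illustrate the scaling; none is a figure reported by the paper.

```python
import math

def compression_complexity(q, r, m):
    """sqrt(q * ln r / m): the complexity term of Theorem 2.1 with the O(.) constant dropped."""
    return math.sqrt(q * math.log(r) / m)

m = 50_000           # CIFAR-10 training-set size
r = 256              # hypothetical: each stored parameter quantized to 8 bits
q_full = 20_000_000  # hypothetical parameter count of an uncompressed net
q_small = 5_000      # hypothetical parameter count after compression

for name, q in [("full", q_full), ("compressed", q_small)]:
    print(f"{name:>10}: {compression_complexity(q, r, m):.3f}")
```

With these made-up numbers the full parameter count gives a term above 1, which is vacuous for the 0-1 loss, while the compressed count does not. Because the term scales as √q, a 4000× reduction in q shrinks it by a factor of about 63.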


This review was generated by AI and reviewed by human editors.