QUICK REVIEW

[Paper Review] Compressibility and Generalization in Large-Scale Deep Learning.

Wenda Zhou, Victor Veitch|arXiv (Cornell University)|Jan 1, 2018

Machine Learning and Algorithms|Cited by 12

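One-Sentence Summary

This paper establishes a theoretical link between model compression and generalization in deep learning by deriving a generalization bound that depends on the size of the compressed network. It gives the first non-vacuous generalization guarantees for large-scale models trained on ImageNet, and it shows that compressibility is fundamentally limited by generalization error: overfitting increases the number of bits needed to describe the model.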

ABSTRACT

Modern neural networks are highly overparameterized, with capacity to substantially overfit to training data. Nevertheless, these networks often generalize well in practice. It has also been observed that trained networks can often be compressed to much smaller representations. The purpose of this paper is to connect these two empirical observations. Our main technical result is a generalization bound for compressed networks based on the compressed size. Combined with off-the-shelf compression algorithms, the bound leads to state of the art generalization guarantees; in particular, we provide the first non-vacuous generalization guarantees for realistic architectures applied to the ImageNet classification problem. As additional evidence connecting compression and generalization, we show that compressibility of models that tend to overfit is limited: We establish an absolute limit on expected compressibility as a function of expected generalization error, where the expectations are over the random choice of training examples. The bounds are complemented by empirical results that show an increase in overfitting implies an increase in the number of bits required to describe a trained network.

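Motivation and Objectives

Connect, in theory, the compressibility of overparameterized deep neural networks with their generalization performance.
Derive a generalization bound that depends on the size of the compressed model rather than on the capacity of the original model.
Establish an absolute limit on compressibility as a function of expected generalization error.
Verify empirically that more overfitting goes together with lower compressibility.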

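Proposed Method

Derive, from information-theoretic principles, a generalization bound that depends on the size of the compressed network.
Apply off-the-shelf compression algorithms (such as pruning and quantization) to realistic models and measure the compressed size.
Use the compressed size as the complexity term in the generalization bound to obtain state-of-the-art, non-vacuous guarantees.
Show theoretically that expected compressibility is limited by expected generalization error, so that overfitting increases the minimum description length.
Establish an information-theoretic relationship between generalization error and the number of bits needed to describe the trained model.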
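
The steps above can be illustrated with a small sketch. It is not the paper's procedure: the helper names (`prune_and_quantize`, `compressed_size_bits`, `occam_bound`), the toy weights, and the use of `zlib` as the coder are all assumptions made for illustration, and the bound shown is the textbook Occam (finite description length) bound rather than the bound derived in the paper.

```python
import math
import zlib

import numpy as np


def prune_and_quantize(weights, keep_fraction=0.1, num_levels=16):
    """Magnitude-prune, then uniformly quantize the surviving weights.

    Returns uint8 codes: 0 marks a pruned weight, 1..num_levels-1 are
    quantization bins. Assumes 3 <= num_levels <= 256 (hypothetical helper).
    """
    flat = np.asarray(weights, dtype=np.float64).ravel()
    num_keep = max(1, int(round(keep_fraction * flat.size)))
    threshold = np.sort(np.abs(flat))[-num_keep]
    mask = np.abs(flat) >= threshold
    kept = flat[mask]
    lo, hi = float(kept.min()), float(kept.max())
    scale = (num_levels - 2) / (hi - lo) if hi > lo else 0.0
    codes = np.zeros(flat.size, dtype=np.uint8)
    codes[mask] = 1 + np.floor((kept - lo) * scale + 0.5).astype(np.uint8)
    return codes


def compressed_size_bits(codes, side_info_bits=128):
    """Bits used by a zlib-compressed code array, plus a flat allowance for
    side information (here: two float64 values for the quantization range).
    A rigorous description length would also have to account for the decoder
    and for a prefix-free encoding; this sketch ignores that.
    """
    payload = zlib.compress(codes.tobytes(), 9)
    return 8 * len(payload) + side_info_bits


def occam_bound(train_error, num_bits, num_samples, delta=0.05):
    """Textbook Occam bound: with probability at least 1 - delta,
    true error <= train error + sqrt((bits * ln 2 + ln(1/delta)) / (2 n)).
    """
    complexity = (num_bits * math.log(2) + math.log(1 / delta)) / (2 * num_samples)
    return train_error + math.sqrt(complexity)


# Toy usage with random weights; all numbers here are made up.
rng = np.random.default_rng(0)
toy_weights = rng.normal(size=10_000)
toy_codes = prune_and_quantize(toy_weights, keep_fraction=0.05)
toy_bits = compressed_size_bits(toy_codes)
toy_bound = occam_bound(train_error=0.10, num_bits=toy_bits, num_samples=50_000)
print(f"{toy_bits} bits -> error bound {toy_bound:.3f}")
```

The point of the sketch is only the dependence structure: the complexity term grows with the number of bits in the compressed description and shrinks with the number of training examples, and it does not depend on the parameter count of the uncompressed network.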

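Experimental Results

Research Questions

RQ1: Can model compression be used to derive tighter, non-vacuous generalization bounds for large-scale deep networks?
RQ2: Is there a fundamental limit on compressibility determined by a model's generalization performance?
RQ3: Does increased overfitting raise the number of bits needed to describe the model?
RQ4: In practical settings such as ImageNet, can compression-based generalization bounds outperform traditional generalization bounds?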

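Key Findings

Using a compression-based analysis, the paper obtains the first non-vacuous generalization bounds for realistic ImageNet models.
Compressibility is fundamentally limited by generalization error: the larger the generalization error, the higher the lower bound on the number of bits needed to describe the model.
Empirical results confirm that models which overfit more need more bits to describe, supporting the theoretical link between overfitting and reduced compressibility.
Generalization bounds derived from the compressed size improve on earlier bounds and give state-of-the-art guarantees for large-scale networks.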
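
To make "non-vacuous" concrete, the following sketch asks how many bits a model description may use before the complexity term of a simple bound exceeds a chosen gap. It assumes the textbook Occam bound, not the bound from the paper, and the function names, the 0.5 target gap, and the training-set size (1,281,167, the usual ILSVRC-2012 training split) are illustrative assumptions.

```python
import math


def complexity_term(num_bits, num_samples, delta=0.05):
    """Complexity term of the textbook Occam bound:
    sqrt((bits * ln 2 + ln(1/delta)) / (2 n)).
    """
    return math.sqrt(
        (num_bits * math.log(2) + math.log(1 / delta)) / (2 * num_samples)
    )


def max_bits_for_gap(target_gap, num_samples, delta=0.05):
    """Largest description length (in bits) whose complexity term stays at
    or below target_gap; obtained by inverting complexity_term for num_bits.
    """
    budget = 2 * num_samples * target_gap ** 2 - math.log(1 / delta)
    return max(0, math.floor(budget / math.log(2)))


# Assumed training-set size (ILSVRC-2012 training split); hypothetical target gap.
imagenet_train_size = 1_281_167
bit_budget = max_bits_for_gap(target_gap=0.5, num_samples=imagenet_train_size)
print(f"bit budget: {bit_budget} bits (~{bit_budget / 8 / 1024:.0f} KiB)")
```

Under these assumptions the budget is on the order of 100 KB, far smaller than an uncompressed modern network, which is why a bound of this kind only becomes informative once the model has been compressed aggressively.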


This review was generated by AI and checked by a human editor.