QUICK REVIEW

[Paper Review] Measuring the Intrinsic Dimension of Objective Landscapes

Chunyuan Li, Heerad Farkhoor|arXiv (Cornell University)|Apr 24, 2018

Machine Learning and Data Classification | References: 19 | Cited by: 63

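One-Sentence Summary

This paper proposes random subspace training to measure the intrinsic dimension of neural network objective landscapes, shows that many problems need far fewer effective degrees of freedom than their total parameter count, and offers a compression-oriented view of modeling based on minimum description length (MDL).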

ABSTRACT

Many recently trained neural networks employ large numbers of parameters to achieve good performance. One may intuitively use the number of parameters required as a rough gauge of the difficulty of a problem. But how accurate are such notions? How many parameters are really needed? In this paper we attempt to answer this question by training networks not in their native parameter space, but instead in a smaller, randomly oriented subspace. We slowly increase the dimension of this subspace, note at which dimension solutions first appear, and define this to be the intrinsic dimension of the objective landscape. The approach is simple to implement, computationally tractable, and produces several suggestive conclusions. Many problems have smaller intrinsic dimensions than one might suspect, and the intrinsic dimension for a given dataset varies little across a family of models with vastly different sizes. This latter result has the profound implication that once a parameter space is large enough to solve a problem, extra parameters serve directly to increase the dimensionality of the solution manifold. Intrinsic dimension allows some quantitative comparison of problem difficulty across supervised, reinforcement, and other types of learning where we conclude, for example, that solving the inverted pendulum problem is 100 times easier than classifying digits from MNIST, and playing Atari Pong from pixels is about as hard as classifying CIFAR-10. In addition to providing new cartography of the objective landscapes wandered by parameterized models, the method is a simple technique for constructively obtaining an upper bound on the minimum description length of a solution. A byproduct of this construction is a simple approach for compressing networks, in some cases by more than 100 times.

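Motivation and Objectives

Define intrinsic dimension as the codimension of the solution set in parameter space.
Develop a practical method for estimating intrinsic dimension by optimizing in a random subspace.
Compare intrinsic dimension across architectures, datasets, and learning paradigms to map out objective landscapes.
Examine the implications for model compression and MDL-based model selection.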

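Proposed Method

Introduce a random projection P that defines a d-dimensional subspace of the full D-dimensional parameter space.
Train only the subspace coordinates theta^(d), keeping the initial point theta^(D)_0 and P fixed, so that the full parameters are theta^(D) = theta^(D)_0 + P theta^(d).
Increase d step by step to find the smallest subspace in which a solution exists, that is, where performance exceeds a threshold (d_int90).
Classify a run as a solution using a performance threshold (for example, 90% of the baseline), and check robustness with bootstrapping.
Compare intrinsic dimension across FC, LeNet, CNN, and RL tasks; analyze the projection methods (dense, sparse, Fastfood).
Relate d_int90 to minimum description length (MDL) and discuss the implications for compression.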
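
The steps above can be sketched on a toy problem. The example below is a minimal illustration, not the authors' implementation: it assumes a linear toy objective with D = 1000 parameters and 10 sum constraints, so the solution set has codimension 10. Because that loss is quadratic in the subspace coordinates, a least-squares solve stands in for the gradient-based training used on real networks. All function names (`make_projection`, `subspace_loss`, `estimate_intrinsic_dim`), the dense unit-norm projection, and the tolerance are assumptions made for illustration.

```python
import numpy as np

def make_projection(D, d, rng):
    """Dense random D x d projection with unit-norm columns (one possible choice)."""
    P = rng.standard_normal((D, d))
    return P / np.linalg.norm(P, axis=0, keepdims=True)

def subspace_loss(A, b, theta0, P):
    """Lowest loss reachable with theta = theta0 + P @ z, optimizing only z."""
    # The toy loss ||A theta - b||^2 is quadratic in z, so least squares
    # replaces the gradient-based training a real network would need.
    z, *_ = np.linalg.lstsq(A @ P, b - A @ theta0, rcond=None)
    theta = theta0 + P @ z
    return float(np.sum((A @ theta - b) ** 2))

def estimate_intrinsic_dim(A, b, theta0, max_d, tol, rng):
    """Smallest d whose random subspace contains a solution (loss below tol)."""
    for d in range(1, max_d + 1):
        P = make_projection(A.shape[1], d, rng)  # fresh subspace for each d
        if subspace_loss(A, b, theta0, P) < tol:
            return d
    return None  # no solution found up to max_d

rng = np.random.default_rng(0)
D, n_constraints = 1000, 10

# Constraint i requires the i-th block of 100 parameters to sum to i + 1.
A = np.zeros((n_constraints, D))
for i in range(n_constraints):
    A[i, 100 * i:100 * (i + 1)] = 1.0
b = np.arange(1, n_constraints + 1, dtype=float)

theta0 = 0.01 * rng.standard_normal(D)  # fixed random starting point
d_int = estimate_intrinsic_dim(A, b, theta0, max_d=20, tol=1e-8, rng=rng)
print("estimated intrinsic dimension:", d_int)
```

On a real network the exact tolerance would be replaced by the performance threshold described above (for example, 90% of baseline accuracy), and each value of d would be trained several times to account for run-to-run variance.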

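Experimental Results

Research Questions

RQ1: What is the intrinsic dimension of various neural network problems when they are optimized within a randomly oriented subspace?
RQ2: How does d_int90 scale across architectures, datasets, and reinforcement learning tasks?
RQ3: Do larger models show more redundancy, and how does that affect MDL-based model selection?
RQ4: Can random subspace training achieve practical network compression without a significant loss in performance?
RQ5: How does intrinsic dimension differ between supervised tasks and reinforcement learning environments?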

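Key Findings

The intrinsic dimension d_int90 is usually far smaller than the full parameter count D (for example, FC on MNIST: D=199k, d_int90≈750; LeNet: D=44k, d_int90≈290).
Increasing model size increases redundancy while d_int90 stays nearly constant over a wide range of D, which suggests that extra parameters enlarge the solution manifold rather than improve solvability.
Convolutional networks are more parameter-efficient than fully connected networks on MNIST and CIFAR-10, and random subspace training yields substantial compression (for example, about 260x for the MNIST FC network and about 150x for LeNet).
For RL tasks, intrinsic dimension varies by task (for example, inverted pendulum: d_int90≈4; Humanoid: d_int90≈700; Pong: d_int90≈6000), indicating difficulty levels that are comparable to supervised tasks in some cases but vary widely.
Intrinsic dimension gives an upper bound on the MDL of a solution and offers a practical end-to-end compression strategy that leaves the training procedure unchanged.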
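
The compression figures quoted above follow directly from the ratio D / d_int90. The short check below recomputes them from the rounded values given in the findings. It assumes that only the d trained coordinates need to be stored, with theta^(D)_0 and P regenerated from random seeds; the helper name `compression_ratio` is hypothetical.

```python
# Rounded parameter counts (D) and intrinsic dimensions (d_int90)
# as quoted in the findings above.
reported = {
    "MNIST FC": (199_000, 750),
    "MNIST LeNet": (44_000, 290),
}

def compression_ratio(D, d):
    """Ratio of native parameters to trained subspace coordinates."""
    return D / d

ratios = {name: compression_ratio(D, d) for name, (D, d) in reported.items()}
for name, r in ratios.items():
    print(f"{name}: about {r:.0f}x fewer stored values")
```

Because the inputs are rounded, the results land near, not exactly on, the "about 260x" and "about 150x" figures.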

This review was generated by AI and checked by a human editor.