QUICK REVIEW

[Paper Review] Measuring the Intrinsic Dimension of Objective Landscapes

Chunyuan Li, Heerad Farkhoor|arXiv (Cornell University)|Apr 24, 2018

Machine Learning and Data Classification | References: 19 | Citations: 63

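One-Line Summary

This paper introduces random subspace training to measure the intrinsic dimension of neural network optimization problems. It shows that many problems need far fewer active degrees of freedom than the total parameter count, which opens up a compression- and MDL-guided view of modeling.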

ABSTRACT

Many recently trained neural networks employ large numbers of parameters to achieve good performance. One may intuitively use the number of parameters required as a rough gauge of the difficulty of a problem. But how accurate are such notions? How many parameters are really needed? In this paper we attempt to answer this question by training networks not in their native parameter space, but instead in a smaller, randomly oriented subspace. We slowly increase the dimension of this subspace, note at which dimension solutions first appear, and define this to be the intrinsic dimension of the objective landscape. The approach is simple to implement, computationally tractable, and produces several suggestive conclusions. Many problems have smaller intrinsic dimensions than one might suspect, and the intrinsic dimension for a given dataset varies little across a family of models with vastly different sizes. This latter result has the profound implication that once a parameter space is large enough to solve a problem, extra parameters serve directly to increase the dimensionality of the solution manifold. Intrinsic dimension allows some quantitative comparison of problem difficulty across supervised, reinforcement, and other types of learning where we conclude, for example, that solving the inverted pendulum problem is 100 times easier than classifying digits from MNIST, and playing Atari Pong from pixels is about as hard as classifying CIFAR-10. In addition to providing new cartography of the objective landscapes wandered by parameterized models, the method is a simple technique for constructively obtaining an upper bound on the minimum description length of a solution. A byproduct of this construction is a simple approach for compressing networks, in some cases by more than 100 times.

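Motivation and Objectives

Define the intrinsic dimension as the codimension of the solution set in parameter space.
Develop a practical method for estimating intrinsic dimension using random subspace optimization.
Compare intrinsic dimension across architectures, datasets, and learning paradigms to map out objective landscapes.
Explore the implications for model compression and MDL-based model selection.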
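
The codimension definition becomes concrete with a toy case. The setup below is our own assumption, not taken from the review: a loss that is zero exactly when k independent linear constraints A theta = b hold in a D-dimensional parameter space. The solution set is then a (D - k)-dimensional affine subspace, so its codimension (the intrinsic dimension) is k. A random d-dimensional subspace through a random point should contain an exact solution only when d >= k. The sketch solves for the subspace coordinates by least squares rather than by gradient training, and the helper name min_residual_in_subspace is hypothetical.

```python
import numpy as np

def min_residual_in_subspace(A, b, d, rng):
    """Best achievable ||A theta - b|| when theta is restricted to theta0 + P z."""
    D = A.shape[1]
    theta0 = rng.standard_normal(D)   # random, frozen starting point
    P = rng.standard_normal((D, d))   # random, frozen d-dim basis
    # Least squares over the d subspace coordinates z (a direct solve, not SGD).
    z, *_ = np.linalg.lstsq(A @ P, b - A @ theta0, rcond=None)
    return float(np.linalg.norm(A @ (theta0 + P @ z) - b))

rng = np.random.default_rng(0)
D, k = 1000, 10                   # hypothetical sizes: D parameters, k constraints
A = rng.standard_normal((k, D))   # k independent constraints (with probability 1)
b = rng.standard_normal(k)

print("solution-set dimension:", D - np.linalg.matrix_rank(A))  # 990, so codimension is 10
for d in (5, 9, 10, 20):
    # Expected: False (no exact solution) for d < k, True for d >= k.
    print(d, min_residual_in_subspace(A, b, d, rng) < 1e-6)
```

Adding parameters here (raising D with k fixed) enlarges the solution set but leaves the threshold d = k unchanged. This mirrors the review's point that d_int90 stays roughly constant as models grow.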

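Proposed Method

Introduce a random projection P that defines a d-dimensional subspace of the full D-dimensional parameter space.
Keep theta^(D)_0 and P fixed and train only the subspace coordinates theta^(d).
Increase d to find the smallest subspace that contains a solution. d_int90 is the smallest d at which a solution meeting the performance threshold is obtained.
Classify a trained model as a solution using a performance threshold (e.g., 90% of the baseline), and check the robustness of the estimate with bootstrapping.
Compare intrinsic dimension across fully connected (FC) networks, LeNet, other CNNs, and RL tasks, and analyze the projection methods (dense, sparse, Fastfood).
Relate d_int90 to minimum description length (MDL) and discuss the implications for compression.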
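
The method steps above fit in a few lines of NumPy. This is a minimal sketch under our own assumptions, not the authors' implementation:

- A logistic-regression model on synthetic data stands in for the paper's networks.
- Training accuracy stands in for the performance metric.
- A single seed replaces repeated runs and bootstrapping.
- The helper names (make_projection, train_in_subspace, find_d_int90) and the simplified sparse option are hypothetical.

The core is the reparameterization theta^(D) = theta^(D)_0 + P theta^(d). Gradients reach the subspace coordinates through the chain rule, as P^T times the full-space gradient.

```python
import numpy as np

def make_projection(D, d, rng, sparse=False):
    """Random D x d basis with unit-norm columns.

    Dense: i.i.d. Gaussian entries. Sparse (simplified assumption): each entry
    is kept with probability 1/sqrt(D); stored densely here for clarity only.
    """
    P = rng.standard_normal((D, d))
    if sparse:
        P = P * (rng.random((D, d)) < 1.0 / np.sqrt(D))
    norms = np.linalg.norm(P, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing an all-zero sparse column by zero
    return P / norms


def train_in_subspace(X, y, d, seed=0, lr=0.5, steps=500, sparse=False):
    """Logistic regression whose D weights are theta = theta0 + P @ z.

    theta0 and P are frozen; only the d coordinates z are updated.
    Returns training accuracy (a stand-in for the paper's performance metric).
    """
    rng = np.random.default_rng(seed)
    n, D = X.shape
    theta0 = 0.1 * rng.standard_normal(D)   # frozen random initialization
    P = make_projection(D, d, rng, sparse)   # frozen random subspace basis
    z = np.zeros(d)                          # z = 0 means theta = theta0
    for _ in range(steps):
        theta = theta0 + P @ z
        p = 0.5 * (1.0 + np.tanh(0.5 * (X @ theta)))  # numerically stable sigmoid
        grad_theta = X.T @ (p - y) / n                 # gradient of mean log-loss
        z -= lr * (P.T @ grad_theta)                   # chain rule: dL/dz = P^T dL/dtheta
    theta = theta0 + P @ z
    return float(np.mean((X @ theta > 0) == (y > 0.5)))


def find_d_int90(evaluate, baseline, dims, frac=0.9):
    """Smallest d in dims whose performance reaches frac * baseline, or None."""
    for d in sorted(dims):
        if evaluate(d) >= frac * baseline:
            return d
    return None


# Hypothetical toy data: labels from a random linear rule in D = 50 dimensions.
rng = np.random.default_rng(0)
n, D = 200, 50
X = rng.standard_normal((n, D))
y = (X @ rng.standard_normal(D) > 0).astype(float)

# d = D with a random full-rank basis stands in for ordinary full-space training.
baseline = train_in_subspace(X, y, d=D)
d90 = find_d_int90(lambda d: train_in_subspace(X, y, d), baseline, [1, 2, 5, 10, 20, 50])
print(f"baseline accuracy {baseline:.3f}, d_int90 on this grid: {d90}")
```

Because theta^(D)_0 and P stay frozen, only d numbers are ever updated. Sweeping a grid of d values and taking the first one that clears 90% of the baseline gives d_int90 on that grid. Using d = D with a random basis as the baseline is a shortcut; training directly in theta^(D) would be the more faithful reference.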

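Experimental Results

Research Questions

RQ1: What is the intrinsic dimension of various neural network problems when optimization is restricted to a randomly oriented subspace?
RQ2: How does d_int90 scale across architectures, datasets, and reinforcement learning tasks?
RQ3: Do larger models exhibit greater redundancy, and how does this affect MDL-based model selection?
RQ4: Does random subspace training yield practical network compression without a significant loss in performance?
RQ5: How does intrinsic dimension differ between supervised tasks and RL environments?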

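Key Findings

The intrinsic dimension d_int90 is often far smaller than the total number of parameters D (e.g., MNIST FC: D=199k, d_int90≈750; LeNet: D=44k, d_int90≈290).
Increasing model size raises the redundancy s (the dimension of the solution set, s = D - d_int), while d_int90 stays nearly constant over a wide range of D. This suggests that extra parameters mainly enlarge the solution set rather than lowering the intrinsic difficulty of the problem.
Convolutional networks are more parameter-efficient than FC networks on both MNIST and CIFAR-10 (they have a lower d_int90). Random subspace training also yields large compression (e.g., MNIST FC ~260×; LeNet ~150×).
In RL, intrinsic dimension varies widely by task (e.g., Inverted Pendulum: d_int90≈4; Humanoid: d_int90≈700; Pong: d_int90≈6000). This spread in difficulty is comparable to that seen among supervised tasks.
Intrinsic dimension gives an upper bound on the MDL of a solution. It also offers a practical end-to-end compression strategy that needs no change to the training procedure beyond the subspace reparameterization.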
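
The compression and MDL points rest on a storage argument. Because theta^(D)_0 and P are random, they can be regenerated from a seed, so a trained model is described by its d subspace coordinates plus that seed. The minimal sketch below illustrates that idea. The storage format, the helper names (regenerate, compress, decompress, storage_bits), the float32 encoding, and the flat 32-bits-per-value cost are all our assumptions, not the paper's code. The final lines redo the compression-ratio arithmetic using the review's rounded D values.

```python
import numpy as np

def regenerate(seed, D, d):
    """Rebuild frozen theta0 (length D) and unit-column P (D x d) from a seed.
    Hypothetical convention: training must draw theta0 and P in this same order."""
    rng = np.random.default_rng(seed)
    theta0 = rng.standard_normal(D)
    P = rng.standard_normal((D, d))
    return theta0, P / np.linalg.norm(P, axis=0)

def compress(theta_d, seed):
    # Only the d trained coordinates (as float32) and the integer seed are stored.
    return {"seed": int(seed), "theta_d": np.asarray(theta_d, dtype=np.float32)}

def decompress(blob, D):
    # Regenerate theta0 and P, then map the stored coordinates back to full parameters.
    theta0, P = regenerate(blob["seed"], D, blob["theta_d"].shape[0])
    return theta0 + P @ blob["theta_d"]

def storage_bits(num_values, bits_per_value=32):
    """Crude description-length bound: every stored value costs a fixed number of bits."""
    return num_values * bits_per_value

# Ratios using the review's rounded D values (the seed's few bytes are ignored).
for name, D, d in [("MNIST FC", 199_000, 750), ("MNIST LeNet", 44_000, 290)]:
    print(f"{name}: {D / d:.0f}x, {storage_bits(d)} vs {storage_bits(D)} bits")
# MNIST FC: 265x, 24000 vs 6368000 bits
# MNIST LeNet: 152x, 9280 vs 1408000 bits
```

The rounded D values give ratios of about 265× and 152×, consistent with the ~260× and ~150× figures above. Counting 32 bits per stored coordinate is only a loose upper bound; a finer quantization of theta^(d) would tighten it.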

This review was generated by AI and checked by a human editor.