QUICK REVIEW

[Paper Review] Learning with invariances in random features and kernel models

Mei Song, Theodor Misiakiewicz|arXiv (Cornell University)|Feb 25, 2021

Stochastic Gradient Optimization Techniques | References: 32 | Citations: 24

One-line summary

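This paper introduces invariant random features and invariant kernel methods to quantify the statistical benefit of invariance in machine learning models. For groups of degeneracy $\alpha \leq 1$, the sample size and number of hidden units needed to reach the same test error shrink by a factor of $d^{\alpha}$. This is a substantial efficiency gain in high-dimensional settings with invariant targets (for example, targets invariant under translations) on the sphere and hypercube.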

ABSTRACT

A number of machine learning tasks entail a high degree of invariance: the data distribution does not change if we act on the data with a certain group of transformations. For instance, labels of images are invariant under translations of the images. Certain neural network architectures -- for instance, convolutional networks -- are believed to owe their success to the fact that they exploit such invariance properties. With the objective of quantifying the gain achieved by invariant architectures, we introduce two classes of models: invariant random features and invariant kernel methods. The latter includes, as a special case, the neural tangent kernel for convolutional networks with global average pooling. We consider uniform covariates distributions on the sphere and hypercube and a general invariant target function. We characterize the test error of invariant methods in a high-dimensional regime in which the sample size and number of hidden units scale as polynomials in the dimension, for a class of groups that we call "degeneracy $\alpha$", with $\alpha \leq 1$. We show that exploiting invariance in the architecture saves a $d^{\alpha}$ factor ($d$ stands for the dimension) in sample size and number of hidden units to achieve the same test error as for unstructured architectures. Finally, we show that output symmetrization of an unstructured kernel estimator does not give a significant statistical improvement; on the other hand, data augmentation with an unstructured kernel estimator is equivalent to an invariant kernel estimator and enjoys the same improvement in statistical efficiency.

Motivation and objectives

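Quantify the statistical benefit of architectural invariance (e.g., convolutional networks) in high-dimensional learning problems.
Formalize and analyze invariant random features and kernel models that respect group symmetries such as translations.
Characterize the test error of invariant models in a high-dimensional scaling where the sample size and number of hidden units grow polynomially in the dimension.
Compare invariant methods with unstructured alternatives (e.g., output symmetrization, data augmentation).
Establish that data augmentation with an unstructured kernel is statistically equivalent to invariant kernel estimation and yields the same efficiency gain.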

Proposed method

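The authors define invariant random features and kernel models by symmetrizing features and kernels over a group $\mathcal{G}_d \subset \mathrm{O}(d)$, which guarantees invariance to transformations such as cyclic shifts.
The analysis is carried out on the high-dimensional sphere $\mathbb{S}^{d-1}$ and the hypercube $\{-1,1\}^d$, each equipped with the uniform measure, and focuses on invariant target functions.
Invariant functions and kernels are expanded in orthogonal polynomial bases: Gegenbauer polynomials on the sphere and hypercubic Gegenbauer polynomials on the hypercube.
The main theoretical tools are hypercontractivity inequalities, used for concentration of measure and for the eigenvalue decay on the space of invariant functions.
Generalization error bounds are derived in terms of the degeneracy $\alpha$, a group-specific parameter that controls the eigenvalue decay rate of the invariant kernel.
The analysis shows that invariant models reach the same test error as unstructured models with a factor of $d^{\alpha}$ fewer samples and parameters.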
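
The symmetrization step described above can be illustrated with a minimal NumPy sketch. It is not the authors' code. It assumes the group is the cyclic group of coordinate shifts, uses an arbitrary inner-product kernel $h(t) = e^{t}$ and ReLU features, and draws Gaussian inputs as a stand-in for points on the sphere. All function names (`cyclic_orbit`, `base_kernel`, `invariant_kernel`, `invariant_random_features`) are hypothetical.

```python
import numpy as np


def cyclic_orbit(x):
    """Return all d cyclic shifts of x as the rows of a (d, d) array."""
    d = x.shape[0]
    return np.stack([np.roll(x, k) for k in range(d)])


def base_kernel(x, y):
    """Unstructured inner-product kernel h(<x, y> / d); h = exp is an arbitrary choice."""
    return np.exp(x @ y / x.shape[0])


def invariant_kernel(x, y):
    """Symmetrized kernel: average the base kernel over the cyclic orbit of y."""
    return np.mean([base_kernel(x, gy) for gy in cyclic_orbit(y)])


def invariant_random_features(x, W):
    """Symmetrized features: average ReLU(<w_j, g.x>) over the cyclic orbit of x.

    W has shape (N, d) with one random weight vector per row; the result has shape (N,).
    """
    return np.maximum(cyclic_orbit(x) @ W.T, 0.0).mean(axis=0)


rng = np.random.default_rng(0)
d, N = 8, 16
x = rng.standard_normal(d)
y = rng.standard_normal(d)
W = rng.standard_normal((N, d))
x_shifted = np.roll(x, 3)  # the same input, acted on by one group element

# Both symmetrized objects should be unchanged when x is replaced by a shift of x.
print(np.isclose(invariant_kernel(x, y), invariant_kernel(x_shifted, y)))
print(np.allclose(invariant_random_features(x, W),
                  invariant_random_features(x_shifted, W)))
```

Averaging over the orbit costs a factor of the group size per kernel evaluation. For the cyclic group this is $d$ evaluations, which is acceptable for a small illustration like this one.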

Experimental results

Research questions

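RQ1: How much reduction in sample size and model complexity can be achieved by enforcing invariance in random features and kernel models?
RQ2: What role does the degeneracy $\alpha$ in particular play in the statistical efficiency gain of invariant models?
RQ3: Does symmetrizing the output of an unstructured kernel estimator improve generalization over standard kernel methods?
RQ4: Is data augmentation with an unstructured kernel equivalent to invariant kernel estimation in terms of statistical performance?
RQ5: How do the spectral properties of invariant kernels on the sphere and hypercube affect the generalization error in high dimensions?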

Key findings

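For groups of degeneracy $\alpha \leq 1$, invariant models need a factor of $d^{\alpha}$ fewer samples and hidden units than unstructured models to reach the same test error.
Data augmentation with an unstructured kernel estimator is equivalent to invariant kernel estimation and achieves the same $d^{\alpha}$ efficiency gain.
Symmetrizing the output of an unstructured kernel estimator gives no significant statistical improvement over standard kernel methods.
The neural tangent kernel of convolutional networks with global average pooling is a special case of the proposed invariant kernel methods.
Theoretical bounds on the generalization error are derived using hypercontractivity and Gegenbauer polynomial expansions; they show that, in the high-dimensional scaling, invariant models reach a given test error with fewer samples.
The degeneracy $\alpha$ characterizes the spectral decay rate of the invariant kernel and directly determines the size of the reduction in sample complexity.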
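
The three estimators compared in these findings can be set side by side in a small numerical sketch. It is an illustration under stated assumptions, not the paper's experiment: the group is the cyclic group of shifts, the kernel is an arbitrary inner-product kernel $e^{\langle x, y \rangle / d}$, the target is a toy shift-invariant function, and all names (`orbit`, `h`, `h_inv`, `krr_predict`) are hypothetical. The ridge parameter of the invariant estimator is divided by the group size so that it matches the augmented problem exactly. That rescaling comes from a short calculation for this setup and is not taken from the review.

```python
import numpy as np


def orbit(X):
    """Stack all cyclic shifts of each row of X; shape (d * n, d), grouped by shift."""
    d = X.shape[1]
    return np.concatenate([np.roll(X, k, axis=1) for k in range(d)])


def h(A, B):
    """Unstructured inner-product kernel matrix exp(<a, b> / d) (arbitrary choice)."""
    return np.exp(A @ B.T / A.shape[1])


def h_inv(A, B):
    """Invariant kernel matrix: average of h over cyclic shifts of the rows of B."""
    d = B.shape[1]
    return sum(h(A, np.roll(B, k, axis=1)) for k in range(d)) / d


def krr_predict(kernel, X, y, X_test, lam):
    """Kernel ridge regression prediction k(x)^T (K + lam * I)^{-1} y."""
    K = kernel(X, X)
    coef = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return kernel(X_test, X) @ coef


rng = np.random.default_rng(0)
n, d, lam = 12, 6, 0.5
X = rng.standard_normal((n, d))
y = np.sum(X, axis=1) ** 2  # toy target, invariant under coordinate shifts
X_test = rng.standard_normal((5, d))

# (a) Data augmentation: unstructured kernel fitted on every shift of every sample.
pred_aug = krr_predict(h, orbit(X), np.tile(y, d), X_test, lam)

# (b) Invariant kernel on the original samples, ridge rescaled by the group size d.
pred_inv = krr_predict(h_inv, X, y, X_test, lam / d)

# (c) Output symmetrization: fit the unstructured kernel on the original samples,
#     then average its predictions over all shifts of each test point.
pred_sym = krr_predict(h, X, y, orbit(X_test), lam).reshape(d, -1).mean(axis=0)

print(np.allclose(pred_aug, pred_inv))      # (a) and (b) coincide up to round-off
print(np.max(np.abs(pred_sym - pred_inv)))  # (c) is in general a different estimator
```

In this setup, (a) solves a linear system of size $nd$ while (b) solves one of size $n$. So the invariant kernel gives the same predictions at lower cost, whereas (c) only post-processes a fit that never used the symmetry during training.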

This review was generated by AI and checked by a human editor.