QUICK REVIEW

[Paper Review] Few-Shot Learning via Learning the Representation, Provably

Simon S. Du, Wei Hu|arXiv (Cornell University)|Feb 21, 2020

Domain Adaptation and Few-Shot Learning | References: 33 | Citations: 56

One-Line Summary

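Analyzes few-shot learning through representation learning and derives provable sample-complexity guarantees showing that pooling data from multiple source tasks improves learning of the target task, with explicit rates for both low-dimensional and high-dimensional representations, including nonlinear and neural-network representations.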

ABSTRACT

This paper studies few-shot learning via representation learning, where one uses $T$ source tasks with $n_1$ data per task to learn a representation in order to reduce the sample complexity of a target task for which there is only $n_2 (\ll n_1)$ data. Specifically, we focus on the setting where there exists a good \emph{common representation} between source and target, and our goal is to understand how much of a sample size reduction is possible. First, we study the setting where this common representation is low-dimensional and provide a fast rate of $O\left(\frac{\mathcal{C}\left(\Phi\right)}{n_1T} + \frac{k}{n_2}\right)$; here, $\Phi$ is the representation function class, $\mathcal{C}\left(\Phi\right)$ is its complexity measure, and $k$ is the dimension of the representation. When specialized to linear representation functions, this rate becomes $O\left(\frac{dk}{n_1T} + \frac{k}{n_2}\right)$ where $d (\gg k)$ is the ambient input dimension, which is a substantial improvement over the rate without using representation learning, i.e. over the rate of $O\left(\frac{d}{n_2}\right)$. This result bypasses the $\Omega(\frac{1}{T})$ barrier under the i.i.d. task assumption, and can capture the desired property that all $n_1T$ samples from source tasks can be \emph{pooled} together for representation learning. Next, we consider the setting where the common representation may be high-dimensional but is capacity-constrained (say in norm); here, we again demonstrate the advantage of representation learning in both high-dimensional linear regression and neural network learning. Our results demonstrate representation learning can fully utilize all $n_1T$ samples from source tasks.
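To make the gap between these rates concrete, the sketch below evaluates them with every hidden constant and logarithmic factor set to 1. That is an assumption made only for illustration, so the numbers show scaling, not actual risk values; the parameter values are arbitrary.

```python
# Constant-free comparison of the rates quoted in the abstract.
# Assumption: hidden constants and log factors are set to 1, so the
# outputs indicate scaling only, not real excess-risk values.


def rate_with_representation(d: int, k: int, n1: int, T: int, n2: int) -> float:
    """dk/(n1*T) + k/n2: source term (pooled over n1*T samples) + target term."""
    return d * k / (n1 * T) + k / n2


def rate_without_representation(d: int, n2: int) -> float:
    """d/n2: rate when the target task is learned from its own n2 samples only."""
    return d / n2


if __name__ == "__main__":
    d, k, n1, T, n2 = 1000, 10, 500, 100, 50
    print(f"with representation:    {rate_with_representation(d, k, n1, T, n2):.3f}")
    print(f"without representation: {rate_without_representation(d, n2):.3f}")
    # Same pooled budget n1*T = 50,000 split differently gives the same source term.
    print(f"5,000 x 10 split:       {rate_with_representation(d, k, 5000, 10, n2):.3f}")
```

With these values the representation-based rate is 0.4, against 20 for learning from the target data alone. Because the source term depends on $n_1$ and $T$ only through the product $n_1T$, splitting the same 50,000 source samples as 5,000 × 10 instead of 500 × 100 gives the same value, which is the pooling property described above.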

Motivation and Objectives

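Motivates representation learning in few-shot learning as a way to reduce the sample complexity of the target task.
Characterizes how a common representation shared by the source and target tasks improves generalization bounds.
Provides theoretical rates showing how source data can be fully exploited to help the target task.
Extends the analysis from linear to nonlinear and high-dimensional settings, including results for neural networks.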

Proposed Method

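Formulates a joint optimization over the source-task data that learns a shared representation and task-specific linear predictors: $\min_{\phi \in \Phi,\, W = [w_1, \dots, w_T]} \frac{1}{2 n_1 T} \sum_{t=1}^{T} \lVert y_t - \phi(X_t) w_t \rVert^2$.
Fixes the learned representation $\hat{\phi}$ and fits only a linear predictor on the target task: $\min_{w} \frac{1}{2 n_2} \lVert y_{T+1} - \hat{\phi}(X_{T+1}) w \rVert^2$.
Derives a target-task excess-risk bound that separates the representation error averaged over the source tasks from the target-specific estimation error: $\mathrm{ER} \le \tilde{O}\left(\frac{\mathcal{C}(\Phi)}{n_1 T} + \frac{k}{n_2}\right)$ for a general, possibly nonlinear, class $\Phi$, which specializes to $\tilde{O}\left(\frac{dk}{n_1 T} + \frac{k}{n_2}\right)$ for low-dimensional linear representations.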
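Extends the analysis to high-dimensional linear representations under covariance-dominance and task-diversity assumptions, obtaining rates that depend on the trace $\mathrm{Tr}(\Sigma)$ and the spectral norm $\lVert\Sigma\rVert_2$ of the input covariance $\Sigma$.
Provides a neural-network extension showing the same pooling effect for two-layer ReLU networks under similar conditions.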
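The two-stage procedure above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' code: it only evaluates the source objective for a given $\phi$ and $W$ (it does not solve the minimization over $\Phi$), and the demo uses the ground-truth linear representation as a stand-in for the learned $\hat{\phi}$. The helper names `linear_rep`, `relu_rep`, `predict`, `source_loss`, `solve`, and `fit_target_head` are hypothetical; `relu_rep` only illustrates a first-layer ReLU feature map of the kind used in the two-layer network setting.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Representation = Callable[[Sequence[float]], Vector]  # phi: R^d -> R^k
Task = Tuple[Matrix, Vector]  # (X_t, y_t), one row of X_t per sample


def linear_rep(B: Matrix) -> Representation:
    """Hypothetical helper: phi(x) = B^T x, with B given as d rows of length k."""
    def phi(x: Sequence[float]) -> Vector:
        return [sum(x[i] * B[i][j] for i in range(len(x))) for j in range(len(B[0]))]
    return phi


def relu_rep(B: Matrix) -> Representation:
    """Hypothetical helper: phi(x) = max(B^T x, 0), a first-layer ReLU feature map."""
    lin = linear_rep(B)
    return lambda x: [max(v, 0.0) for v in lin(x)]


def predict(phi: Representation, x: Sequence[float], w: Vector) -> float:
    """Linear head on top of the representation: <phi(x), w>."""
    return sum(f * wj for f, wj in zip(phi(x), w))


def source_loss(phi: Representation, tasks: List[Task], W: List[Vector]) -> float:
    """(1 / (2 n1 T)) * sum_t ||y_t - phi(X_t) w_t||^2, assuming n1 samples per task."""
    n1, T = len(tasks[0][0]), len(tasks)
    total = sum((target - predict(phi, x, w)) ** 2
                for (X, y), w in zip(tasks, W) for x, target in zip(X, y))
    return total / (2 * n1 * T)


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][j] * z[j] for j in range(r + 1, n))) / M[r][r]
    return z


def fit_target_head(phi: Representation, X: Matrix, y: Vector) -> Vector:
    """argmin_w (1/(2 n2)) ||y - phi(X) w||^2 via the k x k normal equations."""
    Z = [phi(x) for x in X]
    k = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    b = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(k)]
    return solve(A, b)


if __name__ == "__main__":
    rng = random.Random(0)
    d, k, n1, T, n2 = 6, 2, 20, 3, 5
    # Assumption: the ground-truth linear representation stands in for phi_hat.
    B_star = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(d)]
    phi_star = linear_rep(B_star)

    def sample(w: Vector, n: int) -> Task:
        """Noise-free data y = <phi_star(x), w> with Gaussian inputs."""
        X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
        return X, [predict(phi_star, x, w) for x in X]

    W_star = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(T)]
    tasks = [sample(w, n1) for w in W_star]
    print("source loss at the truth:", source_loss(phi_star, tasks, W_star))  # 0.0
    X_new, y_new = sample([1.5, -0.5], n2)  # only n2 = 5 target samples in d = 6
    w_hat = fit_target_head(phi_star, X_new, y_new)
    print("recovered head:", [round(v, 6) for v in w_hat])  # close to [1.5, -0.5]
```

In the demo the target task has only $n_2 = 5$ samples in $d = 6$ dimensions, so ordinary least squares on the raw inputs would be underdetermined, while fitting a $k = 2$ dimensional head on top of the representation is well-posed. This mirrors the $k/n_2$ versus $d/n_2$ comparison in the rates.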

Results

Research Questions

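RQ1: When a common representation exists between the source and target tasks, how much can few-shot learning reduce sample complexity?
RQ2: How do the size and structure of the representation (low- vs. high-dimensional, linear vs. nonlinear, neural network) affect the target-task risk when source-task data are leveraged?
RQ3: Under which distributional and diversity assumptions can all $n_1 T$ source samples be pooled to improve target performance?
RQ4: Do the theoretical gains extend from linear representations to nonlinear representations and overparameterized neural networks?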

Key Findings

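For low-dimensional linear representations, the target excess risk scales as $\tilde{O}\left(\frac{dk}{n_1 T} + \frac{k}{n_2}\right)$, a substantial improvement over the $O\left(\frac{d}{n_2}\right)$ rate achievable without representation learning.
Generalizing to nonlinear representations, the bound becomes $\tilde{O}\left(\frac{\mathcal{C}(\Phi)}{n_1 T} + \frac{k}{n_2}\right)$, which still allows all source data to be pooled.
For high-dimensional linear representations with covariance structure, the rate becomes $\tilde{O}\left(\frac{\bar{R}\sqrt{\mathrm{Tr}(\Sigma)}}{\sqrt{n_1 T}} + \frac{\bar{R}\sqrt{\lVert\Sigma\rVert_2}}{\sqrt{n_2}}\right)$.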
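The results show that all $n_1 T$ source samples can be fully exploited for representation learning, bypassing the $\Omega(\frac{1}{T})$ barrier under the i.i.d. task assumption.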
The framework extends to two-layer neural networks with ReLU activations and retains the same qualitative gain from representation learning.
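The high-dimensional rate listed above can be evaluated numerically once the input covariance is fixed. The sketch below computes $\mathrm{Tr}(\Sigma)$ exactly and $\lVert\Sigma\rVert_2$ by power iteration (a standard method chosen here for illustration, assuming a symmetric positive semidefinite $\Sigma$), then plugs both into the bound with constants and logarithmic factors dropped. $\bar{R}$ is treated as a given input whose precise definition is in the paper; the value 1.0 and the decaying spectrum used in the demo are arbitrary assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def trace(S: Matrix) -> float:
    """Tr(S): sum of the diagonal entries."""
    return sum(S[i][i] for i in range(len(S)))


def spectral_norm_psd(S: Matrix, iters: int = 200) -> float:
    """||S||_2 for symmetric PSD S (its largest eigenvalue) via power iteration.

    Assumes the uniform start vector is not orthogonal to the top eigenvector.
    """
    n = len(S)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        u = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in u))  # ||S v|| with ||v|| = 1
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in u]
    return lam


def high_dim_rate(S: Matrix, r_bar: float, n1: int, T: int, n2: int) -> float:
    """R_bar*sqrt(Tr S)/sqrt(n1 T) + R_bar*sqrt(||S||_2)/sqrt(n2); constants/logs dropped."""
    source_term = r_bar * math.sqrt(trace(S)) / math.sqrt(n1 * T)
    target_term = r_bar * math.sqrt(spectral_norm_psd(S)) / math.sqrt(n2)
    return source_term + target_term


if __name__ == "__main__":
    d = 20
    # Assumed diagonal covariance with eigenvalues 1, 1/2, ..., 1/d.
    S = [[1.0 / (i + 1) if i == j else 0.0 for j in range(d)] for i in range(d)]
    print("Tr(S)   =", round(trace(S), 4))
    print("||S||_2 =", round(spectral_norm_psd(S), 4))
    # Same pooled source budget n1*T = 10,000, split two different ways.
    print(high_dim_rate(S, 1.0, n1=100, T=100, n2=50))
    print(high_dim_rate(S, 1.0, n1=1000, T=10, n2=50))
```

Only the source term involves $\mathrm{Tr}(\Sigma)$, and it shrinks with the pooled count $n_1 T$ regardless of how that count is split across tasks; the target term depends on $\lVert\Sigma\rVert_2$ alone, so the covariance enters the target cost only through its largest eigenvalue.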


This review was generated by AI and checked by a human editor.