QUICK REVIEW

[Paper Review] Meta-Learning with Warped Gradient Descent

Sebastian Flennerhag, Andrei A. Rusu|arXiv (Cornell University)|Aug 30, 2019

Domain Adaptation and Few-Shot Learning | References: 64 | Citations: 66

One-sentence summary

WarpGrad preconditions gradients with warp-layers, enabling scalable, trajectory-agnostic gradient-based meta-learning across few-shot, standard supervised, continual, and reinforcement learning tasks.

ABSTRACT

Learning an efficient update rule from data that promotes rapid learning of new tasks from the same distribution remains an open problem in meta-learning. Typically, previous works have approached this issue either by attempting to train a neural network that directly produces updates or by attempting to learn better initialisations or scaling factors for a gradient-based update rule. Both of these approaches pose challenges. On one hand, directly producing an update forgoes a useful inductive bias and can easily lead to non-converging behaviour. On the other hand, approaches that try to control a gradient-based update rule typically resort to computing gradients through the learning process to obtain their meta-gradients, leading to methods that can not scale beyond few-shot task adaptation. In this work, we propose Warped Gradient Descent (WarpGrad), a method that intersects these approaches to mitigate their limitations. WarpGrad meta-learns an efficiently parameterised preconditioning matrix that facilitates gradient descent across the task distribution. Preconditioning arises by interleaving non-linear layers, referred to as warp-layers, between the layers of a task-learner. Warp-layers are meta-learned without backpropagating through the task training process in a manner similar to methods that learn to directly produce updates. WarpGrad is computationally efficient, easy to implement, and can scale to arbitrarily large meta-learning problems. We provide a geometrical interpretation of the approach and evaluate its effectiveness in a variety of settings, including few-shot, standard supervised, continual and reinforcement learning.

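Motivation and objectives

Address the limitations of existing gradient-based meta-learners in convergence, scalability, and credit assignment.
Propose a trajectory-agnostic preconditioning framework that interleaves warp-layers between the layers of the task-learner to precondition its gradients.
Provide a geometric interpretation of WarpGrad via a Riemannian metric, and demonstrate scalable performance in few-shot, multi-shot, continual, and reinforcement learning settings.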

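Proposed method

Warp-layers are interleaved with the task-learner layers to form a warp-network, enabling data-dependent gradient preconditioning.
A general preconditioned update rule is defined as U(θ;φ) = θ − αP(θ;φ)∇L(θ), where the preconditioner P(θ;φ) is realised through the warp-layers and their Jacobians.
A trajectory-agnostic meta-objective L(φ) is derived by optimising over the joint distribution of tasks and intermediate task-learner parameters, which avoids backpropagating through the full adaptation trajectory.
A geometric account is given: warp-layers induce a metric G on the warped space, and G⁻¹ acts as the preconditioner; updates in the warped space are shown to be equivalent, to first order, to descent under the Riemannian metric.
Online (Algorithm 1) and offline (Algorithm 2) meta-training procedures are proposed; they learn the warp-parameters φ while either learning or using a given prior over the initial task parameters θ₀|τ.
Integration with learned initialisations and priors is shown, which allows a range of training regimes (online/offline, supervised/RL, continual learning).
Non-linear warp-layers are shown to capture richer preconditioning than a block-diagonal structure allows, and to exhibit memory-exploiting behaviour on RL tasks.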
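
The update rule above can be illustrated with a small toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a two-parameter task-learner (theta) followed by a single frozen linear warp-layer (phi), and phi is hand-picked here, whereas WarpGrad meta-learns it. All names (forward, grad_theta, adapt, warp_phi) are hypothetical. In this toy case, backpropagating through the warp-layer is equivalent to preconditioning the effective weights phi[i] * theta[i] with diag(phi[i]²), which offsets the badly scaled second input feature.

```python
# Toy sketch: gradient descent on theta with a frozen warp-layer phi.
# Assumption: phi is chosen by hand; in WarpGrad it would be meta-learned.

def forward(theta, phi, x):
    """Task-learner layer (theta) followed by a frozen warp-layer (phi)."""
    h = [t * xi for t, xi in zip(theta, x)]       # task-learner: per-feature scaling
    return sum(p * hi for p, hi in zip(phi, h))   # warp-layer: fixed linear read-out

def mean_loss(theta, phi, data):
    """Mean squared-error loss 0.5 * (y - target)^2 over the data set."""
    return sum(0.5 * (forward(theta, phi, x) - t) ** 2 for x, t in data) / len(data)

def grad_theta(theta, phi, data):
    """Gradient of the mean loss w.r.t. theta, backpropagated through the warp-layer."""
    g = [0.0] * len(theta)
    for x, t in data:
        err = forward(theta, phi, x) - t
        for i in range(len(theta)):
            # Chain rule: the warp-layer's Jacobian entry phi[i] scales the gradient.
            g[i] += err * phi[i] * x[i] / len(data)
    return g

def adapt(phi, data, steps=50, lr=0.5):
    """Task adaptation: only theta is updated, phi stays fixed."""
    theta = [0.0] * len(phi)
    for _ in range(steps):
        g = grad_theta(theta, phi, data)
        theta = [th - lr * gi for th, gi in zip(theta, g)]
    return theta

# Second feature is ten times smaller than the first, so plain descent is ill-conditioned.
inputs = [(1.0, 0.0), (0.0, 0.1), (1.0, 0.1), (-1.0, 0.1)]
data = [(x, 2.0 * x[0] + 3.0 * x[1]) for x in inputs]

plain_phi = [1.0, 1.0]    # identity warp: ordinary gradient descent
warp_phi = [1.0, 10.0]    # warp that rescales the small feature

plain_theta = adapt(plain_phi, data)
warp_theta = adapt(warp_phi, data)
plain_loss = mean_loss(plain_theta, plain_phi, data)
warp_loss = mean_loss(warp_theta, warp_phi, data)
print("plain loss:", plain_loss, "warped loss:", warp_loss)
```

With the same step size and number of steps, the warped run reaches a much lower loss than the plain run, because the frozen layer evens out the curvature that theta sees. The sketch leaves out the meta-training of phi (Algorithms 1 and 2) and uses a linear warp, so it only shows a diagonal preconditioner.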

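Experimental results

Research questions

RQ1: Can WarpGrad retain the inductive bias of gradient-based few-shot learners while avoiding backpropagation through the adaptation trajectory?
RQ2: How far can WarpGrad scale beyond few-shot learning, to multi-shot and standard supervised/RL tasks?
RQ3: Does WarpGrad generalise to complex meta-learning scenarios such as continual learning and tasks that require memory?
RQ4: Can the learned warp geometry be interpreted as curvature-based preconditioning that supports convergence guarantees?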

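Key findings

WarpGrad outperforms baseline gradient-based meta-learners on the standard few-shot benchmarks mini-ImageNet and tiered-ImageNet.
The Warp-MAML and Warp-Leap variants achieve higher accuracy than their non-warped counterparts in few-shot and multi-shot settings, including Omniglot and tiered-ImageNet with an extended number of adaptation steps.
Non-linear warp-layers allow richer preconditioning than a block-diagonal structure, improving performance on complex tasks such as continual learning and RL maze navigation.
Offline meta-training with learned warp parameters yields substantial gains (for example, Omniglot test accuracy improves from 76.3% to 84.3%).
WarpGrad preserves convergence behaviour by embedding the preconditioner, via an implicit Riemannian metric, in an update that still has the form of gradient descent, which provides stability and scalability across tasks.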
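
The convergence point in the last finding can be made concrete with a standard first-order argument. This is a sketch, assuming the induced preconditioner G⁻¹(θ;φ) is symmetric positive definite and L is twice continuously differentiable. It is a textbook property of preconditioned gradient descent, not a derivation taken from the paper.

```latex
\begin{aligned}
\theta' &= \theta - \alpha\, G^{-1}(\theta;\phi)\, \nabla L(\theta), \\
L(\theta') &= L(\theta) - \alpha\, \nabla L(\theta)^{\top} G^{-1}(\theta;\phi)\, \nabla L(\theta) + O(\alpha^{2}), \\
\nabla L(\theta)^{\top} G^{-1}(\theta;\phi)\, \nabla L(\theta) &> 0 \quad \text{whenever } \nabla L(\theta) \neq 0 .
\end{aligned}
```

Under these assumptions, a sufficiently small step size α gives a warped update that decreases the loss whenever the gradient is non-zero. This is the sense in which the update still behaves like gradient descent.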

This review was generated by AI and checked by a human editor.