QUICK REVIEW

[Paper Review] Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference

Matthew Riemer, Ignacio Cases | arXiv (Cornell University) | Oct 29, 2018

Domain Adaptation and Few-Shot Learning | References: 53 | Citations: 101

One-line summary

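MER (Meta-Experience Replay) combines experience replay with optimization-based meta-learning to maximize gradient-based transfer and minimize interference in continual learning. It outperforms baselines, including GEM and EWC on supervised benchmarks and a standard DQN with experience replay on reinforcement learning tasks, most notably when the replay buffer is small.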

ABSTRACT

Lack of performance when it comes to continual learning over non-stationary distributions of data remains a major challenge in scaling neural network learning to more human realistic settings. In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples. We then propose a new algorithm, Meta-Experience Replay (MER), that directly exploits this view by combining experience replay with optimization based meta-learning. This method learns parameters that make interference based on future gradients less likely and transfer based on future gradients more likely. We conduct experiments across continual lifelong supervised learning benchmarks and non-stationary reinforcement learning environments demonstrating that our approach consistently outperforms recently proposed baselines for continual learning. Our experiments show that the gap between the performance of MER and baseline algorithms grows both as the environment gets more non-stationary and as the fraction of the total experiences stored gets smaller.

Motivation and objectives

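Motivate continual learning over non-stationary data and define a temporally symmetric transfer-interference framework.
Formulate the transfer-interference trade-off as a gradient alignment problem in order to guide the learning dynamics.
Propose the Meta-Experience Replay (MER) algorithm, which integrates experience replay with meta-learning to optimize for future transfer and interference.
Demonstrate MER's effectiveness on supervised continual learning benchmarks and non-stationary reinforcement learning environments.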
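
The gradient alignment view in the second point can be written compactly. The block below is a sketch in notation chosen here (loss $L$, parameters $\theta$, examples $(x_i, y_i)$ and $(x_j, y_j)$, trade-off weight $\alpha$); it is an assumption that this matches the paper's exact form, and it makes no claim about the paper's equation numbering.

```latex
% Transfer: the gradients of two examples point in a similar direction
\frac{\partial L(x_i, y_i)}{\partial \theta} \cdot \frac{\partial L(x_j, y_j)}{\partial \theta} > 0

% Interference: the gradients of two examples conflict
\frac{\partial L(x_i, y_i)}{\partial \theta} \cdot \frac{\partial L(x_j, y_j)}{\partial \theta} < 0

% An objective that rewards alignment: minimize both losses while
% maximizing the dot product of their gradients
\theta = \arg\min_{\theta} \; \mathbb{E}_{(x_i, y_i),\,(x_j, y_j)}
  \left[ L(x_i, y_i) + L(x_j, y_j)
  - \alpha \, \frac{\partial L(x_i, y_i)}{\partial \theta} \cdot \frac{\partial L(x_j, y_j)}{\partial \theta} \right]
```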

Proposed method

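To capture transfer and interference, define forward and backward transfer using the gradient dot product between pairs of examples.
Formulate an objective that encourages positive gradient alignment across examples (Eq. 4).
Develop MER by combining experience replay with a first-order meta-learning (Reptile) update, enabling online optimization over non-stationary data.
Use reservoir sampling to maintain a memory M that approximates a stationary distribution over past data, and interleave the current example with samples from the buffer during training.
To keep computation practical, approximate the second-derivative terms with a first-order Taylor expansion.
Describe variants and implementation details for realizing MER efficiently with online, SGD-style updates.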
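
The replay and meta-update steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a linear least-squares model in NumPy, collapses the procedure to a single batch per incoming example, and shuffles the current example in with the buffer samples. The names `reservoir_update`, `mer_step`, and `run_demo`, and the hyperparameters `lr`, `gamma`, and `k`, are hypothetical choices made for this example.

```python
import random
import numpy as np


def reservoir_update(memory, example, n_seen, capacity, rng):
    """Reservoir sampling: each example seen so far is retained with equal probability.

    n_seen is the number of examples observed before this one.
    """
    if len(memory) < capacity:
        memory.append(example)
    else:
        j = rng.randint(0, n_seen)  # random.randint is inclusive on both ends
        if j < capacity:
            memory[j] = example


def grad_squared_error(theta, x, y):
    """Gradient of 0.5 * (theta @ x - y)**2 with respect to theta."""
    return (theta @ x - y) * x


def mer_step(theta, current, memory, rng, lr=0.05, gamma=0.3, k=5):
    """One Reptile-style meta-update over the current example plus buffer samples."""
    # Assumption: draw up to k - 1 stored examples and mix in the current one.
    batch = rng.sample(memory, min(k - 1, len(memory))) + [current]
    rng.shuffle(batch)
    theta_before = theta.copy()
    for x, y in batch:  # inner loop: sequential SGD, one example at a time
        theta = theta - lr * grad_squared_error(theta, x, y)
    # Reptile interpolation: move only a fraction gamma toward the inner-loop result
    return theta_before + gamma * (theta - theta_before)


def run_demo(seed=0, steps=300, capacity=20):
    """Stream noise-free linear data through mer_step (toy setup, hypothetical)."""
    rng = random.Random(seed)
    np_rng = np.random.default_rng(seed)
    w_true = np.array([1.0, -2.0, 0.5])
    theta = np.zeros(3)
    memory = []
    for t in range(steps):
        x = np_rng.normal(size=3)
        example = (x, float(w_true @ x))
        theta = mer_step(theta, example, memory, rng)
        reservoir_update(memory, example, t, capacity, rng)
    return theta, memory
```

Calling `run_demo()` returns the learned parameters and the final buffer. The toy stream is stationary, so it only exercises the mechanics; it does not reproduce the non-stationary settings the paper evaluates.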

Experimental results

Research questions

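RQ1: Does MER improve retained accuracy over baselines such as Online, EWC, and GEM on the supervised continual learning benchmarks MNIST Rotations and MNIST Permutations?
RQ2: How does MER behave as the memory buffer size changes, particularly with very small buffers?
RQ3: Can MER handle more non-stationary lifelong learning settings beyond the standard benchmarks, including Omniglot and Many Permutations?
RQ4: When combined with DQN, does MER improve continual reinforcement learning performance in non-stationary environments such as Catcher and Flappy Bird?
RQ5: How does MER affect the distribution of gradient dot products during training, that is, gradient alignment?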

Key findings

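MER consistently outperforms the baselines on the supervised continual learning benchmarks (MNIST Rotations and MNIST Permutations), with higher retained accuracy.
MER's gains grow as the memory buffer shrinks, with especially strong improvements when episodic storage is limited.
MER also achieves strong results on the harder non-stationary benchmarks (Many Permutations and Omniglot), often beating the baselines by a wide margin.
On continual reinforcement learning tasks (Catcher and Flappy Bird), DQN with MER achieves better overall performance and less forgetting than a standard DQN with experience replay.
The analysis shows that MER shifts the distribution of gradient dot products, suggesting stronger gradient alignment that supports improved transfer and reduced interference.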
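
A quantity like the one examined in the last finding can be measured directly. The sketch below is an illustration under assumptions, not the paper's analysis code: it uses a hypothetical linear model with squared error, and the helpers `per_example_gradients` and `gradient_dot_products` are names introduced here.

```python
import itertools
import numpy as np


def per_example_gradients(theta, xs, ys):
    """Gradients of 0.5 * (theta @ x - y)**2 for each row of xs, shape (n, d)."""
    residuals = xs @ theta - ys
    return residuals[:, None] * xs


def gradient_dot_products(theta, xs, ys):
    """Dot products between the gradients of every distinct pair of examples."""
    grads = per_example_gradients(theta, xs, ys)
    pairs = itertools.combinations(range(len(xs)), 2)
    return np.array([grads[i] @ grads[j] for i, j in pairs])


# Toy data (hypothetical): examples 0 and 1 agree, example 2 contradicts both.
theta = np.zeros(2)
xs = np.array([[1.0, 0.0], [2.0, 0.0], [1.0, 0.0]])
ys = np.array([1.0, 2.0, -1.0])

dots = gradient_dot_products(theta, xs, ys)
# Pairs are ordered (0, 1), (0, 2), (1, 2); dots is [4., -1., -4.]
# Positive: to first order, a step on one example also helps the other.
# Negative: to first order, a step on one example hurts the other.
```

Collecting these values over many sampled pairs during training and plotting their histogram is one way to observe whether the distribution shifts toward positive values.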

This review was generated by AI and checked by a human editor.