QUICK REVIEW

[Paper Review] Deep Online Learning via Meta-Learning: Continual Adaptation for Model-Based RL

Anusha Nagabandi, Chelsea Finn|arXiv (Cornell University)|Dec 18, 2018

Domain Adaptation and Few-Shot Learning | References: 33 | Citations: 48

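One-Line Summary

MOLe combines online SGD with a mixture of meta-learned networks, governed by a Chinese restaurant process prior, to adapt continually to non-stationary tasks in model-based reinforcement learning. Meta-learning supplies a prior that favors both rapid online adaptation and recall of earlier tasks.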

ABSTRACT

Humans and animals can learn complex predictive models that allow them to accurately and reliably reason about real-world phenomena, and they can adapt such models extremely quickly in the face of unexpected changes. Deep neural network models allow us to represent very complex functions, but lack this capacity for rapid online adaptation. The goal in this paper is to develop a method for continual online learning from an incoming stream of data, using deep neural network models. We formulate an online learning procedure that uses stochastic gradient descent to update model parameters, and an expectation maximization algorithm with a Chinese restaurant process prior to develop and maintain a mixture of models to handle non-stationary task distributions. This allows for all models to be adapted as necessary, with new models instantiated for task changes and old models recalled when previously seen tasks are encountered again. Furthermore, we observe that meta-learning can be used to meta-train a model such that this direct online adaptation with SGD is effective, which is otherwise not the case for large function approximators. In this work, we apply our meta-learning for online learning (MOLe) approach to model-based reinforcement learning, where adapting the predictive model is critical for control; we demonstrate that MOLe outperforms alternative prior methods, and enables effective continuous adaptation in non-stationary task distributions such as varying terrains, motor failures, and unexpected disturbances.

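Motivation and Objectives

Motivate rapid, continual online adaptation of deep models in non-stationary environments.
Develop an online EM-based method that maintains and updates a mixture of task-specific models.
Leverage a meta-learned prior so that a few gradient steps suffice for effective online adaptation.
Use a Chinese restaurant process (CRP) prior to handle unknown task boundaries and instantiate new tasks.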
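
For reference, a CRP prior over task assignments is commonly written as follows. This is a sketch in assumed notation, not an equation taken from this review: n_i denotes the (expected) number of earlier data points attributed to task T_i, so the n_i sum to t - 1, and α > 0 is the concentration parameter that controls how readily a new task is opened.

```latex
P(T_t = T_i) = \frac{n_i}{t - 1 + \alpha},
\qquad
P(T_t = T_{\text{new}}) = \frac{\alpha}{t - 1 + \alpha}
```

A task that has explained many past points is favored in proportion to its count, while a small fixed mass α is always reserved for a task not seen before. This is what lets the number of tasks grow without being specified in advance.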

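Proposed Method

Formulate online learning as updating θ(T) for each task T with SGD, guided by the task posterior P(T_t | x_t, y_t).
Use EM to estimate task responsibilities and update the model parameters online.
Model the task distribution with a Chinese restaurant process, instantiating new tasks as needed.
Meta-train the prior θ* with MAML to enable rapid adaptation from small amounts of data.
In the online M-step, update θ_{t+1}(T_i) with a single gradient step scaled by P_t(T_t = T_i | x_t, y_t).
Apply MOLe to model-based RL by predicting the next state from the past K transitions and planning with a controller that uses the updated model.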
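
The E-step and M-step described above can be illustrated with a toy sketch. It is not the authors' implementation: each "model" is a single scalar slope w in y ≈ w·x rather than a neural network, `prior_w = 0.0` merely stands in for the meta-learned prior θ*, and every function name and default value (`alpha`, `lr`, `sigma`) is a hypothetical choice made for illustration. The rule used here for instantiating a task (keep the candidate only when it is the most probable explanation) is likewise an assumption of the sketch.

```python
import math


def crp_prior(counts, alpha):
    """CRP prior over existing tasks plus one 'new task' slot (last entry)."""
    weights = list(counts) + [alpha]
    total = sum(weights)
    return [w / total for w in weights]


def likelihood(w, x, y, sigma):
    """Gaussian likelihood of y under the toy model y ~ N(w * x, sigma^2)."""
    resid = y - w * x
    return math.exp(-0.5 * (resid / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def grad_step(w, x, y, lr, scale=1.0):
    """One SGD step on the squared error (y - w * x)^2, scaled by `scale`."""
    grad = -2.0 * (y - w * x) * x
    return w - lr * scale * grad


def online_em_step(task_w, counts, prior_w, x, y, alpha=0.01, lr=0.1, sigma=0.5):
    """One online EM update on a single data point (x, y)."""
    # Candidate new task: the prior adapted by one gradient step on (x, y).
    candidates = list(task_w) + [grad_step(prior_w, x, y, lr)]

    # E-step: responsibility is likelihood times CRP prior, normalized.
    prior = crp_prior(counts, alpha)
    joint = [likelihood(w, x, y, sigma) * p for w, p in zip(candidates, prior)]
    total = sum(joint)
    post = [j / total for j in joint]

    if post[-1] > max(post[:-1]):
        # The candidate explains the data best: instantiate it as a new task.
        task_w, counts = candidates, list(counts) + [0.0]
    else:
        # Otherwise drop the candidate and renormalize over existing tasks.
        kept = sum(post[:-1])
        post = [p / kept for p in post[:-1]]

    # M-step: one gradient step per task, scaled by its responsibility.
    task_w = [grad_step(w, x, y, lr, scale=p) for w, p in zip(task_w, post)]
    counts = [c + p for c, p in zip(counts, post)]
    return task_w, counts, post


def run_demo():
    """Stream task A (slope 2), then task B (slope -2), then task A again."""
    prior_w = 0.0            # stands in for the meta-learned prior
    task_w = [prior_w]       # start with a single task at the prior
    counts = [1.0]
    post = [1.0]
    for true_slope in [2.0] * 30 + [-2.0] * 30 + [2.0] * 5:
        x = 1.0
        y = true_slope * x
        task_w, counts, post = online_em_step(task_w, counts, prior_w, x, y)
    return task_w, counts, post
```

Calling `run_demo()` streams two alternating toy tasks through the update. In this setup the first task is refined in place, a second task is opened when the slope flips, and the first task is reused rather than relearned when its data returns. Because each step scales the gradient by the responsibility, tasks that do not explain the current point are left almost untouched, which is the mechanism behind retaining earlier skills.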

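Experimental Results

Research Questions

RQ1: Can MOLe autonomously discover task structure in a stream of non-stationary data?
RQ2: Can MOLe adapt to tasks beyond the range that k-shot methods can handle?
RQ3: Can MOLe recognize previously seen tasks online and revert to them?
RQ4: Can MOLe avoid overfitting to recent tasks and retain past skills when tasks switch?
RQ5: Does MOLe outperform other online learning and meta-learning baselines in non-stationary settings?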

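Key Findings

MOLe instantiates new tasks online and adapts to out-of-distribution tasks in model-based RL.
MOLe recalls previously seen tasks online and can revert to them.
MOLe outperforms baselines that take continuous gradient updates or omit meta-learning, as well as k-shot meta-RL.
A prior meta-trained with MAML provides an effective initialization for continual online adaptation in non-stationary multi-task settings.
With the CRP prior, soft task assignment and specialization emerge naturally, without explicit task delineation.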

This review was generated by AI and checked by a human editor.