QUICK REVIEW

[Paper Review] Online Structured Laplace Approximations For Overcoming Catastrophic Forgetting

Hippolyt Ritter, Aleksandar Botev|arXiv (Cornell University)|May 20, 2018

Domain Adaptation and Few-Shot Learning | Cited by 98

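One-line summary

This paper proposes a method that mitigates catastrophic forgetting in neural networks by updating a Gaussian posterior online with a Kronecker factored online Laplace approximation. Using updates based on a block-diagonal Kronecker-factored Hessian, it exceeds 90% test accuracy on a sequence of 50 permuted MNIST tasks and outperforms several baselines.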

ABSTRACT

We introduce the Kronecker factored online Laplace approximation for overcoming catastrophic forgetting in neural networks. The method is grounded in a Bayesian online learning framework, where we recursively approximate the posterior after every task with a Gaussian, leading to a quadratic penalty on changes to the weights. The Laplace approximation requires calculating the Hessian around a mode, which is typically intractable for modern architectures. In order to make our method scalable, we leverage recent block-diagonal Kronecker factored approximations to the curvature. Our algorithm achieves over 90% test accuracy across a sequence of 50 instantiations of the permuted MNIST dataset, substantially outperforming related methods for overcoming catastrophic forgetting.

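Motivation and objectives

Develop a Bayesian online learning framework for mitigating catastrophic forgetting in neural networks.
Propose and implement a Kronecker factored Laplace approximation that tracks the posterior across tasks.
Exploit block-diagonal Hessian structure so that the method scales to modern architectures.
Explore the balance between memory and plasticity through a regularization hyperparameter that scales the curvature.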

Proposed method

Formulate Bayesian online learning in which the approximate Gaussian posterior q(θ|φ_t) is characterized by its mean μ_t and precision Λ_t.
Use a two-step update: (i) maximize log p(D_{t+1}|θ) + log q(θ|φ_t) to obtain μ_{t+1}; (ii) set Λ_{t+1} = H_{t+1}(μ_{t+1}) + Λ_t, where H_{t+1} is the Hessian of the negative log-likelihood of the new data.
Approximate the Hessian with a positive semi-definite matrix based on the Fisher information, so that Λ_t remains positive semi-definite (PSD).
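Adopt a block-diagonal Kronecker-factored Hessian in which each layer's curvature is H_l = Q_l ⊗ ℋ_l (Q_l: covariance of the layer's input activations; ℋ_l: pre-activation Hessian), so that the quadratic penalty on vec(W_l − W_l*) can be evaluated efficiently.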
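Represent the posterior over each layer's weights as a matrix normal distribution, which retains curvature interactions within a layer but not between layers.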
Introduce a regularization multiplier λ on the Hessian, Λ_{t+1} = λ H_{t+1}(μ_{t+1}) + Λ_t, to adjust the width of the approximate posterior.
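
The two-step recursion above can be sketched on a toy problem. The example below is an illustration only, not the paper's implementation: it assumes a linear-Gaussian model with unit noise variance (chosen because its Hessian X^T X is exact and constant), and the names online_laplace_step, run_tasks and make_tasks are hypothetical helpers introduced here.

```python
import numpy as np


def online_laplace_step(mu, prec, x, y, lam=1.0):
    """One task update for a linear-Gaussian toy model (assumed, unit noise).

    Step (i): the new mean maximises log p(D|theta) + log q(theta|mu, prec).
    Step (ii): the new precision is lam * H + prec, where H = X^T X is the
    Hessian of the negative log-likelihood for this model.
    """
    hess = x.T @ x
    # Closed-form maximiser of the penalised objective for this quadratic case.
    mu_new = np.linalg.solve(hess + prec, x.T @ y + prec @ mu)
    prec_new = lam * hess + prec
    return mu_new, prec_new


def run_tasks(tasks, dim, lam=1.0, prior_prec=1.0):
    """Process tasks one after another, carrying only (mu, prec) forward."""
    mu = np.zeros(dim)
    prec = prior_prec * np.eye(dim)
    for x, y in tasks:
        mu, prec = online_laplace_step(mu, prec, x, y, lam)
    return mu, prec


def make_tasks(num_tasks=3, n=20, dim=4, seed=0):
    """Synthetic regression tasks that share one weight vector (toy data)."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(dim)
    tasks = []
    for _ in range(num_tasks):
        x = rng.standard_normal((n, dim))
        y = x @ theta + 0.1 * rng.standard_normal(n)
        tasks.append((x, y))
    return tasks


tasks = make_tasks()
for lam in (1.0, 3.0):
    mu, prec = run_tasks(tasks, dim=4, lam=lam)
    print("lambda =", lam, "smallest precision eigenvalue =",
          np.linalg.eigvalsh(prec).min())
```

In this toy setting λ = 1 reproduces the exact batch posterior, while a larger λ inflates the stored precision so that later tasks move the mean less. For a neural network, step (i) would be an iterative optimisation and H would be a curvature approximation rather than an exact Hessian.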

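Experimental results

Research questions

RQ1: Can the online Laplace approximation with Kronecker-factored curvature effectively suppress forgetting in neural networks over long task sequences?
RQ2: Does a Kronecker factorization that captures parameter interactions within a layer outperform the diagonal approximation in continual learning?
RQ3: How does the regularization hyperparameter λ affect memory and plasticity in online continual learning?
RQ4: How well does the proposed method scale to vision and MNIST-derived continual learning benchmarks compared with EWC and SI?
RQ5: Does the curvature need to be recomputed for each task, or is retained curvature sufficient without hurting performance?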

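Key findings

Kronecker-factored online Laplace achieves an average test accuracy above 90% across all 50 permuted MNIST tasks, approaching the performance of joint training.
Kronecker-factored curvature is consistently better than diagonal curvature at remembering old tasks, and it remains able to learn new tasks.
Introducing λ helps tune the width of the posterior; on permuted MNIST, λ ≈ 3 gave the best balance between memory and learning new tasks.
Diagonal approximations (EWC-style methods) perform worse than the Kronecker-factored approach, which suggests that weight interactions within a layer matter.
Regularization remains beneficial even with the Kronecker factorization, and further gains are expected from better curvature approximations.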
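
To make the Kronecker-versus-diagonal comparison concrete, here is a minimal sketch of the per-layer quadratic penalty 0.5 · vec(ΔW)^T (Q ⊗ ℋ) vec(ΔW). It assumes W has shape (out, in), column-stacking vec, and symmetric factors; the function names and the random factors are hypothetical stand-ins, not quantities from the paper. It shows that the penalty can be evaluated without forming the Kronecker product, and what a diagonal approximation leaves out.

```python
import numpy as np


def kron_penalty(delta_w, q, h):
    """Penalty via the identity (Q kron H) vec(dW) = vec(H dW Q), Q symmetric."""
    return 0.5 * np.sum(delta_w * (h @ delta_w @ q))


def dense_penalty(delta_w, q, h):
    """Same penalty with the full Kronecker product, for checking only."""
    v = delta_w.flatten(order="F")  # column-stacking vec
    return 0.5 * v @ np.kron(q, h) @ v


def diag_penalty(delta_w, q, h):
    """Penalty that keeps only the diagonal of Q kron H."""
    # Entry (j, i) of this array is H[j, j] * Q[i, i], matching dW[j, i].
    d = np.outer(np.diag(h), np.diag(q))
    return 0.5 * np.sum(d * delta_w ** 2)


def random_psd(n, rng):
    """Random symmetric PSD matrix used as a stand-in curvature factor."""
    a = rng.standard_normal((n, n))
    return a @ a.T / n


rng = np.random.default_rng(0)
n_out, n_in = 3, 5
q = random_psd(n_in, rng)    # stand-in for the input-activation factor
h = random_psd(n_out, rng)   # stand-in for the pre-activation Hessian factor
delta_w = rng.standard_normal((n_out, n_in))

print("factored :", kron_penalty(delta_w, q, h))
print("dense    :", dense_penalty(delta_w, q, h))
print("diagonal :", diag_penalty(delta_w, q, h))
```

The factored form needs only products with the (out × out) and (in × in) factors, whereas the dense matrix has (out · in)² entries. The diagonal variant discards every off-diagonal term, which are the within-layer weight interactions the findings above refer to.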

This review was generated by AI and checked by a human editor.