QUICK REVIEW

[Paper Review] A Unifying Bayesian View of Continual Learning

Sebastian Farquhar, Yarin Gal|arXiv (Cornell University)|Feb 18, 2019

Domain Adaptation and Few-Shot Learning | References: 16 | Cited by: 48

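One-Sentence Summary

This paper proposes a unifying Bayesian framework for continual learning that combines prior-focused and likelihood-focused approaches, introduces Variational Generative Replay (VGR), and shows that likelihood-focused methods can outperform prior-focused ones in some settings while providing better-calibrated uncertainty.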

ABSTRACT

Some machine learning applications require continual learning - where data comes in a sequence of datasets, each is used for training and then permanently discarded. From a Bayesian perspective, continual learning seems straightforward: Given the model posterior one would simply use this as the prior for the next task. However, exact posterior evaluation is intractable with many models, especially with Bayesian neural networks (BNNs). Instead, posterior approximations are often sought. Unfortunately, when posterior approximations are used, prior-focused approaches do not succeed in evaluations designed to capture properties of realistic continual learning use cases. As an alternative to prior-focused methods, we introduce a new approximate Bayesian derivation of the continual learning loss. Our loss does not rely on the posterior from earlier tasks, and instead adapts the model itself by changing the likelihood term. We call these approaches likelihood-focused. We then combine prior- and likelihood-focused methods into one objective, tying the two views together under a single unifying framework of approximate Bayesian continual learning.
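
The argument in the abstract can be sketched in equations. The notation here is an assumption made for illustration: \omega denotes the model weights, D_t the dataset of task t, q_t(\omega) the approximate posterior after task t, and p(\omega) the initial prior. The first line is the exact Bayesian recursion (assuming the datasets are conditionally independent given \omega); the second is a prior-focused variational loss, where the previous approximate posterior acts as the prior; the third is a likelihood-focused loss, where the prior stays fixed and the likelihood term covers all tasks seen so far.

```latex
\begin{align}
p(\omega \mid D_{1:t}) &\propto p(D_t \mid \omega)\, p(\omega \mid D_{1:t-1}) \\
\mathcal{L}^{t}_{\text{prior}}(q_t) &= \mathrm{KL}\big(q_t(\omega)\,\|\,q_{t-1}(\omega)\big)
  - \mathbb{E}_{q_t(\omega)}\big[\log p(D_t \mid \omega)\big] \\
\mathcal{L}^{t}_{\text{likelihood}}(q_t) &= \mathrm{KL}\big(q_t(\omega)\,\|\,p(\omega)\big)
  - \sum_{t'=1}^{t} \mathbb{E}_{q_t(\omega)}\big[\log p(D_{t'} \mid \omega)\big]
\end{align}
```

Because D_{t'} is discarded for t' < t, the corresponding terms in the last line cannot be computed on real data and have to be approximated, for example with samples from a generative model.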

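Research Motivation and Goals

Motivate continual learning under strict data-retention and real-time constraints.
Formulate continual learning clearly within a Bayesian framework and identify the limitations of prior-focused methods.
Propose a unified loss that combines prior-focused and likelihood-focused components.
Introduce Variational Generative Replay (VGR) as a likelihood-focused method.
Empirically compare prior-focused, likelihood-focused, and hybrid methods on standard benchmarks.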

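Proposed Method

Define prior-focused, likelihood-focused, and hybrid continual learning losses in a variational Bayesian setting.
Derive an ELBO-based multi-task loss and show how the log-likelihood of old data can be estimated with generative models.
Introduce Variational Generative Replay (VGR): train one GAN per task to model p_t(x,y), and use the stored generators to produce pseudo-rehearsal data.
Form a hybrid loss L^t_Hybrid that combines the generative likelihood terms with a prior that is updated at each task.
Compare VGR, VCL (Variational Continual Learning, with and without coresets), and a coreset-only baseline on standard benchmarks.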
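
The replay step described above can be illustrated with a minimal sketch. It is not the authors' implementation: `make_toy_generator` is a hypothetical stand-in for a trained per-task GAN, and `build_replay_set` and `likelihood_focused_loss` are illustrative names. The sketch only shows how real data for the current task is combined with pseudo-data from stored generators, and how a likelihood-focused negative ELBO estimate is assembled from a KL term and per-example log-likelihoods.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[List[float], int]            # (features, label)
Generator = Callable[[int], List[Example]]   # hypothetical: n -> n pseudo-examples


def make_toy_generator(label: int, seed: int) -> Generator:
    """Stand-in for a trained per-task GAN: emits noisy points near `label`."""
    rng = random.Random(seed)

    def sample(n: int) -> List[Example]:
        return [([label + rng.gauss(0.0, 0.1)], label) for _ in range(n)]

    return sample


def build_replay_set(current_data: Sequence[Example],
                     stored_generators: Sequence[Generator],
                     n_per_old_task: int) -> List[Example]:
    """Real data for the current task plus pseudo-data for each earlier task."""
    replay: List[Example] = []
    for gen in stored_generators:
        replay.extend(gen(n_per_old_task))
    return list(current_data) + replay


def likelihood_focused_loss(kl_to_initial_prior: float,
                            log_liks: Sequence[float]) -> float:
    """Negative ELBO estimate: KL(q_t || p) minus the summed log-likelihoods."""
    return kl_to_initial_prior - sum(log_liks)


# Two earlier tasks are represented only by their stored generators.
old_generators = [make_toy_generator(0, seed=0), make_toy_generator(1, seed=1)]
task3_data: List[Example] = [([2.0], 2), ([2.1], 2)]
train_set = build_replay_set(task3_data, old_generators, n_per_old_task=3)
labels = sorted(label for _, label in train_set)
```

In this sketch the original data of the earlier tasks is never stored; only the generators are kept, which is what makes the approach compatible with the discard-after-training constraint.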

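Experimental Results

Research Questions

RQ1: Is relying on the previous posterior q_{t-1}(ω) as the prior sufficient for continual learning, or are likelihood-focused (replay-based) methods better?
RQ2: Does a unified Bayesian objective that combines prior-focused and likelihood-focused ideas improve continual learning performance and uncertainty calibration?
RQ3: What role does generative replay (VGR) play in approximating the likelihood of old tasks and in improving performance?
RQ4: How much do coresets contribute to the performance of hybrid methods, and can the likelihood-focused component alone explain the results?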

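Key Findings

Likelihood-focused methods such as VGR can outperform prior-focused methods outside the multi-task setting, for example on single-headed Split MNIST.
Coresets drive significant performance gains in hybrid methods; isolating their effect separates out the contribution of the likelihood component.
VGR gives better-calibrated uncertainty about whether data has been seen before than the posteriors produced by prior-focused methods.
The performance of hybrid methods depends largely on the generative replay component rather than on prior-based regularization.
On simple benchmarks (Permuted MNIST, multi-headed Split MNIST) all methods perform well, but differences emerge in the more challenging setting (single-headed Split MNIST).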
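
The difference between the multi-headed and single-headed evaluations can be made concrete with a small sketch. The function names and the scores are made up for illustration and are not taken from the paper. In the multi-headed setting the task identity is given at test time, so the prediction is restricted to that task's classes; in the single-headed setting the model must choose among all classes seen so far.

```python
from typing import Sequence


def predict_multi_head(logits: Sequence[float], task_classes: Sequence[int]) -> int:
    """Task identity is known: choose only among that task's classes."""
    return max(task_classes, key=lambda c: logits[c])


def predict_single_head(logits: Sequence[float]) -> int:
    """No task identity: choose among every class seen so far."""
    return max(range(len(logits)), key=lambda c: logits[c])


# Made-up scores for an input whose true label is 1 (first task: classes 0 and 1).
# The model is biased towards class 2, which belongs to a more recent task.
logits = [0.1, 0.3, 2.0, 0.2]
multi = predict_multi_head(logits, task_classes=[0, 1])
single = predict_single_head(logits)
```

With these scores the multi-headed prediction is still correct, because class 2 is excluded, while the single-headed prediction is not. This is one way to see why forgetting that stays hidden in the multi-headed evaluation can show up in the single-headed one.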


This review was generated by AI and reviewed by human editors.