QUICK REVIEW

[Paper Review] Empirical Bayes Transductive Meta-Learning with Synthetic Gradients

Shell Xu Hu, Pablo García Moreno | arXiv (Cornell University) | Apr 27, 2020

Domain Adaptation and Few-Shot Learning | References: 48 | Cited by: 81

One-Sentence Summary

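This paper proposes a transductive meta-learning method based on empirical Bayes and synthetic gradients that exploits unlabeled query data and achieves state-of-the-art results on few-shot benchmarks.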

ABSTRACT

We propose a meta-learning approach that learns from multiple tasks in a transductive setting, by leveraging the unlabeled query set in addition to the support set to generate a more powerful model for each task. To develop our framework, we revisit the empirical Bayes formulation for multi-task learning. The evidence lower bound of the marginal log-likelihood of empirical Bayes decomposes as a sum of local KL divergences between the variational posterior and the true posterior on the query set of each task. We derive a novel amortized variational inference that couples all the variational posteriors via a meta-model, which consists of a synthetic gradient network and an initialization network. Each variational posterior is derived from synthetic gradient descent to approximate the true posterior on the query set, although we do not have access to the true gradient. Our results on the Mini-ImageNet and CIFAR-FS benchmarks for episodic few-shot classification outperform previous state-of-the-art methods. Besides, we conduct two zero-shot learning experiments to further explore the potential of the synthetic gradient.
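
The decomposition mentioned in the abstract follows from a standard variational identity. As a sketch, write d_t for the data of task t, w_t for the task-specific weights, p_psi(w_t) for the shared prior, and q_{theta_t}(w_t) for the variational posterior. This notation is an assumption chosen to match the symbols used elsewhere in this review; it is not copied from the paper.

```latex
\log p_\psi(d_t)
  = \underbrace{\mathbb{E}_{q_{\theta_t}(w_t)}\big[\log p(d_t \mid w_t)\big]
      - \mathrm{KL}\big(q_{\theta_t}(w_t) \,\|\, p_\psi(w_t)\big)}_{\mathrm{ELBO}_t}
  + \mathrm{KL}\big(q_{\theta_t}(w_t) \,\|\, p_\psi(w_t \mid d_t)\big)
```

Because the last term is nonnegative, ELBO_t lower-bounds log p_psi(d_t), and for a fixed psi, maximizing ELBO_t over theta_t is the same as minimizing the KL divergence to the true posterior. Summing over tasks gives one such local KL term per task, which the abstract describes as being taken on the query set of each task.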

Research Motivation and Objectives

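Motivate meta-learning that uses unlabeled query data in a transductive setting to improve task-specific models.
Develop an empirical Bayes formulation of multi-task meta-learning that incorporates the query set.
Implement the empirical Bayes (EB) model through amortized variational inference with a synthetic gradient network.
Show that a transductive variational posterior can improve generalization.
Validate the approach empirically on standard few-shot benchmarks and explore its zero-shot potential.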

Proposed Method

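Formulate meta-learning as an empirical Bayes model with task-specific weights and shared meta-parameters.
Introduce a variational posterior q_{theta_t}(w_t) and an amortized inference network q_{phi(d_t^l, x_t)}(w_t) that conditions on both the labeled support set d_t^l and the unlabeled query inputs x_t.
Unroll the full inference dynamics through a gradient network xi and an initialization network lambda, using synthetic gradients to approximate the true gradients, which are unavailable without query labels.
Parameterize the inner-loop update as theta_t^{k+1} = theta_t^k - eta [ E_epsilon[ (1/n) sum_i xi(y_hat_{t,i}) ∂y_hat_{t,i}/∂w_t ∂w_t/∂theta_t ] + ∇_{theta_t} KL(q_{theta_t^k}(w_t) || p_psi(w_t)) ], matching Equation (10) of the paper.
Define the training objective as the sum over tasks of the KL terms from the ELBO, and relate it to an information bottleneck interpretation; the resulting method is called Synthetic Information Bottleneck (SIB).
Provide a practical algorithm (Algorithm 1) that trains f, psi, and phi (lambda, xi) with inner synthetic gradient steps.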
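
The inner-loop update above can be sketched in a few lines of plain Python. This is a minimal illustration under strong simplifying assumptions, not the paper's Algorithm 1: the model is linear (y_hat = w · x), the posterior is collapsed to its mean so that w_t = theta_t and the expectation over epsilon is dropped, both the posterior and the prior are fixed-variance Gaussians so the KL gradient reduces to (theta - prior_mean) / prior_var, and `init_network` and `synthetic_gradient` are hand-written hypothetical stand-ins for the meta-learned networks lambda and xi.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear model y_hat = w . x (an assumption made for this sketch)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def init_network(support_x: Sequence[Sequence[float]],
                 support_y: Sequence[float]) -> Vector:
    """Hypothetical stand-in for lambda: label-weighted mean of support inputs."""
    n = len(support_x)
    dim = len(support_x[0])
    return [sum(y * x[j] for x, y in zip(support_x, support_y)) / n
            for j in range(dim)]


def synthetic_gradient(y_hat: float, a: float = 1.0, b: float = -1.0) -> float:
    """Hypothetical stand-in for xi: guesses dLoss/dy_hat from y_hat alone.

    In the method this mapping is meta-learned; here it is a fixed affine map.
    """
    return a * y_hat + b


def synthetic_gradient_descent(
    theta0: Sequence[float],
    query_x: Sequence[Sequence[float]],
    prior_mean: Sequence[float],
    k_steps: int,
    eta: float = 0.1,
    prior_var: float = 10.0,
    xi: Callable[[float], float] = synthetic_gradient,
) -> Vector:
    """Run K label-free inner steps on the unlabeled query inputs."""
    theta = list(theta0)
    n = len(query_x)
    for _ in range(k_steps):
        grad = [0.0] * len(theta)
        # Data term: (1/n) sum_i xi(y_hat_i) * d y_hat_i / d w, with dw/dtheta = I.
        for x in query_x:
            g = xi(predict(theta, x))
            for j, xj in enumerate(x):
                grad[j] += g * xj / n
        # KL term for fixed-variance Gaussians: (theta - prior_mean) / prior_var.
        for j in range(len(theta)):
            grad[j] += (theta[j] - prior_mean[j]) / prior_var
        theta = [t - eta * g for t, g in zip(theta, grad)]
    return theta
```

A typical call would first compute `theta0 = init_network(support_x, support_y)` from the labeled support set and then pass the unlabeled query inputs to `synthetic_gradient_descent`. Setting `k_steps=0` returns the initialization unchanged, which corresponds to the K = 0 baseline discussed in the findings.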

Experimental Results

Research Questions

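RQ1: Can transductive inference with unlabeled query inputs generalize better in meta-learning than conventional inductive methods?
RQ2: Does meta-learning built on the empirical Bayes framework with synthetic gradients achieve better few-shot performance on standard benchmarks?
RQ3: How does the transductive variational posterior relate to information-bottleneck-style generalization in the multi-task setting?
RQ4: How does changing the number of synthetic gradient steps K affect performance?
RQ5: Can the method extend to zero-shot learning scenarios, where no support set labels are available?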

Key Findings

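SIB with synthetic gradients significantly improves 1-shot accuracy on Mini-ImageNet and CIFAR-FS relative to several baselines.
Increasing K from 0 to 3 or 5 improves performance in the 1-shot setting across several backbones.
For 5-shot, the results are competitive with, but not consistently better than, state-of-the-art transductive methods and variants such as CTM and Gidaris on some backbones.
The method is robust across feature backbones (Conv-4-64, Conv-4-128, WRN-28-10).
The paper also explores zero-shot regression tasks, showing the potential of the synthetic gradient framework beyond standard meta-learning.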

This review was generated by AI and reviewed by human editors.