QUICK REVIEW

[Paper Review] Bilevel Programming for Hyperparameter Optimization and Meta-Learning

Luca Franceschi, Paolo Frasconi|arXiv (Cornell University)|Jun 13, 2018

Domain Adaptation and Few-Shot Learning | References: 37 | Cited by: 110

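One-Sentence Summary

This paper proposes a unified bilevel optimization framework that connects gradient-based hyperparameter optimization with meta-learning, and demonstrates its effectiveness on learning-to-learn for few-shot tasks. It instantiates the framework in deep networks as hyper-representation learning and gives theoretical convergence guarantees for the approximate inner-outer problem.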

ABSTRACT

We introduce a framework based on bilevel programming that unifies gradient-based hyperparameter optimization and meta-learning. We show that an approximate version of the bilevel problem can be solved by taking into explicit account the optimization dynamics for the inner objective. Depending on the specific setting, the outer variables take either the meaning of hyperparameters in a supervised learning problem or parameters of a meta-learner. We provide sufficient conditions under which solutions of the approximate problem converge to those of the exact problem. We instantiate our approach for meta-learning in the case of deep learning where representation layers are treated as hyperparameters shared across a set of training episodes. In experiments, we confirm our theoretical findings, present encouraging results for few-shot learning and contrast the bilevel approach against classical approaches for learning-to-learn.
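The two problems the abstract contrasts can be written out as follows. This is a sketch in notation assumed for illustration: $L_\lambda$, $E$, and $w_{T,\lambda}$ match the symbols used later in this review, while $f$, $f_T$, $\Lambda$, and the update maps $\Phi_t$ are names introduced here for the exact outer objective, its approximation, the hyperparameter set, and one step of the inner optimizer.

```latex
% Exact bilevel problem: the outer objective is evaluated at the inner minimizer.
\min_{\lambda \in \Lambda} f(\lambda), \qquad
f(\lambda) = E(w_\lambda, \lambda), \qquad
w_\lambda = \arg\min_{u} L_\lambda(u)

% Approximate problem: the inner minimizer is replaced by the T-th iterate
% of an optimization dynamics (for example, gradient descent on L_\lambda).
w_{0,\lambda} = \Phi_0(\lambda), \qquad
w_{t,\lambda} = \Phi_t(w_{t-1,\lambda}, \lambda), \quad t = 1, \dots, T

\min_{\lambda \in \Lambda} f_T(\lambda), \qquad
f_T(\lambda) = E(w_{T,\lambda}, \lambda)
```

In hyperparameter optimization, $\lambda$ plays the role of the hyperparameters and $E$ is a validation loss; in meta-learning, $\lambda$ parameterizes the meta-learner and $E$ aggregates losses over episodes.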

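Motivation and Objectives

Propose bilevel optimization as a unified mathematical framework for hyperparameter optimization (HO) and meta-learning (ML).
Prove that, under reasonable conditions, solutions of the approximate inner-outer problem converge to those of the exact bilevel problem.
Instantiate the method in deep networks, using a representation shared across episodes for meta-learning.
Demonstrate empirical gains on few-shot learning benchmarks (Omniglot and MiniImagenet).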

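Proposed Method

Formulate HO and ML as a bilevel problem with inner objective L_λ and outer objective E.
Solve the bilevel problem approximately by simulating the inner optimization dynamics for T steps, which yields w_{T,λ}.
Compute the hypergradient with an extended reverse-mode hypergradient algorithm and use it to update the hyperparameters λ.
Instantiate ML by learning a representation h_λ shared across tasks and training task-specific classifiers g^j.
Provide theoretical results that, under mild assumptions, guarantee that the approximate problem has solutions and that these converge to solutions of the exact bilevel problem.
Run representation-learning experiments with deep networks and analyze how the number of inner iterations T affects performance.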
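The unroll-then-differentiate recipe in the list above can be sketched on a toy problem. The example below is not the paper's implementation: it assumes a one-dimensional ridge regression, where the inner objective is L_λ(w) = 0.5·mean((x·w − y)²) + 0.5·λ·w² on a training split and the outer objective E is the unregularized loss on a validation split. The function names, the data moments, and the step sizes are all hypothetical choices made for illustration.

```python
def inner_step(w, lam, s_xx, s_xy, eta):
    """One gradient-descent step on L_lam(w) = 0.5*mean((x*w - y)^2) + 0.5*lam*w^2."""
    grad = s_xx * w - s_xy + lam * w
    return w - eta * grad


def unroll(lam, s_xx, s_xy, eta, T, w0=0.0):
    """Run T inner steps and keep every iterate w_0, ..., w_T for the backward pass."""
    ws = [w0]
    for _ in range(T):
        ws.append(inner_step(ws[-1], lam, s_xx, s_xy, eta))
    return ws


def outer_loss(w, v_xx, v_xy, v_yy):
    """Validation loss E(w) = 0.5*mean((x*w - y)^2), written with second moments."""
    return 0.5 * (v_xx * w * w - 2.0 * v_xy * w + v_yy)


def reverse_hypergradient(lam, train, val, eta, T):
    """Derivative of E(w_{T,lam}) with respect to lam, backpropagated through T steps."""
    s_xx, s_xy = train
    v_xx, v_xy, _ = val
    ws = unroll(lam, s_xx, s_xy, eta, T)
    alpha = v_xx * ws[-1] - v_xy      # dE/dw evaluated at w_T
    hypergrad = 0.0                   # E has no direct dependence on lam here
    for t in range(T, 0, -1):
        hypergrad += (-eta * ws[t - 1]) * alpha   # (d step / d lam) times alpha_t
        alpha *= 1.0 - eta * (s_xx + lam)         # (d step / d w) times alpha_t
    return hypergrad


def tune_lambda(train, val, eta=0.1, T=20, outer_lr=10.0, outer_steps=50, lam=0.3):
    """Projected gradient descent on lam (kept non-negative) using the hypergradient."""
    for _ in range(outer_steps):
        lam = max(0.0, lam - outer_lr * reverse_hypergradient(lam, train, val, eta, T))
    return lam


if __name__ == "__main__":
    train = (2.0, 1.5)       # hypothetical mean(x^2), mean(x*y) on the training split
    val = (1.0, 0.5, 1.0)    # hypothetical mean(x^2), mean(x*y), mean(y^2) on validation
    print(reverse_hypergradient(0.3, train, val, eta=0.1, T=20))
    print(tune_lambda(train, val))
```

The backward loop stores all T iterates, so memory grows linearly with T. In the meta-learning instantiation, λ would instead be the weights of the shared representation h_λ, and the inner variable would be the weights of the task-specific classifiers g^j.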

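Experimental Results

Research Questions

RQ1: Can a bilevel formulation unify hyperparameter optimization and meta-learning in a single mathematical framework?
RQ2: Under what conditions do solutions of the approximate inner-outer problem with finite T converge to solutions of the exact bilevel problem?
RQ3: Does learning a hyper-representation shared across tasks improve few-shot learning performance?
RQ4: How does the number of inner optimization steps T affect generalization and training time in the few-shot setting?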

Key Findings

Method	Omniglot 5 classes 1-shot	Omniglot 5 classes 5-shot	Omniglot 20 classes 1-shot	Omniglot 20 classes 5-shot	MiniImagenet 5 classes 1-shot	MiniImagenet 5 classes 5-shot
Siamese nets (Koch et al., 2015)	97.3	98.4	88.2	97.0	-	-
Matching nets (Vinyals et al., 2016)	98.1	98.9	93.8	98.5	43.44±0.77	55.31±0.73
Neural stat. (Edwards and Storkey, 2016)	98.1	99.5	93.2	98.1	-	-
Memory mod. (Kaiser et al., 2017)	98.4	99.6	95.0	98.6	-	-
Meta-LSTM (Ravi and Larochelle, 2017)	-	-	-	-	43.56±0.84	60.60±0.71
MAML (Finn et al., 2017)	98.7	99.9	95.8	98.9	48.70±1.75	63.11±0.92
Meta-networks (Munkhdalai and Yu, 2017)	98.9	-	97.0	-	49.21±0.96	-
Prototypical Net. (Snell et al., 2017)	98.8	99.7	96.0	98.9	49.42±0.78	68.20±0.66
SNAIL (Mishra et al., 2018)	99.1	99.8	97.6	99.4	55.71±0.99	68.88±0.92
Hyper-representation	98.6	99.5	95.5	98.4	50.54±0.85	64.53±0.68

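Under suitable continuity and compactness assumptions, the minima and minimizers of the approximate bilevel problem converge to those of the exact problem as the number of inner iterations T→∞.
Early stopping (a small T) can act as a regularizer and, in some settings, generalizes better than solutions obtained with a large T.
Hyper-representation with shared representation layers achieves higher few-shot accuracy than several baselines on Omniglot and MiniImagenet.
In the hyper-representation setting, using a residual network as the representation mapping performs significantly better than a simple convolutional network.
The proposed hyper-representation method is competitive with state-of-the-art few-shot learning methods, which highlights the value of a learned shared representation.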
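The convergence finding can be checked by hand on a toy problem that is not taken from the paper. Assume the inner objective is L_λ(w) = 0.5·(w − λ)², solved by gradient descent from w_0 = 0, and the outer objective is E(w) = 0.5·(w − c)². The exact inner solution is w_λ = λ, so the exact outer minimizer is λ = c, while the T-step iterate is linear in λ and the approximate minimizer can be read off in closed form. All names and constants below are hypothetical.

```python
def inner_iterate(lam, eta, T, w0=0.0):
    """T gradient-descent steps on the toy inner objective L_lam(w) = 0.5*(w - lam)^2."""
    w = w0
    for _ in range(T):
        w -= eta * (w - lam)
    return w


def approx_minimizer(c, eta, T):
    """Minimizer of f_T(lam) = 0.5*(w_{T,lam} - c)^2 for the toy problem.

    With w0 = 0 the iterate is linear in lam, w_{T,lam} = rho_T * lam,
    so f_T is minimized at lam = c / rho_T.
    """
    rho_T = inner_iterate(1.0, eta, T)   # slope of w_{T,lam} as a function of lam
    return c / rho_T


def gaps(c=2.0, eta=0.1, horizons=(1, 5, 20, 50, 200)):
    """Distance between the approximate and exact outer minimizers for each T."""
    exact = c   # exact problem: w_lam = lam, so f(lam) = 0.5*(lam - c)^2
    return [abs(approx_minimizer(c, eta, T) - exact) for T in horizons]


if __name__ == "__main__":
    for T, gap in zip((1, 5, 20, 50, 200), gaps()):
        print(T, gap)
```

In this toy case the gap shrinks geometrically with T because the inner dynamics is a contraction. The toy does not reproduce the early-stopping effect, which depends on a mismatch between training and validation data.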


This review was generated by AI and reviewed by a human editor.