QUICK REVIEW

[Paper Review] Meta-Learning with Implicit Gradients

Aravind Rajeswaran, Chelsea Finn|arXiv (Cornell University)|Sep 10, 2019

Domain Adaptation and Few-Shot Learning | Cited by 217

One-Sentence Summary

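This paper proposes implicit MAML (iMAML), a memory-efficient meta-learning method that computes accurate meta-gradients using implicit differentiation and Hessian-vector products, without differentiating through the inner-loop optimization path. It achieves competitive or better performance on few-shot recognition benchmarks while decoupling the meta-gradient from the choice of inner optimizer.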

ABSTRACT

A core capability of intelligent systems is the ability to quickly learn new tasks by drawing on prior experience. Gradient (or optimization) based meta-learning has recently emerged as an effective approach for few-shot learning. In this formulation, meta-parameters are learned in the outer loop, while task-specific models are learned in the inner-loop, by using only a small amount of data from the current task. A key challenge in scaling these approaches is the need to differentiate through the inner loop learning process, which can impose considerable computational and memory burdens. By drawing upon implicit differentiation, we develop the implicit MAML algorithm, which depends only on the solution to the inner level optimization and not the path taken by the inner loop optimizer. This effectively decouples the meta-gradient computation from the choice of inner loop optimizer. As a result, our approach is agnostic to the choice of inner loop optimizer and can gracefully handle many gradient steps without vanishing gradients or memory constraints. Theoretically, we prove that implicit MAML can compute accurate meta-gradients with a memory footprint that is, up to small constant factors, no more than that which is required to compute a single inner loop gradient and at no overall increase in the total computational cost. Experimentally, we show that these benefits of implicit MAML translate into empirical gains on few-shot image recognition benchmarks.

Motivation and Objectives

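Identifies the scalability challenge of gradient-based meta-learning: differentiating through the inner-loop optimization is costly in both compute and memory.
Proposes a meta-gradient computation based on implicit differentiation that depends only on the inner-loop solution, not on the optimization path.
Develops the iMAML algorithm, which uses proximal regularization to stabilize the inner optimization and achieve memory efficiency.
Gives theoretical guarantees on the memory and compute needed for approximate meta-gradients, and demonstrates empirical gains on few-shot learning tasks.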

Proposed Method

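Formulates meta-learning as a bilevel optimization in which the inner problem is regularized by a proximal term centered on the meta-parameters.
Derives the implicit Jacobian of the inner-optimization solution, which yields the meta-gradient without differentiating through the inner loop.
Introduces a practical iMAML algorithm that combines a δ-accurate inner solver with a δ'-approximate Jacobian-vector product, obtained by conjugate gradient (CG) using Hessian-vector products.
Shows that iMAML matches the minimax computational complexity of backpropagating through the inner optimization while using memory that is O(1) in the number of inner steps.
Provides a theoretical guarantee: an ε-approximate meta-gradient can be obtained with memory independent of the number of inner iterations, using CG-based Hessian-vector products.
Presents empirical results on Omniglot and Mini-ImageNet showing competitive performance relative to MAML and FOMAML, with a favorable compute/memory trade-off.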
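
The formulation described in the list above can be written out as a short derivation. This is a sketch, not a quotation of the paper's equations: the symbols ($\theta$ for meta-parameters, $\phi_i^{*}$ for the task-specific solution, $\hat{\mathcal{L}}_i$ and $\mathcal{L}_i$ for the task's training and test losses, $\lambda$ for the proximal strength, $M$ for the number of tasks) are introduced here for illustration, and the inner loss is assumed smooth enough that the matrix below is invertible.

```latex
\begin{align}
% Bilevel problem with a proximally regularized inner objective
\theta^{*} &= \arg\min_{\theta}\; \frac{1}{M}\sum_{i=1}^{M}
              \mathcal{L}_i\big(\phi_i^{*}(\theta)\big),
&
\phi_i^{*}(\theta) &= \arg\min_{\phi}\; \hat{\mathcal{L}}_i(\phi)
              + \frac{\lambda}{2}\,\lVert \phi - \theta \rVert^{2} \\
% Stationarity condition of the inner problem
0 &= \nabla \hat{\mathcal{L}}_i(\phi_i^{*}) + \lambda\,(\phi_i^{*} - \theta) \\
% Differentiate the stationarity condition with respect to theta
0 &= \nabla^{2} \hat{\mathcal{L}}_i(\phi_i^{*})\,\frac{d\phi_i^{*}}{d\theta}
     + \lambda\Big(\frac{d\phi_i^{*}}{d\theta} - I\Big)
\;\;\Longrightarrow\;\;
\frac{d\phi_i^{*}}{d\theta}
   = \Big(I + \tfrac{1}{\lambda}\,\nabla^{2}\hat{\mathcal{L}}_i(\phi_i^{*})\Big)^{-1} \\
% Per-task meta-gradient via the chain rule (the Jacobian is symmetric)
g_i &= \Big(I + \tfrac{1}{\lambda}\,\nabla^{2}\hat{\mathcal{L}}_i(\phi_i^{*})\Big)^{-1}
       \nabla_{\phi}\mathcal{L}_i(\phi_i^{*})
\end{align}
```

The last line depends only on the solution $\phi_i^{*}$, not on how the inner optimizer reached it. Since $g_i$ solves a linear system, it can be approximated by conjugate gradient using only Hessian-vector products, without forming or inverting the Hessian.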

Experimental Results

Research Questions

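RQ1: Can implicit differentiation yield accurate meta-gradients without differentiating through the inner optimization path?
RQ2: How do iMAML's memory and compute costs compare with standard MAML as the number of inner-loop steps grows?
RQ3: Can iMAML-based meta-gradients scale to more complex inner optimizers and larger datasets without vanishing gradients?
RQ4: Do empirical results on few-shot benchmarks support the theoretical memory/compute advantages and the performance gains?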

Key Findings

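iMAML computes accurate meta-gradients with memory that does not grow with the number of inner-loop steps and with total compute comparable to backpropagation-based methods.
In synthetic tests, iMAML asymptotically matches the exact meta-gradient and gives a better finite-step approximation than MAML.
On Omniglot, iMAML with gradient descent as the inner loop is comparable to full MAML and outperforms first-order variants; a Hessian-free inner optimizer brings further gains.
On Mini-ImageNet, iMAML achieves higher accuracy than MAML and FOMAML in the reported setting.
Theory shows that an ε-approximate meta-gradient can be obtained memory-efficiently via CG with Hessian-vector products, and that under mild assumptions iMAML finds a stationary point of the outer objective.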
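
The CG-based meta-gradient in the findings above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a quadratic inner training loss 0.5 φᵀAφ − bᵀφ and a test loss 0.5‖φ − c‖², so the exact answer is available in closed form for comparison. All names (`conjugate_gradient`, `imaml_meta_gradient`, `make_toy_problem`, `A`, `b`, `c`, `lam`) are hypothetical and chosen for illustration.

```python
import numpy as np

def conjugate_gradient(matvec, rhs, iters=50, tol=1e-10):
    """Solve matvec(x) = rhs for a symmetric positive-definite operator."""
    x = np.zeros_like(rhs)
    r = rhs.copy()          # residual rhs - matvec(x) at x = 0
    p = r.copy()            # first search direction
    rs = r @ r
    for _ in range(iters):
        if np.sqrt(rs) < tol:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def imaml_meta_gradient(hvp, test_grad, lam, cg_iters=50):
    """Approximate (I + H/lam)^{-1} test_grad using only Hessian-vector products.

    hvp(v) must return H @ v, where H is the Hessian of the inner training
    loss at the inner solution. H itself is never formed or inverted here.
    """
    matvec = lambda v: v + hvp(v) / lam
    return conjugate_gradient(matvec, test_grad, iters=cg_iters)

def make_toy_problem(dim=5, seed=0):
    """Hypothetical quadratic task: SPD Hessian A and random vectors b, c, theta."""
    rng = np.random.default_rng(seed)
    Q = rng.standard_normal((dim, dim))
    A = Q @ Q.T + np.eye(dim)           # symmetric positive definite
    b = rng.standard_normal(dim)
    c = rng.standard_normal(dim)
    theta = rng.standard_normal(dim)    # meta-parameters
    return A, b, c, theta

A, b, c, theta = make_toy_problem()
lam = 2.0
dim = theta.shape[0]

# Inner solve: minimize 0.5 phi^T A phi - b^T phi + (lam/2)||phi - theta||^2.
# The closed form stands in for any inner optimizer that reaches the solution.
phi_star = np.linalg.solve(A + lam * np.eye(dim), b + lam * theta)

# Gradient of the test loss 0.5 ||phi - c||^2 at the inner solution.
test_grad = phi_star - c

# Implicit meta-gradient via CG; for a quadratic loss the HVP is just A @ v.
g_cg = imaml_meta_gradient(lambda v: A @ v, test_grad, lam)

# Reference value from a dense solve, feasible only because the toy is tiny.
g_exact = np.linalg.solve(np.eye(dim) + A / lam, test_grad)

print("max abs difference:", np.max(np.abs(g_cg - g_exact)))
```

The CG routine stores only a few vectors of the same size as the parameters, regardless of how many steps the inner solver took, which is the property the memory claim refers to. With a neural network, the `hvp` callable would typically come from automatic differentiation rather than an explicit matrix.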

This review was generated by AI and reviewed by human editors.