QUICK REVIEW

[Paper Review] Theoretical Convergence of Multi-Step Model-Agnostic Meta-Learning

Kaiyi Ji, Junjie Yang|arXiv (Cornell University)|Feb 18, 2020

Domain Adaptation and Few-Shot Learning|References: 52|Cited by: 32

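One-sentence summary

This paper develops a theoretical framework that proves the convergence of multi-step MAML in the resampling and finite-sum settings and characterizes its complexity. It shows that the inner stepsize should scale as $1/N$ with the number $N$ of inner steps, and it gives conditions under which the computational cost grows linearly in $N$.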

ABSTRACT

As a popular meta-learning approach, the model-agnostic meta-learning (MAML) algorithm has been widely used due to its simplicity and effectiveness. However, the convergence of the general multi-step MAML still remains unexplored. In this paper, we develop a new theoretical framework to provide such convergence guarantee for two types of objective functions that are of interest in practice: (a) resampling case (e.g., reinforcement learning), where loss functions take the form in expectation and new data are sampled as the algorithm runs; and (b) finite-sum case (e.g., supervised learning), where loss functions take the finite-sum form with given samples. For both cases, we characterize the convergence rate and the computational complexity to attain an $ε$-accurate solution for multi-step MAML in the general nonconvex setting. In particular, our results suggest that an inner-stage stepsize needs to be chosen inversely proportional to the number $N$ of inner-stage steps in order for $N$-step MAML to have guaranteed convergence. From the technical perspective, we develop novel techniques to deal with the nested structure of the meta gradient for multi-step MAML, which can be of independent interest.

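Motivation and goals

Study the convergence of multi-step MAML in the general nonconvex setting, which had remained unexplored.
Provide a theoretical framework for analyzing the resampling and finite-sum objective structures.
Characterize the convergence rate and the computational complexity of reaching an $\epsilon$-accurate solution.
Give guidance on stepsize selection and on the conditions under which the complexity is linear in $N$.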

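Proposed method

Develops an analysis of the nested inner-loop and outer-loop SGD updates of multi-step MAML.
Gives the gradient expression used in the outer update: $\nabla L_i(w) = \Big[\prod_{j=0}^{N-1}\big(I - \alpha \nabla^2 l_i(\tilde{w}_j^i)\big)\Big] \nabla l_i(\tilde{w}_N^i)$, where $L_i(w) = l_i(\tilde{w}_N^i)$ is the meta-loss of task $i$, and $\tilde{w}_0^i = w$, $\tilde{w}_{j+1}^i = \tilde{w}_j^i - \alpha \nabla l_i(\tilde{w}_j^i)$ are its inner-loop iterates.
In the resampling case, establishes new bounds that decouple the Hessian and gradient estimation errors.
Extends the analysis to the finite-sum case, where the inner and outer losses differ ($l_{S_i}$ vs. $l_{T_i}$).
Proves Lipschitz properties of the meta gradient and bounds the estimation error of the meta-gradient estimator.
Establishes conditions for reaching an $\epsilon$-accurate solution under the meta stepsize $\beta_k = 1/(C_\beta \hat{L}_{w_k})$, with the complexity expressed in terms of $N$, $\epsilon$, and the batch sizes.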
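
The outer-update gradient expression in this section can be sanity-checked numerically. The sketch below is not from the paper: it assumes a hypothetical one-dimensional nonconvex loss `l(w) = sin(w) + 0.1 w^2`, so the Hessian product reduces to a product of scalars, and it compares the closed-form meta gradient with a central finite difference of the meta-loss. All function names are illustrative.

```python
import math


def loss(w):
    """Hypothetical scalar nonconvex task loss (an assumption, not from the paper)."""
    return math.sin(w) + 0.1 * w * w


def grad(w):
    """First derivative of loss."""
    return math.cos(w) + 0.2 * w


def hess(w):
    """Second derivative of loss (the 1-D Hessian)."""
    return -math.sin(w) + 0.2


def inner_loop(w, alpha, n_steps):
    """Run n_steps of gradient descent and return all iterates w_0, ..., w_N."""
    iterates = [w]
    for _ in range(n_steps):
        w = w - alpha * grad(w)
        iterates.append(w)
    return iterates


def meta_objective(w, alpha, n_steps):
    """Meta-loss: the task loss evaluated at the last inner iterate."""
    return loss(inner_loop(w, alpha, n_steps)[-1])


def meta_gradient(w, alpha, n_steps):
    """Closed form: prod_{j=0}^{N-1} (1 - alpha * hess(w_j)) * grad(w_N)."""
    iterates = inner_loop(w, alpha, n_steps)
    product = 1.0
    for w_j in iterates[:-1]:
        product *= 1.0 - alpha * hess(w_j)
    return product * grad(iterates[-1])


def finite_difference(w, alpha, n_steps, h=1e-6):
    """Central finite-difference approximation of the meta gradient."""
    upper = meta_objective(w + h, alpha, n_steps)
    lower = meta_objective(w - h, alpha, n_steps)
    return (upper - lower) / (2.0 * h)


if __name__ == "__main__":
    w0, alpha, n_steps = 0.7, 0.05, 5
    print("closed form:      ", meta_gradient(w0, alpha, n_steps))
    print("finite difference:", finite_difference(w0, alpha, n_steps))
```

In higher dimensions the scalar factors become matrices `I - alpha * Hessian`, which is where the nested structure analyzed in the paper makes the estimation error harder to control.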

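Results

Research questions

RQ1: Does multi-step MAML converge for nonconvex objectives in both the resampling and finite-sum settings?
RQ2: How should the inner stepsize $\alpha$ scale with the number $N$ of inner-loop steps to guarantee convergence?
RQ3: With nested inner loops, how do gradient and Hessian estimation errors affect convergence?
RQ4: What is the computational complexity of reaching an $\epsilon$-accurate stationary point in each setting?
RQ5: In the finite-sum case, how does the difference between the inner and outer losses affect the convergence analysis?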
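
To make the finite-sum setting concrete, here is a toy sketch of multi-step MAML in which each task has a separate inner (support) loss and outer (query) loss. Everything in it is an assumption for illustration: the scalar least-squares tasks, the data in `TASKS`, the function names, full-batch gradients over all tasks, and a fixed meta stepsize `beta` in place of the adaptive stepsize analyzed in the paper.

```python
def mean(values):
    """Arithmetic mean of an iterable of floats."""
    values = list(values)
    return sum(values) / len(values)


def grad(w, data):
    """Gradient of mean(0.5 * (w * x - y) ** 2) over the (x, y) pairs in data."""
    return mean(x * (w * x - y) for x, y in data)


def hess(data):
    """Hessian of the same loss; constant in w because the loss is quadratic."""
    return mean(x * x for x, _ in data)


def task_meta_gradient(w, support, query, alpha, n_steps):
    """Product of (1 - alpha * Hessian of the support loss) along the inner
    path, times the gradient of the query loss at the last inner iterate."""
    jacobian = 1.0
    for _ in range(n_steps):
        jacobian *= 1.0 - alpha * hess(support)
        w = w - alpha * grad(w, support)
    return jacobian * grad(w, query)


def meta_gradient(w, tasks, alpha, n_steps):
    """Average of the per-task meta gradients (all tasks, no sampling)."""
    return mean(task_meta_gradient(w, s, q, alpha, n_steps) for s, q in tasks)


def run_maml(w, tasks, alpha, beta, n_steps, n_meta_iters):
    """Gradient descent on the meta objective; records |meta gradient| per iteration."""
    history = [abs(meta_gradient(w, tasks, alpha, n_steps))]
    for _ in range(n_meta_iters):
        w = w - beta * meta_gradient(w, tasks, alpha, n_steps)
        history.append(abs(meta_gradient(w, tasks, alpha, n_steps)))
    return w, history


# Hypothetical tasks: (support set, query set), each a list of (x, y) pairs.
TASKS = [
    ([(1.0, 2.0), (2.0, 4.1)], [(1.5, 3.0)]),
    ([(1.0, -1.0), (0.5, -0.4)], [(2.0, -2.1)]),
]

if __name__ == "__main__":
    w_final, history = run_maml(
        0.0, TASKS, alpha=0.02, beta=0.1, n_steps=3, n_meta_iters=100
    )
    print("meta parameter:", w_final)
    print("initial / final |meta gradient|:", history[0], history[-1])
```

Because these toy losses are quadratic, the meta objective is itself a convex quadratic, so this example only illustrates the structure of the update; it says nothing about the nonconvex guarantees the paper establishes.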

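Main findings

To guarantee convergence, the inner stepsize $\alpha$ is chosen so that $\alpha L < 2^{1/(2N)} - 1$, i.e. $\alpha = \Theta(1/(NL))$, where $L$ is the gradient Lipschitz constant of the task losses.
With suitable parameter choices, the gradient and Hessian computation complexities of $N$-step MAML grow linearly in $N$.
For problems with small Hessians, a larger $\alpha$ can be used without harming convergence, which is consistent with empirical observations.
The resampling analysis decouples the Hessian approximation error from the gradient error by bounding the distance between inner optimization paths.
The finite-sum analysis handles the mismatch between the inner and outer losses and yields analogous convergence guarantees.
Corollaries guarantee an $\epsilon$-accurate solution within $O(1/\epsilon^2)$ meta-iterations and give explicit gradient and Hessian computation complexities.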
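
The stepsize condition can be read as requiring $(1 + \alpha L)^{2N} < 2$, and since $2^{1/(2N)} - 1 \approx \ln 2 / (2N)$ for large $N$, the admissible $\alpha$ shrinks like $1/(NL)$. The sketch below tabulates this numerically. The function names and the value `L = 4.0` are illustrative assumptions, and the interpretation of $(1 + \alpha L)^N$ as a bound on the norm of the Hessian product is one way to read the condition, not a statement of the paper's proof.

```python
import math


def max_inner_stepsize(n_steps, smoothness):
    """Supremum of alpha satisfying alpha * L < 2 ** (1 / (2N)) - 1."""
    return (2.0 ** (1.0 / (2 * n_steps)) - 1.0) / smoothness


def hessian_product_bound(alpha, n_steps, smoothness):
    """(1 + alpha * L) ** N, an upper bound on the norm of the product of the
    N factors (I - alpha * Hessian) when every Hessian has norm at most L."""
    return (1.0 + alpha * smoothness) ** n_steps


if __name__ == "__main__":
    smoothness = 4.0  # assumed gradient Lipschitz constant L
    print("N, alpha_max, N * L * alpha_max, bound at 0.99 * alpha_max")
    for n_steps in (1, 2, 5, 10, 100):
        alpha_max = max_inner_stepsize(n_steps, smoothness)
        print(
            n_steps,
            alpha_max,
            n_steps * smoothness * alpha_max,
            hessian_product_bound(0.99 * alpha_max, n_steps, smoothness),
        )
    print("limit of N * L * alpha_max:", math.log(2.0) / 2.0)
```

The third column stays between $\ln 2 / 2$ and $\sqrt{2} - 1$ for every $N$, which is the $1/(NL)$ scaling, and the last column stays below $\sqrt{2}$ no matter how many inner steps are taken.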

This review was generated by AI and checked by a human editor.