[Paper Review] Meta-Learning with Implicit Gradients
The paper introduces implicit MAML (iMAML), a memory-efficient meta-learning method that computes accurate meta-gradients without differentiating through the inner-loop optimization path, using implicit differentiation and Hessian-vector products. It achieves competitive or superior performance on few-shot image recognition benchmarks while decoupling the meta-gradient computation from the choice of inner optimizer.
A core capability of intelligent systems is the ability to quickly learn new tasks by drawing on prior experience. Gradient (or optimization) based meta-learning has recently emerged as an effective approach for few-shot learning. In this formulation, meta-parameters are learned in the outer loop, while task-specific models are learned in the inner-loop, by using only a small amount of data from the current task. A key challenge in scaling these approaches is the need to differentiate through the inner loop learning process, which can impose considerable computational and memory burdens. By drawing upon implicit differentiation, we develop the implicit MAML algorithm, which depends only on the solution to the inner level optimization and not the path taken by the inner loop optimizer. This effectively decouples the meta-gradient computation from the choice of inner loop optimizer. As a result, our approach is agnostic to the choice of inner loop optimizer and can gracefully handle many gradient steps without vanishing gradients or memory constraints. Theoretically, we prove that implicit MAML can compute accurate meta-gradients with a memory footprint that is, up to small constant factors, no more than that which is required to compute a single inner loop gradient and at no overall increase in the total computational cost. Experimentally, we show that these benefits of implicit MAML translate into empirical gains on few-shot image recognition benchmarks.
Motivation & Objective
- Highlight the scalability challenges that arise in gradient-based meta-learning when the meta-gradient is computed by differentiating through the inner-loop optimization.
- Propose an implicit differentiation-based meta-gradient computation that depends only on the inner solution, not the optimization path.
- Develop the iMAML algorithm with proximal regularization to stabilize inner optimization and enable memory efficiency.
- Provide theoretical guarantees on the memory and computation for approximate meta-gradients and demonstrate empirical benefits on few-shot learning tasks.
Proposed method
- Formulate bi-level optimization for meta-learning with an inner problem regularized by a proximal term around the meta-parameters.
- Derive an implicit Jacobian for the inner-optimization solution that yields the meta-gradient without differentiating through the inner loop.
- Introduce a practical iMAML algorithm that pairs a delta-accurate inner solver with a delta'-approximate solve of the implicit linear system by conjugate gradient (CG), which needs only Hessian-vector products and never forms the Hessian explicitly.
- Show that iMAML's compute cost is comparable to that of backpropagating through the inner optimization, while its memory use stays constant in the number of inner steps.
- Provide theoretical guarantees: an epsilon-approximate meta-gradient can be obtained with memory independent of the number of inner iterations, using only Hessian-vector products inside CG.
- Demonstrate empirical results on Omniglot and Mini-ImageNet showing competitive performance and favorable compute/memory trade-offs compared to MAML and FOMAML.
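The implicit meta-gradient described in the list above can be sketched on a toy problem. With proximal strength lambda, inner solution phi, and train-loss Hessian H at phi, the per-task meta-gradient g solves the linear system (I + H / lambda) g = grad_test(phi). The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the quadratic train loss, the test loss, the matrix `A`, the value of `LAM`, and the helper names `hvp_train`, `solve_inner`, and `implicit_meta_gradient` are all hypothetical choices made for illustration.

```python
from typing import Callable, List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec: Callable[[Vector], Vector], rhs: Vector,
                       iters: int = 50, tol: float = 1e-20) -> Vector:
    """Solve matvec(x) = rhs for a symmetric positive-definite operator."""
    x = [0.0] * len(rhs)
    r = list(rhs)            # residual rhs - matvec(x) at x = 0
    p = list(r)              # search direction
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:         # squared residual norm is small enough
            break
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Toy task (assumption): train loss 0.5 * phi^T A phi, test loss 0.5 * ||phi||^2.
A = [[2.0, 0.0], [0.0, 4.0]]
LAM = 2.0  # proximal regularization strength (lambda)


def hvp_train(v: Vector) -> Vector:
    """Hessian-vector product of the toy train loss; here the Hessian is A."""
    return [dot(row, v) for row in A]


def solve_inner(theta: Vector) -> Vector:
    """Minimize train_loss(phi) + LAM / 2 * ||phi - theta||^2.

    Stationarity gives (A + LAM * I) phi = LAM * theta, solved here with CG.
    Any sufficiently accurate inner solver could be substituted.
    """
    return conjugate_gradient(
        lambda v: [h + LAM * vi for h, vi in zip(hvp_train(v), v)],
        [LAM * t for t in theta])


def implicit_meta_gradient(theta: Vector) -> Vector:
    """Solve (I + H / LAM) g = grad_test(phi*) using only Hessian-vector products."""
    phi = solve_inner(theta)
    grad_test = list(phi)    # gradient of the toy test loss 0.5 * ||phi||^2
    return conjugate_gradient(
        lambda v: [vi + h / LAM for vi, h in zip(v, hvp_train(v))],
        grad_test)
```

Calling `implicit_meta_gradient([1.0, 1.0])` returns approximately `[0.25, 0.111]`, which agrees with differentiating the closed-form inner solution of this toy problem by hand. Only the final `phi` is kept in memory; no intermediate inner iterates are stored, which is the property the method relies on.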
Experimental results
Research questions
- RQ1: Can implicit differentiation yield accurate meta-gradients without differentiating through the inner optimization path?
- RQ2: How do memory and computational costs of iMAML compare to standard MAML as the number of inner-loop steps grows?
- RQ3: Do iMAML-based meta-gradients enable scaling to more complex inner optimizers and larger datasets without gradient vanishing?
- RQ4: Do the empirical results on few-shot benchmarks support the theoretical memory/computation advantages and performance gains?
Key findings
- iMAML can compute accurate meta-gradients with memory that does not grow with the number of inner-loop steps and with comparable overall compute to backpropagation-based methods.
- In synthetic tests, iMAML asymptotically matches the exact meta-gradient and provides better finite-step approximations than MAML.
- On Omniglot, iMAML with a gradient-descent inner loop is competitive with full MAML and outperforms first-order variants, and a Hessian-free inner optimizer offers further gains.
- On Mini-ImageNet, iMAML achieves higher accuracy than MAML and FOMAML in reported settings.
- Theoretical results show that an epsilon-approximate meta-gradient can be obtained memory-efficiently by running CG with Hessian-vector products, and that iMAML finds stationary points of the outer objective under mild assumptions.
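The relationship between path-dependent and implicit differentiation can be illustrated on a scalar toy problem. This is a hedged sketch, not the paper's synthetic experiment: it assumes an inner objective 0.5 * a * phi^2 + lam / 2 * (phi - theta)^2 solved by plain gradient descent from phi_0 = theta, and the function names and default values are hypothetical. It compares the Jacobian d phi_K / d theta obtained by differentiating through K steps with the implicit Jacobian 1 / (1 + a / lam), which depends only on the inner solution.

```python
def implicit_jacobian(a: float, lam: float) -> float:
    """d phi*/d theta at the exact inner solution: 1 / (1 + a / lam)."""
    return 1.0 / (1.0 + a / lam)


def unrolled_jacobian(a: float, lam: float, lr: float, steps: int) -> float:
    """d phi_K/d theta after K gradient-descent steps started at phi_0 = theta.

    One step is phi <- phi - lr * (a * phi + lam * (phi - theta)), so the
    derivative obeys J <- (1 - lr * (a + lam)) * J + lr * lam, with J_0 = 1.
    """
    jac = 1.0
    for _ in range(steps):
        jac = (1.0 - lr * (a + lam)) * jac + lr * lam
    return jac


def jacobian_gap(steps: int, a: float = 4.0, lam: float = 2.0,
                 lr: float = 0.1) -> float:
    """Absolute gap between the unrolled and the implicit Jacobian."""
    return abs(unrolled_jacobian(a, lam, lr, steps) - implicit_jacobian(a, lam))
```

In this toy setting the gap shrinks geometrically with the number of steps, so the unrolled Jacobian approaches the implicit one as the inner solver converges. The unrolled route has to track every step, whereas the implicit formula needs only the curvature at the solution.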
This review was created by AI and reviewed by human editors.