QUICK REVIEW

[Paper Review] Meta-learning with differentiable closed-form solvers

Luca Bertinetto, João F. Henriques|arXiv (Cornell University)|May 21, 2018
Domain Adaptation and Few-Shot Learning | References: 54 | Cited by: 151
One-Sentence Summary

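This paper proposes using differentiable ridge regression and logistic regression solvers (R2-D2 and LR-D2) as fast, episode-specific base learners. It backpropagates through their closed-form solutions or IRLS-based iterations to achieve efficient few-shot adaptation. Using high-dimensional embeddings made affordable by the Woodbury identity, it achieves results competitive with or superior to state-of-the-art methods on Omniglot, mini-ImageNet and CIFAR-FS.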

ABSTRACT

Adapting deep networks to new concepts from a few examples is challenging, due to the high computational requirements of standard fine-tuning procedures. Most work on few-shot learning has thus focused on simple learning techniques for adaptation, such as nearest neighbours or gradient descent. Nonetheless, the machine learning literature contains a wealth of methods that learn non-deep models very efficiently. In this paper, we propose to use these fast convergent methods as the main adaptation mechanism for few-shot learning. The main idea is to teach a deep network to use standard machine learning tools, such as ridge regression, as part of its own internal model, enabling it to quickly adapt to novel data. This requires back-propagating errors through the solver steps. While normally the cost of the matrix operations involved in such a process would be significant, by using the Woodbury identity we can make the small number of examples work to our advantage. We propose both closed-form and iterative solvers, based on ridge regression and logistic regression components. Our methods constitute a simple and novel approach to the problem of few-shot learning and achieve performance competitive with or superior to the state of the art on three benchmarks.
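The Woodbury argument in the abstract can be checked numerically with a minimal NumPy sketch. It assumes plain multi-output ridge regression with one-hot targets. The primal solution W = (XᵀX + λI_d)⁻¹XᵀY requires solving a d × d system. The equivalent form W = Xᵀ(XXᵀ + λI_n)⁻¹Y solves only an n × n system, which is cheap when an episode has few examples n but the embedding dimension d is large. The function names, sizes and random data are illustrative and are not taken from the paper.

```python
import numpy as np

def ridge_primal(X, Y, lam):
    """W = (X^T X + lam * I_d)^{-1} X^T Y: solves a d x d linear system."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def ridge_dual(X, Y, lam):
    """W = X^T (X X^T + lam * I_n)^{-1} Y: same weights, but solves an n x n system."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), Y)

rng = np.random.default_rng(0)
n, d, c = 5, 256, 5                  # illustrative: 5 support examples, 256-dim embeddings, 5 classes
X = rng.standard_normal((n, d))      # support-set embeddings, one row per example
Y = np.eye(c)[np.arange(n) % c]      # one-hot targets
lam = 1.0

W_primal = ridge_primal(X, Y, lam)
W_dual = ridge_dual(X, Y, lam)
print(W_primal.shape, np.allclose(W_primal, W_dual))  # → (256, 5) True
```

Both forms give the same weights. Only the size of the matrix being inverted changes, so with few examples per episode the dual form keeps the solve cheap regardless of d.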

Research Motivation and Objectives

  • Enable fast adaptation in few-shot learning by using simple, differentiable base learners that have closed-form or rapidly converging solutions.
  • Integrate ridge regression and logistic regression solvers into a meta-learning framework to allow backpropagation through the learning step.
  • Demonstrate computational efficiency gains in high-dimensional embedding settings via the Woodbury identity.
  • Evaluate the proposed methods on standard few-shot benchmarks and compare to state-of-the-art methods.

Proposed Method

  • Propose a meta-learning setup where the base learner is a differentiable ridge regression layer computing episode-specific weights W from episode data.
  • Use the Woodbury identity to compute ridge solutions efficiently when the embedding dimensionality is large but the number of examples per episode is small.
  • Extend the approach to an iterative logistic regression base learner that uses Iteratively Reweighted Least Squares (IRLS), which amounts to a small number of Newton-like update steps.
  • Calibrate the ridge regression outputs with a learned scale and bias so that they can be used as logits for the cross-entropy loss.
  • Train the representation and hyperparameters end-to-end by backpropagating through the episode learning steps across many episodes.
  • Provide a training protocol where meta-parameters (feature extractor weights, ridge lambda, calibration parameters) are learned via SGD/Adam to minimize held-out episode loss.
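The episode-level procedure in the list above can be sketched as a NumPy forward pass under several assumptions. Random vectors stand in for the learned embeddings of the support and query images. The ridge regularizer λ (`lam`), scale α (`alpha`) and bias β (`beta`) are fixed placeholder values here, whereas the method learns them as meta-parameters. `episode_loss` is a hypothetical helper name. In an autodiff framework, the same computation would let gradients of the query loss flow through the linear solve into the feature extractor and into λ, α and β. NumPy only illustrates the forward computation.

```python
import numpy as np

def ridge_dual(X, Y, lam):
    # Woodbury/dual form: only an n x n system is solved, n = number of support examples.
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), Y)

def log_softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    return Z - np.log(np.exp(Z).sum(axis=1, keepdims=True))

def episode_loss(X_s, y_s, X_q, y_q, n_way, lam, alpha, beta):
    Y_s = np.eye(n_way)[y_s]               # one-hot support targets
    W = ridge_dual(X_s, Y_s, lam)          # episode-specific base-learner weights
    logits = alpha * (X_q @ W) + beta      # calibrated ridge outputs used as logits
    return -log_softmax(logits)[np.arange(len(y_q)), y_q].mean()  # query cross-entropy

rng = np.random.default_rng(0)
n_way, k_shot, n_query, d = 5, 1, 15, 64   # illustrative 5-way 1-shot episode
# Hypothetical embeddings: per-class prototypes plus noise stand in for a learned network.
protos = rng.standard_normal((n_way, d))
y_s = np.repeat(np.arange(n_way), k_shot)
y_q = np.repeat(np.arange(n_way), n_query)
X_s = protos[y_s] + 0.1 * rng.standard_normal((len(y_s), d))
X_q = protos[y_q] + 0.1 * rng.standard_normal((len(y_q), d))

loss = episode_loss(X_s, y_s, X_q, y_q, n_way, lam=1.0, alpha=10.0, beta=0.0)
print(f"query cross-entropy: {loss:.4f}")
```

The only inversion is of an n × n matrix (n = 5 support examples here). The embedding dimension d enters only through matrix products, which is what makes high-dimensional embeddings affordable per episode.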

Experimental Results

Research Questions

  • RQ1: Can fast-converging, differentiable solvers (ridge and logistic regression) serve as effective base learners for meta-learning in few-shot tasks?
  • RQ2: Does backpropagating through closed-form or IRLS-based solvers enable competitive meta-learning performance with high-dimensional embeddings?
  • RQ3: How does the Woodbury identity influence computational efficiency in few-shot, high-dimensional settings?
  • RQ4: How do the ridge regression and logistic regression base learners compare to state-of-the-art meta-learning methods on standard benchmarks?
  • RQ5: What calibration steps are necessary to use regression outputs with the cross-entropy loss for classification?

Key Findings

  • R2-D2 (ridge regression) achieves results competitive with the state of the art on mini-ImageNet and CIFAR-FS using shallow architectures.
  • LR-D2 (iterative logistic regression) reaches comparable performance across different numbers of IRLS iterations, showing the flexibility of IRLS within this meta-learning framework.
  • On Omniglot, the method is competitive and performs well across the evaluated settings, including those with more shots per class.
  • The Woodbury-based formulation significantly reduces computational cost when embedding dimensionality is high and the episode size is small.
  • Calibration of the regression outputs (scaling and bias) is effective for aligning with cross-entropy loss in few-shot classification.
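The LR-D2 base learner referred to in the findings runs a few IRLS iterations. A minimal sketch follows for a binary, L2-regularized logistic regression solved with IRLS, assuming labels in {0, 1} and weights initialized at zero. Each iteration is a Newton step w = (XᵀSX + λI)⁻¹XᵀSz, where S = diag(μ(1 − μ)) and z = Xw + S⁻¹(y − μ) is the working response. The code rewrites this step with the Woodbury identity as w = Xᵀ(XXᵀ + λS⁻¹)⁻¹z, so only an n × n system is solved. The function names, the data and the `eps` floor on the IRLS weights are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def irls_logistic(X, y, lam, n_iter=5, eps=1e-8):
    """Binary L2-regularized logistic regression fitted with IRLS (Newton) steps."""
    K = X @ X.T                                # n x n Gram matrix of the support set
    w = np.zeros(X.shape[1])                   # start from the zero weight vector
    for _ in range(n_iter):
        mu = sigmoid(X @ w)                    # current predicted probabilities
        s = np.maximum(mu * (1.0 - mu), eps)   # IRLS weights, the diagonal of S
        z = X @ w + (y - mu) / s               # working response
        # Newton step w = (X^T S X + lam I)^{-1} X^T S z, rewritten via Woodbury as
        # w = X^T (X X^T + lam S^{-1})^{-1} z, so only an n x n system is solved.
        w = X.T @ np.linalg.solve(K + lam * np.diag(1.0 / s), z)
    return w

rng = np.random.default_rng(0)
n, d = 10, 32                                  # illustrative binary episode: 10 examples, 32-dim embeddings
X = rng.standard_normal((n, d))
y = (np.arange(n) % 2).astype(float)           # labels in {0, 1}
lam = 1.0

w = irls_logistic(X, y, lam, n_iter=5)
# Gradient of the L2-regularized negative log-likelihood; it is zero at the optimum.
grad = X.T @ (sigmoid(X @ w) - y) + lam * w
print("gradient norm after 5 IRLS steps:", np.linalg.norm(grad))
```

Each additional iteration costs one more n × n solve, so `n_iter` trades the accuracy of the inner solution for compute.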

This review was generated by AI and reviewed by human editors.