QUICK REVIEW

[Paper Review] Meta-learning with differentiable closed-form solvers

Luca Bertinetto, João F. Henriques|arXiv (Cornell University)|May 21, 2018

Domain Adaptation and Few-Shot Learning | References: 54 | Cited by: 151

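ONE-SENTENCE SUMMARY

This paper proposes using differentiable ridge regression and logistic regression solvers (R2-D2 and LR-D2) as fast, episode-specific base learners, backpropagating through a closed-form solution or an IRLS-based solver to enable efficient few-shot adaptation. With high-dimensional embeddings and the Woodbury identity used for speed, the methods achieve results competitive with or superior to prior methods on Omniglot, mini-ImageNet, and CIFAR-FS.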

ABSTRACT

Adapting deep networks to new concepts from a few examples is challenging, due to the high computational requirements of standard fine-tuning procedures. Most work on few-shot learning has thus focused on simple learning techniques for adaptation, such as nearest neighbours or gradient descent. Nonetheless, the machine learning literature contains a wealth of methods that learn non-deep models very efficiently. In this paper, we propose to use these fast convergent methods as the main adaptation mechanism for few-shot learning. The main idea is to teach a deep network to use standard machine learning tools, such as ridge regression, as part of its own internal model, enabling it to quickly adapt to novel data. This requires back-propagating errors through the solver steps. While normally the cost of the matrix operations involved in such a process would be significant, by using the Woodbury identity we can make the small number of examples work to our advantage. We propose both closed-form and iterative solvers, based on ridge regression and logistic regression components. Our methods constitute a simple and novel approach to the problem of few-shot learning and achieve performance competitive with or superior to the state of the art on three benchmarks.

MOTIVATION AND OBJECTIVES

Motivate fast adaptation in few-shot learning by using simple, differentiable base learners with closed-form or rapidly converging solutions.
Integrate ridge regression and logistic regression solvers into a meta-learning framework to allow backpropagation through the learning step.
Show computational efficiency gains in high-dimensional embedding settings via Woodbury identity.
Evaluate the proposed methods on standard few-shot benchmarks and compare to state-of-the-art methods.

PROPOSED METHOD

Propose a meta-learning setup where the base learner is a differentiable ridge regression layer computing episode-specific weights W from episode data.
Use the Woodbury identity to compute ridge solutions efficiently when embedding dimensionality is large but episode sample size is small.
Extend to an iterative logistic regression base learner that uses Iteratively Reweighted Least Squares (IRLS) to perform a small number of Newton-like update steps.
Calibrate ridge outputs with a learned scale and bias to align with cross-entropy loss.
Train the representation and hyperparameters end-to-end by backpropagating through the episode learning steps across many episodes.
Provide a training protocol where meta-parameters (feature extractor weights, ridge lambda, calibration parameters) are learned via SGD/Adam to minimize held-out episode loss.
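
The ridge regression base learner and the Woodbury shortcut listed above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are random stand-ins for feature-extractor outputs, the episode sizes are hypothetical, and the calibration values `alpha` and `beta` are fixed by hand here, whereas the summary describes them as learned. The same linear-algebra operations have differentiable counterparts in autodiff frameworks, which is what would allow backpropagation through the solve.

```python
import numpy as np

def ridge_primal(X, Y, lam):
    """W = (X^T X + lam*I)^{-1} X^T Y; solves a d x d system."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def ridge_woodbury(X, Y, lam):
    """W = X^T (X X^T + lam*I)^{-1} Y; solves an n x n system instead."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), Y)

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Hypothetical 5-way 1-shot episode with 512-dimensional embeddings.
rng = np.random.default_rng(0)
n_way, k_shot, d = 5, 1, 512
labels = np.repeat(np.arange(n_way), k_shot)
X_support = rng.normal(size=(n_way * k_shot, d))  # stand-in for embeddings
Y_support = np.eye(n_way)[labels]                 # one-hot regression targets

# Both forms give the same episode-specific weights W.
W_primal = ridge_primal(X_support, Y_support, lam=1.0)
W_dual = ridge_woodbury(X_support, Y_support, lam=1.0)

# Query embeddings: noisy copies of the support set (an assumption for the demo).
X_query = X_support + 0.1 * rng.normal(size=X_support.shape)

# Calibration: scale and bias applied to regression outputs before the
# cross-entropy loss. Hand-picked here; learned as meta-parameters in the paper.
alpha, beta = 10.0, 0.0
logits = alpha * (X_query @ W_dual) + beta
loss = softmax_cross_entropy(logits, labels)
```

With only 5 support examples, the Woodbury form solves a 5 x 5 system rather than a 512 x 512 one, while producing the same weights up to numerical precision.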

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Can fast-converging, differentiable solvers (ridge and logistic regression) serve as effective base learners for meta-learning in few-shot tasks?
RQ2: Does backpropagating through closed-form or IRLS-based solvers enable competitive meta-learning performance with high-dimensional embeddings?
RQ3: How does the Woodbury identity influence computational efficiency in few-shot, high-dimensional settings?
RQ4: How do ridge-regularized and logistic-regression-based base learners compare to state-of-the-art meta-learning methods on standard benchmarks?
RQ5: What calibration steps are necessary to use regression outputs with cross-entropy loss for classification?
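
For the efficiency question (RQ3), a back-of-the-envelope comparison shows why the dual form matters when the embedding dimension d is much larger than the number of episode examples n. The sketch below is only a rough estimate under stated assumptions: it counts the Gram-matrix product and an LU-style dense solve (about 2/3 of the cube of the system size in floating-point operations), ignores the products with the label matrix, and uses hypothetical values of n and d that are not taken from the paper.

```python
def ridge_flops(n, d, use_woodbury):
    """Rough flop estimate for one ridge solve with n examples of dimension d."""
    if use_woodbury:
        gram = 2 * n * n * d    # form X X^T, an n x n matrix
        solve = 2 * n**3 / 3    # dense solve of the n x n system
    else:
        gram = 2 * n * d * d    # form X^T X, a d x d matrix
        solve = 2 * d**3 / 3    # dense solve of the d x d system
    return gram + solve

# Hypothetical episode: 5 examples, 1024-dimensional embeddings.
n, d = 5, 1024
primal_cost = ridge_flops(n, d, use_woodbury=False)
dual_cost = ridge_flops(n, d, use_woodbury=True)
speedup = primal_cost / dual_cost
print(f"primal ~{primal_cost:.3g} flops, dual ~{dual_cost:.3g} flops, ratio ~{speedup:.3g}")
```

Under these assumptions the primal cost grows cubically in d while the dual cost grows only linearly in d, so the gap widens as the embedding becomes wider.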

KEY FINDINGS

R2-D2 (ridge regression) achieves results competitive with the state of the art on mini-ImageNet and CIFAR-FS using shallow architectures.
LR-D2 (iterative logistic regression) reaches comparable performance with different iteration counts, showing flexibility of IRLS in this meta-learning framework.
On Omniglot, the method is competitive and performs well across problems, including higher-shot settings.
The Woodbury-based formulation significantly reduces computational cost when embedding dimensionality is high and the episode size is small.
Calibration of the regression outputs (scaling and bias) is effective for aligning with cross-entropy loss in few-shot classification.
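
To make the role of the IRLS iteration count concrete, here is a minimal NumPy sketch of L2-regularized binary logistic regression solved with IRLS, where each Newton-like step is computed in the dual (Woodbury) form so that only an n x n system is solved. It is an illustration under stated assumptions rather than the LR-D2 implementation: the data are random, the regularization strength and iteration counts are hypothetical, and the clipping constant is a numerical safeguard added for this demo.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def irls_logistic(X, y, lam=1.0, n_iter=5):
    """IRLS for L2-regularized binary logistic regression (labels in {0, 1}).

    Each step solves the weighted ridge problem
        w = (X^T S X + lam*I)^{-1} X^T S z
    via the equivalent dual form
        w = X^T (X X^T + lam * S^{-1})^{-1} z.
    """
    n, d = X.shape
    w = np.zeros(d)
    K = X @ X.T                                   # n x n Gram matrix
    for _ in range(n_iter):
        eta = X @ w                               # current logits
        mu = sigmoid(eta)                         # predicted probabilities
        s = np.clip(mu * (1.0 - mu), 1e-6, None)  # IRLS weights (clipped)
        z = eta + (y - mu) / s                    # working response
        w = X.T @ np.linalg.solve(K + lam * np.diag(1.0 / s), z)
    return w

def grad_norm(X, y, w, lam):
    """Norm of the gradient of the regularized logistic loss at w."""
    return np.linalg.norm(X.T @ (sigmoid(X @ w) - y) + lam * w)

# Hypothetical binary episode: 10 examples, 256-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 256))
y = np.array([0.0, 1.0] * 5)

for n_iter in (1, 2, 5, 10):
    w = irls_logistic(X, y, lam=1.0, n_iter=n_iter)
    print(n_iter, grad_norm(X, y, w, lam=1.0))
```

The printed gradient norm indicates how close each iteration budget gets to the optimum of the regularized objective; a small fixed number of steps keeps the backward pass short.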

This review was generated by AI and reviewed by a human editor.