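[Paper Explained] Meta-learning with differentiable closed-form solvers
This paper proposes differentiable ridge regression and logistic regression solvers (R2-D2 and LR-D2) as fast, episode-specific base learners. Backpropagating through the closed-form or IRLS-based solver enables efficient few-shot adaptation. Using high-dimensional embeddings and a speed-up from the Woodbury identity, the approach achieves results competitive with or superior to prior methods on Omniglot, miniImageNet, and CIFAR-FS.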
Adapting deep networks to new concepts from a few examples is challenging, due to the high computational requirements of standard fine-tuning procedures. Most work on few-shot learning has thus focused on simple learning techniques for adaptation, such as nearest neighbours or gradient descent. Nonetheless, the machine learning literature contains a wealth of methods that learn non-deep models very efficiently. In this paper, we propose to use these fast convergent methods as the main adaptation mechanism for few-shot learning. The main idea is to teach a deep network to use standard machine learning tools, such as ridge regression, as part of its own internal model, enabling it to quickly adapt to novel data. This requires back-propagating errors through the solver steps. While normally the cost of the matrix operations involved in such a process would be significant, by using the Woodbury identity we can make the small number of examples work to our advantage. We propose both closed-form and iterative solvers, based on ridge regression and logistic regression components. Our methods constitute a simple and novel approach to the problem of few-shot learning and achieve performance competitive with or superior to the state of the art on three benchmarks.
Motivation and Objectives
- Motivate fast adaptation in few-shot learning by using simple, differentiable base learners with closed-form or rapidly converging solutions.
- Integrate ridge regression and logistic regression solvers into a meta-learning framework to allow backpropagation through the learning step.
- Show computational efficiency gains in high-dimensional embedding settings via Woodbury identity.
- Evaluate the proposed methods on standard few-shot benchmarks and compare to state-of-the-art methods.
Proposed Method
- Propose a meta-learning setup where the base learner is a differentiable ridge regression layer computing episode-specific weights W from episode data.
- Use the Woodbury identity to compute ridge solutions efficiently when embedding dimensionality is large but episode sample size is small.
- Extend to an iterative logistic regression base learner that uses Iteratively Reweighted Least Squares (IRLS) to perform a small number of Newton-like update steps.
- Calibrate ridge outputs with a learned scale and bias to align with cross-entropy loss.
- Train the representation and hyperparameters end-to-end by backpropagating through the episode learning steps across many episodes.
- Provide a training protocol where meta-parameters (feature extractor weights, ridge lambda, calibration parameters) are learned via SGD/Adam to minimize held-out episode loss.
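The ridge regression base learner and its Woodbury form described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the episode sizes (`n`, `d`, `k`), and the values of `lam`, `alpha`, and `beta` are assumptions chosen for the example. In the meta-learning setup those last three would be learned, and the solve would run inside an automatic-differentiation framework so that gradients flow back to the feature extractor. The sketch compares the primal solution W = (XᵀX + λI)⁻¹XᵀY, which solves a d × d system, with the equivalent form W = Xᵀ(XXᵀ + λI)⁻¹Y, which solves only an n × n system.

```python
import numpy as np

def ridge_primal(X, Y, lam):
    """Solve (X^T X + lam*I) W = X^T Y. Requires a d x d linear solve."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def ridge_woodbury(X, Y, lam):
    """Equivalent form W = X^T (X X^T + lam*I)^{-1} Y. Requires an n x n solve."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), Y)

rng = np.random.default_rng(0)
n, d, k = 5, 64, 5            # illustrative episode: 5 examples, 64-d embeddings, 5 classes
X = rng.normal(size=(n, d))   # stand-in for embedded support examples
Y = np.eye(k)                 # one-hot labels, one example per class
lam = 1.0                     # hypothetical value; a learned meta-parameter in the paper's setup

W_primal = ridge_primal(X, Y, lam)
W_wood = ridge_woodbury(X, Y, lam)
max_diff = float(np.max(np.abs(W_primal - W_wood)))  # the two forms agree up to rounding

# Calibration: scale and shift the regression outputs before the cross-entropy loss.
alpha, beta = 10.0, 0.0       # hypothetical values; learned in the paper's setup
X_query = rng.normal(size=(3, d))
logits = alpha * (X_query @ W_wood) + beta
```

Because n is much smaller than d in a few-shot episode, the Woodbury form replaces the larger d × d solve with a small n × n one, which is the source of the efficiency gain.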
Experimental Results
Research Questions
- RQ1: Can fast-converging, differentiable solvers (ridge and logistic regression) serve as effective base learners for meta-learning in few-shot tasks?
- RQ2: Does backpropagating through closed-form or IRLS-based solvers enable competitive meta-learning performance with high-dimensional embeddings?
- RQ3: How does the Woodbury identity influence computational efficiency in few-shot, high-dimensional settings?
- RQ4: How do ridge-regression-based and logistic-regression-based base learners compare to state-of-the-art meta-learning methods on standard benchmarks?
- RQ5: What calibration steps are necessary to use regression outputs with cross-entropy loss for classification?
Key Findings
- R2-D2 (ridge regression) achieves results competitive with the state of the art on miniImageNet and CIFAR-FS using shallow architectures.
- LR-D2 (iterative logistic regression) reaches comparable performance with different iteration counts, showing flexibility of IRLS in this meta-learning framework.
- On Omniglot, the method is competitive and performs well across problems, including higher-shot settings.
- The Woodbury-based formulation significantly reduces computational cost when embedding dimensionality is high and the episode size is small.
- Calibration of the regression outputs (scaling and bias) is effective for aligning with cross-entropy loss in few-shot classification.
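The iterative base learner behind LR-D2 can be illustrated with a short IRLS loop. The sketch below is a generic L2-regularized binary logistic regression solved by Newton steps written as reweighted least-squares problems. It is an assumption-laden illustration rather than the paper's code: the function name `irls_logistic`, the toy data, the {0, 1} label convention, the clipping constant, and the default `lam` and `steps` are all chosen for the example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def irls_logistic(X, y, lam=1.0, steps=5):
    """L2-regularized binary logistic regression via a few IRLS (Newton) steps.

    X: (n, d) features, y: (n,) labels in {0, 1}. Returns weights of shape (d,).
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        a = X @ w
        mu = sigmoid(a)                            # current predicted probabilities
        s = np.clip(mu * (1.0 - mu), 1e-6, None)   # per-example weights, clipped for stability
        z = a + (y - mu) / s                       # working targets
        # Each step is a weighted ridge regression of z on X.
        A = X.T @ (s[:, None] * X) + lam * np.eye(d)
        w = np.linalg.solve(A, X.T @ (s * z))
    return w

# Toy episode: two well-separated classes in a 20-d space (illustrative data only).
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 20))
X[:5] += 2.0
X[5:] -= 2.0
y = np.array([1.0] * 5 + [0.0] * 5)

w = irls_logistic(X, y, lam=1.0, steps=5)
train_acc = float(np.mean((sigmoid(X @ w) > 0.5) == (y == 1.0)))
```

Since every iteration is itself a ridge-style solve, the number of steps can be varied freely, and each step stays differentiable when written in an automatic-differentiation framework.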
This explainer was generated by AI and reviewed by a human editor.