QUICK REVIEW

[Paper Review] Learning feed-forward one-shot learners

Luca Bertinetto, João F. Henriques|arXiv (Cornell University)|Jun 16, 2016

Video Surveillance and Tracking Methods|References: 17|Cited by: 241

ONE-SENTENCE SUMMARY

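The paper proposes a learnet, a second neural network that predicts the parameters of a pupil network from a single exemplar, yielding a purely feed-forward one-shot learner for classification and tracking. Factorized linear and convolutional layers keep the space of predicted parameters manageable, and the method shows competitive results on Omniglot OCR and on a visual object tracking benchmark.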

ABSTRACT

One-shot learning is usually tackled by using generative models or discriminative embeddings. Discriminative methods based on deep learning, which are very effective in other learning scenarios, are ill-suited for one-shot learning as they need large amounts of training data. In this paper, we propose a method to learn the parameters of a deep model in one shot. We construct the learner as a second deep network, called a learnet, which predicts the parameters of a pupil network from a single exemplar. In this manner we obtain an efficient feed-forward one-shot learner, trained end-to-end by minimizing a one-shot classification objective in a learning to learn formulation. In order to make the construction feasible, we propose a number of factorizations of the parameters of the pupil network. We demonstrate encouraging results by learning characters from single exemplars in Omniglot, and by tracking visual objects from a single initial exemplar in the Visual Object Tracking benchmark.

MOTIVATION AND OBJECTIVES

Address the main bottleneck of discriminative one-shot learning: deep models normally need large amounts of training data and iterative optimization.
Propose a meta-learning network (learnet) that predicts all parameters of a pupil network from a single exemplar.
Develop diagonal parameter factorizations for linear and convolutional layers that make one-shot parameter prediction feasible.
Demonstrate feasibility and competitiveness on OCR (Omniglot) and visual object tracking benchmarks.

PROPOSED METHOD

Formulate one-shot learning as dynamic parameter prediction via a learnet that maps an exemplar z to the parameters W of a pupil network φ(·;W).
Train the learnet end-to-end by minimizing a one-shot objective across triplets (x, z, ℓ) with ℓ indicating same/different class.
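Address parameter explosion by factorizing each weight matrix as M′ diag(w(z)) M, where M and M′ do not depend on the exemplar and only the diagonal w(z) is predicted. This cuts the learnet output from dk to d values for a linear layer, and from f^2 dk to f^2 d values for a convolutional layer with f × f filters.
Extend the factorization to convolutional layers as y = M′ * w(z) *_d M * x + b(z), where *_d denotes independent filtering of each of the d channels, so the predicted filters w(z) act channel-wise while M and M′ act as 1 × 1 pixel-wise projections.
Compare three architectures: siamese baseline, siamese learnet, and single-stream learnet, including a variant with factorized convolutions.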
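
The factorized linear layer and the one-shot objective described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the learnet here is a single hypothetical linear map A, the pupil is one factorized layer whose outputs are summed into a score, the loss is a logistic loss on a label of +1 (same class) or -1 (different class), and all sizes and helper names are made up for the example.

```python
import math
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

def learnet(z, A):
    """Hypothetical learnet: maps exemplar z to the d diagonal entries w(z)."""
    return matvec(A, z)

def pupil_layer(x, w_z, M, M_prime):
    """Factorized linear layer: y = M' diag(w(z)) M x."""
    h = matvec(M, x)                              # fixed projection M x
    h = [w_i * h_i for w_i, h_i in zip(w_z, h)]   # predicted diagonal scaling
    return matvec(M_prime, h)                     # fixed projection M'

def dense_weights(w_z, M, M_prime):
    """Explicit W(z) = M' diag(w(z)) M, built only to check the factorization."""
    d = len(w_z)
    return [[sum(M_prime[i][j] * w_z[j] * M[j][c] for j in range(d))
             for c in range(len(M[0]))]
            for i in range(len(M_prime))]

def logistic_loss(score, label):
    """Logistic loss for label +1 (same class) or -1 (different class)."""
    return math.log1p(math.exp(-label * score))

rng = random.Random(0)
d, k = 4, 3                        # assumed input and output sizes
M = random_matrix(d, d, rng)       # exemplar-independent factor
M_prime = random_matrix(k, d, rng) # exemplar-independent factor
A = random_matrix(d, d, rng)       # hypothetical learnet weights

z = [rng.gauss(0.0, 1.0) for _ in range(d)]  # exemplar
x = [rng.gauss(0.0, 1.0) for _ in range(d)]  # input to classify
label = 1                                    # x and z share a class

w_z = learnet(z, A)                  # d predicted values, not d * k
y = pupil_layer(x, w_z, M, M_prime)
score = sum(y)                       # assumed scalar readout
loss = logistic_loss(score, label)
print("predicted diagonal:", w_z)
print("loss:", loss)
```

In an end-to-end setup the gradient of this loss would flow through the pupil into A, M and M′; the sketch only shows the forward pass for one triplet (x, z, ℓ).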

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Can a deep network predict all parameters of another network from a single exemplar, enabling true one-shot discriminative learning?
RQ2: Does a feed-forward learnet offer practical speed advantages over iterative one-shot methods like exemplar-SVMs?
RQ3: How do factorized linear and convolutional layers affect the feasibility and performance of dynamic parameter prediction in one-shot learning?
RQ4: Are learnet-based one-shot models competitive with siamese embeddings on OCR and tracking tasks?

KEY FINDINGS

On Omniglot OCR, the single-stream learnet achieved 28.6% error with weighted L1 distance, outperforming standard siamese baselines.
Learnets using dynamic, predicted convolutional filters can improve tracking performance on the VOT2015 benchmark, often ranking favorably against recent trackers while running in real time (>60 FPS).
Factorized convolutional layers reduce the parameter prediction burden without severely hurting accuracy for the OCR task in this setup.
Predicting a full set of layer parameters from a single exemplar is feasible when using the proposed factorization, avoiding the quadratic scaling problem of naïve parameter prediction.
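
To make the scaling argument concrete, the short sketch below counts how many values a learnet would have to output per layer with and without the factorization. The layer sizes (d = k = 64 channels, 3 × 3 filters) are hypothetical and chosen only for illustration; the formulas assume a d-to-k linear layer and an f × f convolution from d to k channels.

```python
def naive_linear(d, k):
    """Outputs needed to predict a full d-by-k weight matrix."""
    return d * k

def factorized_linear(d):
    """Outputs needed to predict only the diagonal w(z)."""
    return d

def naive_conv(f, d, k):
    """Outputs needed to predict a full f x f x d x k filter bank."""
    return f * f * d * k

def factorized_conv(f, d):
    """Outputs needed to predict one f x f filter per channel."""
    return f * f * d

d, k, f = 64, 64, 3  # hypothetical layer sizes

counts = {
    "naive linear": naive_linear(d, k),
    "factorized linear": factorized_linear(d),
    "naive conv": naive_conv(f, d, k),
    "factorized conv": factorized_conv(f, d),
}
for name, n in counts.items():
    print(f"{name}: {n}")
```

With these sizes the factorization shrinks the predicted output by a factor of k = 64 in both cases, which is where the saving over naive prediction comes from.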

This review was generated by AI and reviewed by human editors.