[Paper Review] Attentive Recurrent Comparators
The paper introduces Attentive Recurrent Comparators (ARCs) that repeatedly observe paired images with learned attention and recurrence to form dynamic representations, achieving state-of-the-art one-shot Omniglot classification and strong similarity learning results.
Rapid learning requires flexible representations that quickly adapt to new evidence. We develop a novel class of models called Attentive Recurrent Comparators (ARCs) that form representations of objects by cycling through them and making observations. Using the representations extracted by ARCs, we develop a way of approximating a dynamic representation space and use it for one-shot learning. In the task of one-shot classification on the Omniglot dataset, we achieve state-of-the-art performance with an error rate of 1.5%. This represents the first superhuman result achieved for this task with a generic model that uses only pixel information.
Motivation & Objective
- Motivate rapid learning with dynamic representations that evolve with new evidence.
- Propose a differentiable ARC model that compares objects by alternating attention between them.
- Demonstrate that ARCs (with or without convolutions) can match or surpass convnets on similarity tasks.
- Show that ARCs enable a high-performing lazy, relative representation space for one-shot classification.
Proposed method
- Introduce an ARC consisting of an RNN controller and a differentiable attention mechanism that alternates between two images across time steps.
- At each time step, compute the attention (glimpse) parameters from the previous RNN state h_{t-1}, attend to a region of the current image to form the glimpse G_t, and update the RNN state to h_t.
- Optionally incorporate CNN features by applying attention over convolutional feature maps (ConvARC).
- For one-shot learning, build a relative representation space conditioned on a test sample; use a hierarchical two-level comparison with Bi-LSTM merging and softmax scoring similar to Matching Networks.
- Train end-to-end to optimize similarity or classification objectives on tasks like Omniglot and CASIA WebFace.
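The comparison loop described above can be sketched as follows. This is a minimal, untrained illustration built on assumptions not taken from the paper: images are small 2D lists of floats, the attention is a simplified Gaussian-weighted glimpse standing in for the paper's differentiable attention, the controller is a plain tanh RNN rather than an LSTM, and the similarity score is a sigmoid of a linear readout of the final state. The names `glimpse`, `matvec` and `TinyARC` are hypothetical.

```python
import math
import random


def glimpse(image, cx, cy, size=4, sigma=1.0):
    """Return a size*size glimpse centred at normalised (cx, cy) in [-1, 1].

    Each output value is a Gaussian-weighted average of all image pixels, so
    the glimpse varies smoothly with (cx, cy). This is a simplified stand-in
    for ARC's attention, not the paper's exact mechanism.
    """
    h, w = len(image), len(image[0])
    gx = (cx + 1) / 2 * (w - 1)  # centre in pixel coordinates
    gy = (cy + 1) / 2 * (h - 1)
    out = []
    for i in range(size):
        for j in range(size):
            py = gy + i - (size - 1) / 2  # grid point around the centre
            px = gx + j - (size - 1) / 2
            num = den = 0.0
            for y in range(h):
                for x in range(w):
                    k = math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2))
                    num += k * image[y][x]
                    den += k
            out.append(num / den)
    return out


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class TinyARC:
    """Hypothetical toy comparator with random, untrained weights."""

    def __init__(self, hidden=8, glimpse_size=4, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]

        self.hidden = hidden
        self.gs = glimpse_size
        self.W_h = rand(hidden, hidden)                       # recurrent weights
        self.W_g = rand(hidden, glimpse_size * glimpse_size)  # glimpse -> state
        self.W_att = rand(2, hidden)                          # h_{t-1} -> (cx, cy)
        self.w_out = rand(1, hidden)[0]                       # final state -> score

    def compare(self, img_a, img_b, steps=8):
        h = [0.0] * self.hidden
        for t in range(steps):
            image = img_a if t % 2 == 0 else img_b  # alternate between the pair
            cx, cy = (math.tanh(v) for v in matvec(self.W_att, h))  # from h_{t-1}
            g_t = glimpse(image, cx, cy, self.gs)
            pre = [a + b for a, b in zip(matvec(self.W_h, h), matvec(self.W_g, g_t))]
            h = [math.tanh(v) for v in pre]  # new state h_t
        z = sum(w * v for w, v in zip(self.w_out, h))
        return 1.0 / (1.0 + math.exp(-z))  # similarity score in (0, 1)


a = [[float((x + y) % 2) for x in range(8)] for y in range(8)]
b = [[float(x < 4) for x in range(8)] for y in range(8)]
print(TinyARC(seed=0).compare(a, b))  # a value in (0, 1), meaningless until trained
```

Because the glimpse location at step t is computed from the state after step t-1, each look at one image is conditioned on what was just seen in the other, which is the intuition behind ARC's alternating comparison. A real model would learn these weights end to end from same/different labels.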
Experimental results
Research questions
- RQ1: Can ARCs form effective dynamic, context-conditioned representations for visual similarity tasks?
- RQ2: Do ARCs with and without convolutional features achieve competitive or superior performance to Siamese networks on verification tasks?
- RQ3: Can a lazy, relative representation space conditioned on a test sample support state-of-the-art one-shot classification?
- RQ4: How does iterative attention between two inputs compare to parallel attention or Siamese-style fusion in terms of performance and efficiency?
Key findings
- ARC-based similarity learning matches or exceeds strong baselines on verification tasks and achieves state-of-the-art one-shot Omniglot performance.
- A simple ARC without convolutions can match AlexNet-level performance on Omniglot verification and, with convolutions (ConvARC), surpass Wide ResNet Siamese baselines.
- ConvARC achieves 96.10% on Omniglot verification across alphabets, and Full Context ConvARC reaches 97.5% on across-alphabet one-shot classification, surpassing several prior methods.
- On CASIA WebFace verification, ConvARC (81.73%) outperforms a CNN baseline (79.48%).
- One-shot Omniglot results: Naive ARC 90.30%, Naive ConvARC 96.21%, Full Context ConvARC 97.5% Across Alphabets; Within Alphabets: Naive ARC 91.75%, Naive ConvARC 97.75%, Full Context ConvARC 98.5%.
- On miniImageNet 5-way 1-shot, Naive ConvARC scores 49.14%; the source table reports a higher accuracy for Full Context ConvARC.
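The "Naive" rows above can be read as a one-shot procedure in which the test image is scored against each support image independently and the best-scoring class is chosen. This reading is an assumption, since the review does not spell out the naive protocol; the Full Context variants additionally condition on the whole support set through the Bi-LSTM described under Proposed method, which this sketch omits. The scores are normalised with a softmax, in the spirit of the Matching Networks-style scoring mentioned there. `naive_one_shot` and `pixel_similarity` are hypothetical names, and `pixel_similarity` is only a placeholder for a trained ARC score.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def naive_one_shot(test, support, similarity):
    """N-way 1-shot classification by independent pairwise comparison.

    support: list of (label, image) pairs, one example per class.
    similarity: callable(img_a, img_b) -> float, e.g. an ARC score.
    Returns (predicted_label, class_probabilities).
    """
    scores = [similarity(test, img) for _, img in support]
    probs = softmax(scores)
    best = max(range(len(support)), key=lambda i: probs[i])
    return support[best][0], probs


def pixel_similarity(a, b):
    """Placeholder comparator (not ARC): negative mean squared pixel difference."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return -sum(diffs) / len(diffs)


img_a = [[1.0, 0.0], [0.0, 1.0]]
img_b = [[0.0, 1.0], [1.0, 0.0]]
query = [[0.9, 0.1], [0.0, 1.0]]
label, probs = naive_one_shot(query, [("A", img_a), ("B", img_b)], pixel_similarity)
print(label)  # A
```

Swapping `pixel_similarity` for a trained comparator's score is the only change needed to turn this into a naive ARC-style classifier under the stated assumption.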
This review was created by AI and reviewed by human editors.