QUICK REVIEW

[Paper Review] Attentive Recurrent Comparators

Pranav Shyam, Shubham Gupta|arXiv (Cornell University)|Mar 2, 2017
Neural dynamics and brain function|18 references|64 citations
TL;DR

The paper introduces Attentive Recurrent Comparators (ARCs) that repeatedly observe paired images with learned attention and recurrence to form dynamic representations, achieving state-of-the-art one-shot Omniglot classification and strong similarity learning results.

ABSTRACT

Rapid learning requires flexible representations to quickly adapt to new evidence. We develop a novel class of models called Attentive Recurrent Comparators (ARCs) that form representations of objects by cycling through them and making observations. Using the representations extracted by ARCs, we develop a way of approximating a dynamic representation space and use it for one-shot learning. In the task of one-shot classification on the Omniglot dataset, we achieve state-of-the-art performance with an error rate of 1.5%. This represents the first super-human result achieved for this task with a generic model that uses only pixel information.

Motivation & Objective

  • Motivate rapid learning with dynamic representations that evolve with new evidence.
  • Propose a differentiable ARC model that compares objects by alternating attention between them.
  • Demonstrate that ARCs (with or without convolutions) can match or surpass convnets on similarity tasks.
  • Show that ARCs enable a high-performing lazy, relative representation space for one-shot classification.

Proposed method

  • Introduce an ARC consisting of an RNN controller and a differentiable attention mechanism that alternates between two images across time steps.
  • Compute attention glimpse parameters from the previous RNN state; attend to a region of the current image to form G_t; update the RNN state h_t.
  • Optionally incorporate CNN features by applying attention over convolutional feature maps (ConvARC).
  • For one-shot learning, build a relative representation space conditioned on a test sample; use a hierarchical two-level comparison with Bi-LSTM merging and softmax scoring similar to Matching Networks.
  • Train end-to-end to optimize similarity or classification objectives on tasks like Omniglot and CASIA WebFace.
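
The loop described in the bullets above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the controller is a plain tanh RNN, the glimpse uses a grid of Cauchy kernels as one possible differentiable attention window, the weights are random and untrained, and the one-shot routine is the naive variant that simply scores the test image against each support image (it omits the Bi-LSTM merging step). All function names and parameters (`init_weights`, `glimpse`, `arc_compare`, `one_shot_classify`, `n_glimpse`, `n_steps`) are hypothetical.

```python
import numpy as np

def init_weights(hidden, n_glimpse, rng):
    """Random, untrained weights for the sketch (illustrative only)."""
    scale = 0.1
    W_g = scale * rng.standard_normal((3, hidden))               # state -> glimpse params
    W_h = scale * rng.standard_normal((hidden, hidden))          # recurrent weights
    W_x = scale * rng.standard_normal((hidden, n_glimpse ** 2))  # glimpse -> state
    w_out = scale * rng.standard_normal(hidden)                  # state -> similarity logit
    return W_g, W_h, W_x, w_out

def filterbank(center, delta, gamma, n_glimpse, size):
    """n_glimpse 1-D Cauchy kernels over an axis of length `size` (assumed kernel choice)."""
    offsets = np.arange(n_glimpse) - (n_glimpse - 1) / 2.0
    mu = center + offsets * delta                  # kernel centres, shape (n_glimpse,)
    pos = np.arange(size)[None, :]                 # pixel positions, shape (1, size)
    f = 1.0 / (np.pi * gamma * (1.0 + ((pos - mu[:, None]) / gamma) ** 2))
    return f / (f.sum(axis=1, keepdims=True) + 1e-8)  # each row sums to ~1

def glimpse(image, params, n_glimpse):
    """Soft crop G_t of `image`; params = (x centre, y centre, zoom), all unbounded."""
    h, w = image.shape
    cx = (w - 1) * (np.tanh(params[0]) + 1.0) / 2.0   # map to [0, w-1]
    cy = (h - 1) * (np.tanh(params[1]) + 1.0) / 2.0   # map to [0, h-1]
    zoom = 1.0 / (1.0 + np.exp(-params[2]))           # in (0, 1)
    delta = max(h, w) / n_glimpse * zoom              # spacing between kernels
    gamma = delta / 2.0 + 1e-3                        # kernel width
    F_y = filterbank(cy, delta, gamma, n_glimpse, h)  # (n_glimpse, h)
    F_x = filterbank(cx, delta, gamma, n_glimpse, w)  # (n_glimpse, w)
    return F_y @ image @ F_x.T                        # (n_glimpse, n_glimpse)

def arc_compare(img_a, img_b, weights, n_steps=8, n_glimpse=4):
    """Alternate glimpses between two images; return a similarity score in (0, 1)."""
    W_g, W_h, W_x, w_out = weights
    h_t = np.zeros(W_h.shape[0])
    for t in range(n_steps):
        image = img_a if t % 2 == 0 else img_b        # alternate between the pair
        params = W_g @ h_t                            # attention from previous state
        g_t = glimpse(image, params, n_glimpse).ravel()
        h_t = np.tanh(W_h @ h_t + W_x @ g_t)          # update the controller state
    return 1.0 / (1.0 + np.exp(-(w_out @ h_t)))

def one_shot_classify(test_img, support_imgs, weights):
    """Naive one-shot: softmax over pairwise ARC scores, pick the best match."""
    scores = np.array([arc_compare(test_img, s, weights) for s in support_imgs])
    exp = np.exp(scores - scores.max())
    probs = exp / exp.sum()
    return int(np.argmax(probs)), probs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = init_weights(hidden=16, n_glimpse=4, rng=rng)
    support = [rng.random((32, 32)) for _ in range(5)]  # one image per class
    test = rng.random((32, 32))
    print(one_shot_classify(test, support, weights))
```

Because each glimpse is computed from the previous controller state, what the model looks at in one image depends on what it has already seen in the other, which is the sense in which the representation is conditioned on the pair rather than fixed per image. In a trained model the weights would be learned end-to-end, and a ConvARC would apply the same glimpse to convolutional feature maps instead of raw pixels.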

Experimental results

Research questions

  • RQ1: Can ARCs form effective dynamic, context-conditioned representations for visual similarity tasks?
  • RQ2: Do ARCs with and without convolutional features achieve competitive or superior performance to Siamese networks on verification tasks?
  • RQ3: Can a lazy, relative representation space conditioned on a test sample support state-of-the-art one-shot classification?
  • RQ4: How does iterative attention between two inputs compare to parallel attention or Siamese-style fusion in terms of performance and efficiency?

Key findings

  • ARC-based similarity learning matches or exceeds strong baselines on verification tasks and achieves state-of-the-art one-shot Omniglot performance.
  • A simple ARC without convolutions can match AlexNet-level performance on Omniglot verification and, with convolutions (ConvARC), surpass Wide ResNet Siamese baselines.
  • ConvARC achieves 96.10% on Omniglot verification across alphabets, and Full Context ConvARC reaches 98.5% on within-alphabet one-shot classification (the 1.5% error rate cited in the abstract), surpassing several prior methods.
  • On CASIA WebFace verification, ConvARC (81.73%) outperforms a CNN baseline (79.48%).
  • One-shot Omniglot results, across alphabets: Naive ARC 90.30%, Naive ConvARC 96.21%, Full Context ConvARC 97.5%. Within alphabets: Naive ARC 91.75%, Naive ConvARC 97.75%, Full Context ConvARC 98.5%.
  • On miniImageNet 5-way 1-shot, Naive ConvARC scores 49.14%; the paper's table reports a higher score for Full Context ConvARC, but the exact figure is not reproduced here.

This review was created by AI and reviewed by human editors.