QUICK REVIEW

[Paper Review] Knowledge Transfer with Jacobian Matching

Suraj Srinivas, François Fleuret|arXiv (Cornell University)|Mar 1, 2018

Reinforcement Learning in Robotics|20 references|58 citations

TL;DR

The paper shows that Jacobian matching between teacher and student networks is equivalent to distillation with input noise, derives practical loss forms, and demonstrates improvements in distillation, noise robustness, and transfer learning.

ABSTRACT

Classical distillation methods transfer representations from a "teacher" neural network to a "student" network by matching their output activations. Recent methods also match the Jacobians, or the gradient of output activations with the input. However, this involves making some ad hoc decisions, in particular, the choice of the loss function. In this paper, we first establish an equivalence between Jacobian matching and distillation with input noise, from which we derive appropriate loss functions for Jacobian matching. We then rely on this analysis to apply Jacobian matching to transfer learning by establishing equivalence of a recent transfer learning procedure to distillation. We then show experimentally on standard image datasets that Jacobian-based penalties improve distillation, robustness to noisy inputs, and transfer learning.

Motivation & Objective

Motivate knowledge transfer between networks of different architectures using Jacobian information.
Establish the theoretical equivalence between Jacobian matching and input-noise-based distillation.
Derive practical loss functions for Jacobian matching applicable to distillation and transfer learning.
Demonstrate empirically that Jacobian-based penalties improve distillation, robustness to noise, and transfer learning.

Proposed method

Derive that matching Jacobians is equivalent to distillation with input noise via first-order Taylor expansion.
Propose squared-error distillation loss and derive Jacobian-regularization terms under different loss functions (squared error and cross-entropy).
Introduce practical approximations to the full Jacobian (computing it only for the output of the correct class or the output of largest magnitude) to reduce computation.
Integrate Jacobian matching into transfer learning frameworks, including connections to Learning without Forgetting (LwF) and attention-map matching.
Utilize attention-map based approximations and selective Jacobian computations to enable cross-architecture transfer learning.
Provide empirical validation on CIFAR-100 distillation, noise robustness, and MIT Scenes transfer learning.
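
The Taylor-expansion argument in the first item above can be sketched for a single scalar output as follows. The notation is an assumption introduced here for illustration and does not appear in this review: T and S denote the teacher and student outputs, and \xi is zero-mean Gaussian input noise with covariance \sigma^2 I. The result holds only to first order in \xi.

```latex
% First-order expansion of the teacher-student gap under input noise
T(x+\xi) - S(x+\xi) \approx \bigl[T(x) - S(x)\bigr]
    + \xi^{\top}\bigl[\nabla_x T(x) - \nabla_x S(x)\bigr]

% Squaring and taking the expectation, with E[\xi] = 0 and E[\xi\xi^{\top}] = \sigma^2 I
\mathbb{E}_{\xi}\Bigl[\bigl(T(x+\xi) - S(x+\xi)\bigr)^2\Bigr]
    \approx \bigl(T(x) - S(x)\bigr)^2
    + \sigma^2 \,\bigl\lVert \nabla_x T(x) - \nabla_x S(x) \bigr\rVert_2^2
```

The cross term vanishes because the noise has zero mean, which leaves the usual squared-error distillation loss plus a Jacobian-matching penalty weighted by the noise variance.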

Experimental results

Research questions

RQ1: Can Jacobian matching be interpreted as distillation with input noise, and what loss corresponds to this equivalence?
RQ2: Does Jacobian-based regularization improve performance for distillation, especially in low-data regimes?
RQ3: Can Jacobian matching be effectively applied to transfer learning across arbitrary architectures, and how does it relate to LwF and attention-map strategies?
RQ4: Does Jacobian regularization enhance robustness to input noise?
RQ5: What practical approximations enable efficient Jacobian matching in deep networks and cross-architecture scenarios?

Key findings

Jacobian matching is equivalent to distillation with input noise, yielding an additional Jacobian-regularization term in the loss.
In limited-data distillation on CIFAR-100, combining activations and Jacobians improves accuracy over activation-only distillation, approaching full-data performance with only a fraction of data.
Jacobian-norm penalties improve robustness to Gaussian noise, outperforming standard L2 regularization and dropout in noise-robustness tests.
In transfer learning, incorporating Jacobian matching (with activations and attention) provides gains over activation-only methods, particularly in low-data regimes.
Matching Jacobians at shallower feature layers yields better transfer performance, and adding Jacobian-based terms on top of activation or attention matching consistently improves results.
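
The first finding above (noisy distillation behaves like distillation plus a Jacobian penalty) can be checked numerically on a toy case. The sketch below is not the paper's code. The single-ReLU "teacher" and "student", their weights, the input point, and the noise level are all hypothetical values chosen so that both units stay in their linear region. Gradients are taken by finite differences to keep the example dependency-free.

```python
import random

def relu_unit(weights, bias):
    """Return f(x) = max(0, w.x + b), a piecewise-linear toy 'network'."""
    def f(x):
        return max(0.0, sum(w * xi for w, xi in zip(weights, x)) + bias)
    return f

def numerical_gradient(f, x, eps=1e-5):
    """Central-difference gradient of a scalar function f at point x."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grad

def jacobian_matching_loss(teacher, student, x, sigma):
    """First-order surrogate: (T - S)^2 + sigma^2 * ||grad T - grad S||^2."""
    diff = teacher(x) - student(x)
    grad_t = numerical_gradient(teacher, x)
    grad_s = numerical_gradient(student, x)
    jac_term = sum((a - b) ** 2 for a, b in zip(grad_t, grad_s))
    return diff ** 2 + sigma ** 2 * jac_term

def noisy_distillation_loss(teacher, student, x, sigma, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[(T(x+xi) - S(x+xi))^2], xi ~ N(0, sigma^2 I)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x_noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        total += (teacher(x_noisy) - student(x_noisy)) ** 2
    return total / n_samples

# Hypothetical toy models and evaluation point (assumptions for illustration).
teacher = relu_unit([0.8, -0.5], 0.1)
student = relu_unit([0.3, 0.4], 0.5)
x0 = [0.5, -0.3]
sigma = 0.05

plain = (teacher(x0) - student(x0)) ** 2          # activation-only loss
surrogate = jacobian_matching_loss(teacher, student, x0, sigma)
monte_carlo = noisy_distillation_loss(teacher, student, x0, sigma)

print("activation-only loss:        ", plain)
print("activation + Jacobian term:  ", surrogate)
print("noisy distillation (sampled):", monte_carlo)
```

In this setup the sampled noisy loss should sit close to the surrogate and noticeably above the activation-only loss. The agreement is this tight only because the toy models are locally linear. With curved activations the neglected higher-order terms can be comparable in size to the Jacobian term, so the check is illustrative rather than general.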


This review was created by AI and reviewed by human editors.