QUICK REVIEW

[Paper Review] Optimizing Mode Connectivity via Neuron Alignment

N. Joseph Tatro, Pin‐Yu Chen | arXiv (Cornell University) | Jan 1, 2020
Adversarial Robustness in Machine Learning | 2 citations
TL;DR

This paper proposes neuron alignment, an inexpensive heuristic that accounts for weight permutation symmetries when optimizing mode connectivity in deep neural network loss landscapes. By aligning the intermediate activation distributions of the models being connected, the method learns simple, planar, low-loss curves between them. It also significantly reduces the robust loss barrier between adversarially robust models and finds more robust and accurate models along the path.

ABSTRACT

The loss landscapes of deep neural networks are not well understood due to their high nonconvexity. Empirically, the local minima of these loss functions can be connected by a learned curve in model space, along which the loss remains nearly constant, a feature known as mode connectivity. Yet, current curve finding algorithms do not consider the influence of symmetry in the loss surface created by model weight permutations. We propose a more general framework to investigate the effect of symmetry on landscape connectivity by accounting for the weight permutations of the networks being connected. To approximate the optimal permutation, we introduce an inexpensive heuristic referred to as neuron alignment. Neuron alignment promotes similarity between the distribution of intermediate activations of models along the curve. We provide theoretical analysis establishing the benefit of alignment to mode connectivity based on this simple heuristic. We empirically verify that the permutation given by alignment is locally optimal via a proximal alternating minimization scheme. Empirically, optimizing the weight permutation is critical for efficiently learning a simple, planar, low-loss curve between networks that successfully generalizes. Our alignment method can significantly alleviate the recently identified robust loss barrier on the path connecting two adversarially robust models and find more robust and accurate models on the path.

Motivation & Objective

  • To address the limited understanding of the high-dimensional, nonconvex loss landscapes of deep neural networks.
  • To investigate how weight permutation symmetries affect mode connectivity in model space.
  • To develop a method that improves curve-finding between models by aligning neuron activations across the path.
  • To reduce the robust loss barrier between adversarially trained models, enabling more stable and accurate interpolation.

Proposed method

  • Introduces neuron alignment as a heuristic to approximate optimal weight permutations between two deep neural networks.
  • Aligns intermediate activation distributions of the two models to promote structural similarity along the interpolation path.
  • Uses a proximal alternating minimization scheme to empirically verify that the aligned permutation is locally optimal.
  • Constructs a planar, low-loss curve in model space by optimizing the permutation of network weights via activation alignment.
  • Accounts explicitly for weight permutation symmetries in the loss landscape when connecting two networks, which improves connectivity.
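
The alignment step described in the bullets above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the review does not say how activation similarity is measured or how the permutation is solved, so the sketch assumes Pearson correlation between hidden units and a linear assignment solver (`scipy.optimize.linear_sum_assignment`). The helper names `align_neurons` and `hidden`, the toy one-hidden-layer network, and all shapes are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_neurons(acts_a, acts_b):
    """Return perm so that unit perm[i] of model B is matched to unit i of model A.

    acts_a, acts_b: arrays of shape (num_samples, num_units) holding one
    layer's activations of each model on the same inputs.
    Assumption: similarity is measured by Pearson correlation.
    """
    # Standardize each unit so the matrix product below is a correlation.
    za = (acts_a - acts_a.mean(axis=0)) / (acts_a.std(axis=0) + 1e-8)
    zb = (acts_b - acts_b.mean(axis=0)) / (acts_b.std(axis=0) + 1e-8)
    corr = za.T @ zb / acts_a.shape[0]  # shape (units_a, units_b)
    # The solver minimizes total cost, so negate to maximize total correlation.
    _, cols = linear_sum_assignment(-corr)
    return cols


def hidden(x, w1, b1):
    """ReLU hidden layer of a toy one-hidden-layer network."""
    return np.maximum(x @ w1 + b1, 0.0)


rng = np.random.default_rng(0)
x = rng.normal(size=(256, 5))

# Model A: random weights for a 5 -> 8 -> 3 network (hypothetical sizes).
w1_a = rng.normal(size=(5, 8))
b1_a = rng.normal(size=8)
w2_a = rng.normal(size=(8, 3))

# Model B: model A with its hidden units shuffled. It computes the same
# function as A, but its weights sit at a different point in weight space.
shuffle = rng.permutation(8)
w1_b, b1_b, w2_b = w1_a[:, shuffle], b1_a[shuffle], w2_a[shuffle, :]

perm = align_neurons(hidden(x, w1_a, b1_a), hidden(x, w1_b, b1_b))

# Permute B's hidden units: columns of the incoming weights and bias,
# and the matching rows of the outgoing weights.
w1_b_al, b1_b_al, w2_b_al = w1_b[:, perm], b1_b[perm], w2_b[perm, :]

out_b = hidden(x, w1_b, b1_b) @ w2_b
out_b_al = hidden(x, w1_b_al, b1_b_al) @ w2_b_al
```

In this toy setup the permutation leaves model B's outputs unchanged while moving its weights back onto model A's, which is the symmetry the method exploits. For two independently trained networks the match is only approximate, and the permutation would be applied layer by layer before learning the connecting curve.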

Experimental results

Research questions

  • RQ1: How do weight permutation symmetries in deep neural networks affect the connectivity of loss landscapes?
  • RQ2: Can aligning intermediate activation distributions between models lead to better interpolation paths with lower loss?
  • RQ3: Does neuron alignment reduce the robust loss barrier between two adversarially robust models?
  • RQ4: Is the permutation found via neuron alignment locally optimal for minimizing path loss?
  • RQ5: Can the aligned path generalize better, yielding more robust and accurate models than standard interpolation?

Key findings

  • Neuron alignment significantly reduces the robust loss barrier between two adversarially robust models, enabling smoother and lower-loss interpolation.
  • The permutation obtained via neuron alignment is empirically verified to be locally optimal through a proximal alternating minimization scheme.
  • The method successfully finds planar, low-loss curves that generalize better than standard interpolation methods.
  • Activation distribution alignment leads to improved robustness and accuracy along the interpolation path.
  • The approach demonstrates that accounting for weight permutation symmetries is critical for effective mode connectivity in deep learning.
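
To make the terms "planar curve" and "loss barrier" in the findings above concrete, here is a small sketch on a toy problem. The review does not state how the curve is parameterized or how the barrier is measured, so both are assumptions here: the curve is a quadratic Bezier curve with one control point (which always lies in the plane through its three defining points), and the barrier is the maximum loss along the curve minus the mean of the two endpoint losses. `bezier_point`, `loss_barrier`, and `toy_loss` are hypothetical names, and the two-parameter loss is a stand-in for a network's loss.

```python
import numpy as np


def bezier_point(theta_a, theta_c, theta_b, t):
    """Point at t in [0, 1] on a quadratic Bezier curve from theta_a to theta_b.

    theta_c is the control point that would be learned in curve finding.
    """
    return (1 - t) ** 2 * theta_a + 2 * t * (1 - t) * theta_c + t ** 2 * theta_b


def loss_barrier(loss_fn, theta_a, theta_c, theta_b, num_points=51):
    """Max loss along the curve minus the mean endpoint loss (assumed definition)."""
    ts = np.linspace(0.0, 1.0, num_points)
    losses = np.array(
        [loss_fn(bezier_point(theta_a, theta_c, theta_b, t)) for t in ts]
    )
    return losses.max() - 0.5 * (losses[0] + losses[-1])


def toy_loss(theta):
    """Hypothetical loss whose minima form the unit circle."""
    return (np.sum(theta ** 2) - 1.0) ** 2


theta_a = np.array([1.0, 0.0])   # first minimum
theta_b = np.array([-1.0, 0.0])  # second minimum

# Placing the control point at the midpoint reduces the curve to a straight line.
straight = loss_barrier(toy_loss, theta_a, 0.5 * (theta_a + theta_b), theta_b)

# A control point off the line bends the curve around the high-loss region.
curved = loss_barrier(toy_loss, theta_a, np.array([0.0, 2.0]), theta_b)
```

Here the straight line passes through the high-loss origin while the bent curve stays near the ring of minima, so `curved` is much smaller than `straight`. In the paper's setting the endpoints are trained networks and the loss may be a robust (adversarial) loss.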

This review was created by AI and reviewed by human editors.