QUICK REVIEW

[Paper Review] Inducing Neural Collapse in Imbalanced Learning: Do We Really Need a Learnable Classifier at the End of Deep Neural Network?

Yibo Yang, Shixiang Chen | arXiv (Cornell University) | Mar 17, 2022

Imbalanced Data Classification Techniques | Cited by 21

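ONE-SENTENCE SUMMARY

This paper shows that fixing a simplex ETF classifier at the end of the network induces neural collapse even when the data are imbalanced, and it introduces a dot-regression loss with better convergence properties that improves performance on long-tailed and fine-grained classification.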

ABSTRACT

Modern deep neural networks for classification usually jointly learn a backbone for representation and a linear classifier to output the logit of each class. A recent study has shown a phenomenon called neural collapse that the within-class means of features and the classifier vectors converge to the vertices of a simplex equiangular tight frame (ETF) at the terminal phase of training on a balanced dataset. Since the ETF geometric structure maximally separates the pair-wise angles of all classes in the classifier, it is natural to raise the question, why do we spend an effort to learn a classifier when we know its optimal geometric structure? In this paper, we study the potential of learning a neural network for classification with the classifier randomly initialized as an ETF and fixed during training. Our analytical work based on the layer-peeled model indicates that the feature learning with a fixed ETF classifier naturally leads to the neural collapse state even when the dataset is imbalanced among classes. We further show that in this case the cross entropy (CE) loss is not necessary and can be replaced by a simple squared loss that shares the same global optimality but enjoys a better convergence property. Our experimental results show that our method is able to bring significant improvements with faster convergence on multiple imbalanced datasets.

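MOTIVATION AND OBJECTIVES

Motivation: in imbalanced learning, is a learnable classifier at the end of the network needed to reach neural collapse?
Study how fixing a simplex ETF classifier affects feature-classifier alignment under class imbalance.
Develop a loss function (dot-regression) tailored to the ETF classifier, with theoretical convergence guarantees.
Demonstrate empirical gains on long-tailed and fine-grained classification across multiple datasets.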

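PROPOSED METHOD

Initialize the last-layer classifier as a random simplex ETF and keep it fixed during training (DLPM).
Analyze the layer-peeled model under the fixed ETF classifier to show that neural collapse (NC) emerges whether or not the class distribution is balanced.
Examine the gradient dynamics of cross-entropy loss under the fixed ETF, highlighting the removal of the contentious push term acting on the features.
Introduce the dot-regression (DR) loss, which reproduces the pull gradient toward the correct class direction while avoiding the push term.
Give theoretical results (Theorem 1, Theorem 2) on the global optimality and convergence properties of CE and DR in the ETF setting.
Demonstrate empirical improvements on long-tailed datasets and extend the method to fine-grained classification.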
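
The two ingredients listed above, a fixed simplex ETF classifier and a dot-regression loss, can be illustrated with a short NumPy sketch. It is not the authors' implementation. The function names, the norm parameters e_w and e_h, and the exact scaling of the loss are assumptions made here for illustration. The ETF uses the standard construction sqrt(K/(K-1)) * U (I - 11^T/K) with orthonormal U.

```python
import numpy as np


def simplex_etf(feat_dim, num_classes, seed=0):
    """Return a (feat_dim, num_classes) matrix whose columns form a simplex ETF.

    Standard construction: sqrt(K/(K-1)) * U @ (I - 11^T / K), where U has
    orthonormal columns. This sketch assumes feat_dim >= num_classes.
    """
    if feat_dim < num_classes:
        raise ValueError("this sketch assumes feat_dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Reduced QR of a random Gaussian matrix gives random orthonormal columns.
    u, _ = np.linalg.qr(rng.standard_normal((feat_dim, num_classes)))
    k = num_classes
    centering = np.eye(k) - np.ones((k, k)) / k
    return np.sqrt(k / (k - 1)) * (u @ centering)


def dot_regression_loss(features, labels, etf, e_w=1.0, e_h=1.0):
    """Squared loss on the dot product of each feature with its class vector.

    Assumed form: (w_c . h - sqrt(e_w * e_h))^2 / (2 * sqrt(e_w * e_h)),
    with features rescaled to norm sqrt(e_h) and class vectors to sqrt(e_w).
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    h = np.sqrt(e_h) * features / norms
    w = np.sqrt(e_w) * etf[:, labels].T  # fixed class vectors, never updated
    target = np.sqrt(e_w * e_h)
    dots = np.sum(w * h, axis=1)
    return float(np.mean((dots - target) ** 2) / (2.0 * target))


# Hypothetical usage: 5 classes, 16-dimensional features.
etf = simplex_etf(feat_dim=16, num_classes=5)
labels = np.array([0, 3, 3, 1])
features = np.random.default_rng(1).standard_normal((4, 16))
loss = dot_regression_loss(features, labels, etf)
```

In this construction every column of etf has unit norm and any two distinct columns have inner product -1/(K-1), which is the maximal equiangular separation referred to in the abstract. The loss is zero exactly when each normalized feature points along its own class vector.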

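EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Can neural collapse still be induced when the classifier is fixed as a simplex ETF, even if the class distribution is imbalanced?
RQ2: Why does cross-entropy loss with a learnable classifier cause minority classes to collapse, and can a fixed ETF classifier avoid this?
RQ3: Does a simple dot-regression loss attain the same global optimality with better convergence?
RQ4: Does fixing the classifier as an ETF improve practical performance on long-tailed and fine-grained classification across multiple datasets?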

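KEY FINDINGS

Global optimality: any global minimizer of the decoupled layer-peeled model with a fixed ETF classifier aligns the features with the classifier directions in a simplex ETF, regardless of class balance (Theorem 1).
Gradient analysis shows that the push term of CE can destabilize learning on imbalanced data, whereas a fixed ETF avoids the problem through a consistent pull toward the correct class direction (discussed in Section 4.2).
With a fixed ETF, the DR loss shares the same neural-collapse global optimality as CE but has a better convergence property (Theorem 2).
Empirically, an ETF classifier trained with the DR loss improves long-tailed accuracy across multiple datasets and backbones, and usually converges faster than a learnable classifier trained with CE (discussed with Tables 1-3).
The method also yields gains on ImageNet-LT with fewer training epochs and improves fine-grained classification (Table 4 and Section 5).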
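
The pull and push terms in the gradient finding can be made concrete with a short derivation. The notation is introduced here for illustration: h is a feature with label c, w_k are the fixed classifier vectors, and p_k(h) is the softmax probability. The DR expression assumes a squared loss on the dot product with unit-norm h and w_c, so the paper's exact scaling may differ.

```latex
% Descent direction of CE with respect to the feature h (standard softmax identity)
-\frac{\partial \mathcal{L}_{\mathrm{CE}}}{\partial h}
  = \underbrace{\bigl(1 - p_c(h)\bigr)\, w_c}_{\text{pull}}
  \;-\; \underbrace{\sum_{k \neq c} p_k(h)\, w_k}_{\text{push}},
\qquad
p_k(h) = \frac{\exp(w_k^\top h)}{\sum_{j} \exp(w_j^\top h)}

% Assumed DR loss with \|h\| = \|w_c\| = 1
\mathcal{L}_{\mathrm{DR}} = \tfrac{1}{2}\,\bigl(w_c^\top h - 1\bigr)^2,
\qquad
-\frac{\partial \mathcal{L}_{\mathrm{DR}}}{\partial h}
  = \bigl(1 - w_c^\top h\bigr)\, w_c
```

Under these assumptions the DR descent direction contains only a pull along w_c, and its magnitude shrinks to zero as the dot product of w_c and h approaches 1. The CE direction also carries a push away from every other class vector.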

This review was generated by AI and checked by human editors.