
[Paper Review] Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect

Kaihua Tang, Jianqiang Huang|arXiv (Cornell University)|Sep 28, 2020
Domain Adaptation and Few-Shot Learning | References: 59 | Cited by: 50
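One-Sentence Summary

This paper establishes a causal framework that identifies SGD momentum as a confounder in long-tailed classification, and proposes one-stage de-confounded training with Total Direct Effect (TDE) inference to remove the harmful backdoor bias while preserving the beneficial mediation.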

ABSTRACT

As the class size grows, maintaining a balanced dataset across many classes is challenging because the data are long-tailed in nature; it is even impossible when the sample-of-interest co-exists with each other in one collectable unit, e.g., multiple visual instances in one image. Therefore, long-tailed classification is the key to deep learning at scale. However, existing methods are mainly based on re-weighting/re-sampling heuristics that lack a fundamental theory. In this paper, we establish a causal inference framework, which not only unravels the whys of previous methods, but also derives a new principled solution. Specifically, our theory shows that the SGD momentum is essentially a confounder in long-tailed classification. On one hand, it has a harmful causal effect that misleads the tail prediction biased towards the head. On the other hand, its induced mediation also benefits the representation learning and head prediction. Our framework elegantly disentangles the paradoxical effects of the momentum, by pursuing the direct causal effect caused by an input sample. In particular, we use causal intervention in training, and counterfactual reasoning in inference, to remove the "bad" while keep the "good". We achieve new state-of-the-arts on three long-tailed visual recognition benchmarks: Long-tailed CIFAR-10/-100, ImageNet-LT for image classification and LVIS for instance segmentation.

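Motivation and Objectives

  • Explain why long-tailed classification methods struggle to generalize across all classes under momentum-driven confounding.
  • Propose a principled causal learning framework that removes the harmful confounding effect while preserving the beneficial mediation.
  • Provide a one-stage solution for long-tailed recognition, based on de-confounded training and TDE inference, that requires no re-training.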

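Proposed Method

  • Model momentum as the confounder M in a four-variable causal graph (M, X, D, Y).
  • Apply backdoor adjustment to derive a de-confounded training objective that estimates P(Y|do(X)).
  • Approximate inverse probability weighting with an energy-based, multi-head weighting scheme to obtain de-confounded logits.
  • Compute the Total Direct Effect (TDE) of X on Y as a counterfactual difference that keeps the direct effect of X while subtracting the indirect effect through D.
  • At inference, apply TDE against the counterfactual with a null input x0 to isolate the direct effect; introduce background-exempted inference for tasks that have a background class.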
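
For reference, the two quantities named in the list above can be written in standard causal-inference notation. This is the generic form of backdoor adjustment and of a total direct effect, not the paper's multi-head estimator; the symbols x_0 (the null input) and d (the value D takes under the factual input x) follow the description above, and the class index i is an assumption added for illustration.

```latex
% Backdoor adjustment: average over the confounder M instead of conditioning on it
P(Y = i \mid do(X = x)) = \sum_{m} P(Y = i \mid X = x, M = m)\, P(M = m)

% Total Direct Effect: factual outcome minus the counterfactual outcome
% in which X is set to the null input x_0 while D keeps the value d
\mathrm{TDE}(Y_i) = [\, Y_d = i \mid do(X = x) \,] - [\, Y_d = i \mid do(X = x_0) \,]
```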

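Experimental Results

Research Questions

  • RQ1: Does SGD momentum act as a confounder that biases tail-class predictions toward the head in long-tailed datasets?
  • RQ2: How can the direct causal effect X→Y be disentangled from the momentum-induced mediation and backdoor paths?
  • RQ3: Does one-stage de-confounded training plus TDE inference outperform existing two-stage re-balancing methods across long-tailed vision benchmarks?
  • RQ4: How does the proposed method relate to normalized classifiers (e.g., cosine) in long-tailed settings, and what is the underlying rationale?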

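Key Findings

  • Achieves new state-of-the-art results on Long-tailed CIFAR-10/-100 and ImageNet-LT across multiple settings.
  • Shows significant gains over previous winners on LVIS for instance segmentation and object detection.
  • Demonstrates that de-confounded training plus TDE inference can surpass two-stage re-balancing methods, and offers an explanation for its effectiveness.
  • Shows that mediation through the head-biased feature direction D helps performance, whereas backdoor confounding through M degrades tail predictions; TDE inference mitigates the latter.
  • Provides a theoretical connection to normalized (cosine) classifiers, and uses Grad-CAM to explain the improved focus on discriminative regions.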
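
The connection between TDE inference and a normalized (cosine) classifier can be illustrated with a toy sketch. The code below is a simplified single-head version written for this review, not the authors' implementation: the function names, the hyperparameters `tau`, `gamma`, and `alpha`, their default values, and the toy weights are all assumptions. It computes cosine-like logits, then subtracts the component of those logits that is explained by a head-biased unit direction `d_hat`.

```python
import numpy as np


def deconfounded_logits(x, W, tau=16.0, gamma=1.0 / 32):
    """Cosine-like logits for a feature x (dim,) and class weights W (C, dim).

    tau scales the logits; gamma is a small constant added to the weight norm.
    Both are hypothetical hyperparameters chosen for this sketch.
    """
    x_norm = np.linalg.norm(x)
    w_norm = np.linalg.norm(W, axis=1)
    return tau * (W @ x) / ((w_norm + gamma) * x_norm)


def tde_logits(x, W, d_hat, tau=16.0, gamma=1.0 / 32, alpha=1.0):
    """Factual logits minus alpha times the logits explained by direction d_hat.

    d_hat is assumed to be a unit vector (e.g. a normalized moving average
    of training features). With alpha = 0 this is the plain normalized classifier.
    """
    factual = deconfounded_logits(x, W, tau, gamma)
    cos_xd = float(x @ d_hat) / np.linalg.norm(x)  # cosine between x and d_hat
    w_norm = np.linalg.norm(W, axis=1)
    counterfactual = tau * cos_xd * (W @ d_hat) / (w_norm + gamma)
    return factual - alpha * counterfactual


# Toy setup: class 0 plays the "head" class and is aligned with d_hat,
# class 1 plays the "tail" class and is orthogonal to it.
W = np.array([[2.0, 0.0],
              [0.0, 1.0]])
d_hat = np.array([1.0, 0.0])
x = np.array([1.0, 0.8])

plain = deconfounded_logits(x, W)
tde = tde_logits(x, W, d_hat)
print("plain prediction:", int(np.argmax(plain)))
print("TDE prediction:  ", int(np.argmax(tde)))
```

In this toy setup the plain normalized classifier picks class 0, while subtracting the `d_hat` component moves the prediction to class 1. Setting `alpha=0.0` recovers the plain normalized logits exactly, which is the sense in which the two are related in this sketch.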


This review was generated by AI and reviewed by a human editor.