QUICK REVIEW

[Paper Review] Delving into Deep Imbalanced Regression

Yuzhe Yang, Kaiwen Zha|arXiv (Cornell University)|Feb 18, 2021

Imbalanced Data Classification Techniques|References: 38|Cited by: 120

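ONE-SENTENCE SUMMARY

DIR learns from imbalanced data with continuous targets by smoothing the label and feature distributions. The paper introduces Label Distribution Smoothing (LDS) and Feature Distribution Smoothing (FDS) and benchmarks them on vision, NLP, and healthcare datasets.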

ABSTRACT

Real-world data often exhibit imbalanced distributions, where certain target values have significantly fewer observations. Existing techniques for dealing with imbalanced data focus on targets with categorical indices, i.e., different classes. However, many tasks involve continuous targets, where hard boundaries between classes do not exist. We define Deep Imbalanced Regression (DIR) as learning from such imbalanced data with continuous targets, dealing with potential missing data for certain target values, and generalizing to the entire target range. Motivated by the intrinsic difference between categorical and continuous label space, we propose distribution smoothing for both labels and features, which explicitly acknowledges the effects of nearby targets, and calibrates both label and learned feature distributions. We curate and benchmark large-scale DIR datasets from common real-world tasks in computer vision, natural language processing, and healthcare domains. Extensive experiments verify the superior performance of our strategies. Our work fills the gap in benchmarks and techniques for practical imbalanced regression problems. Code and data are available at https://github.com/YyzHarry/imbalanced-regression.

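MOTIVATION AND OBJECTIVES

Define Deep Imbalanced Regression (DIR) and lay out the challenges that imbalance poses for continuous targets.
Propose two smoothing-based techniques, LDS and FDS, to calibrate the label and feature distributions.
Curate large-scale DIR benchmarks in vision, NLP, and healthcare to enable robust evaluation.
Show consistent improvements across tasks when LDS/FDS are combined with existing baselines.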

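PROPOSED METHOD

Formally define DIR in terms of a continuous target range that may contain regions with no training data.
Label Distribution Smoothing (LDS): apply kernel smoothing to the empirical label density to obtain an effective label density, which is then used to reweight the loss.
Feature Distribution Smoothing (FDS): apply kernel smoothing to feature statistics (mean and covariance) across target bins, then whiten and re-color the features to calibrate them.
Integrate LDS and FDS into end-to-end deep learning models, using momentum-based running statistics.
Benchmark across different architectures on the DIR datasets IMDB-WIKI-DIR, AgeDB-DIR, STS-B-DIR, NYUD2-DIR, and SHHS-DIR.
Evaluate baselines including vanilla training, SMOTER/SMOGN variants, and reweighting schemes.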
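
To make the LDS step concrete, here is a minimal NumPy sketch of the idea. It is not the authors' implementation: the function name `lds_weights`, the Gaussian kernel, the default `bin_width`, `kernel_size`, and `sigma`, and the plain inverse-density weighting are all assumptions chosen for illustration.

```python
import numpy as np

def lds_weights(labels, bin_width=1.0, kernel_size=5, sigma=2.0):
    """Per-sample loss weights from a kernel-smoothed label density.

    All defaults are illustrative assumptions; kernel_size must be odd.
    """
    labels = np.asarray(labels, dtype=float)

    # Empirical label density: sample counts over equal-width target bins.
    lo = labels.min()
    bin_idx = np.floor((labels - lo) / bin_width).astype(int)
    counts = np.bincount(bin_idx).astype(float)

    # Symmetric Gaussian kernel over neighbouring bins, normalised to sum to 1.
    half = kernel_size // 2
    offsets = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()

    # Effective label density: zero-pad the counts and convolve, so the
    # result has exactly one value per original bin.
    effective = np.convolve(np.pad(counts, half), kernel, mode="valid")

    # Inverse-density reweighting (an assumed scheme), scaled to average 1.
    weights = 1.0 / effective[bin_idx]
    return weights * len(weights) / weights.sum()
```

With plain inverse-frequency weighting, two bins that each hold one sample would get the same weight. Under this sketch, a lone sample sitting next to a densely populated bin receives a smaller weight than an equally rare sample far from any other data, because the smoothed density counts the nearby targets.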

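EXPERIMENTAL RESULTS

Research Questions

RQ1: How does imbalance in continuous targets affect learning compared with conventional class imbalance?
RQ2: Do LDS and FDS improve regression performance in the many-shot, medium-shot, few-shot, and zero-shot regions?
RQ3: Can DIR methods extrapolate or interpolate to target regions with almost no training data?
RQ4: How do LDS and FDS interact with existing imbalanced-regression baselines across tasks and modalities?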
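
The shot regions named in RQ2 group test samples by how many training samples fall in the same target bin. The sketch below shows one way to report MAE per region. The function name `shot_region_mae`, the bin width, and the thresholds (`many=100`, `few=20`) are assumptions for illustration and are not stated in this review.

```python
import numpy as np

def shot_region_mae(train_labels, test_labels, test_preds,
                    bin_width=1.0, many=100, few=20):
    """MAE of test predictions, grouped by training support of each target bin.

    Thresholds are illustrative assumptions: many-shot bins have more than
    `many` training samples, few-shot bins fewer than `few`, zero-shot none.
    """
    train_labels = np.asarray(train_labels, dtype=float)
    test_labels = np.asarray(test_labels, dtype=float)
    test_preds = np.asarray(test_preds, dtype=float)

    # Shared equal-width binning for training and test targets.
    lo = min(train_labels.min(), test_labels.min())
    train_bins = np.floor((train_labels - lo) / bin_width).astype(int)
    test_bins = np.floor((test_labels - lo) / bin_width).astype(int)
    n_bins = max(train_bins.max(), test_bins.max()) + 1
    train_counts = np.bincount(train_bins, minlength=n_bins)

    # Number of training samples that share a bin with each test sample.
    support = train_counts[test_bins]
    abs_err = np.abs(test_preds - test_labels)

    regions = {
        "many": support > many,
        "medium": (support >= few) & (support <= many),
        "few": (support > 0) & (support < few),
        "zero": support == 0,
    }
    # Regions without any test sample are reported as None.
    return {name: float(abs_err[mask].mean()) if mask.any() else None
            for name, mask in regions.items()}
```

Reporting the error per region rather than only overall keeps a model that does well on heavily populated targets from hiding poor performance on rare or unseen ones.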

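Key Findings

LDS and FDS consistently improve DIR performance on five real-world datasets spanning vision, NLP, and healthcare.
Combining LDS and FDS yields the largest gains, particularly in the medium-shot and few-shot regions and for extrapolation/interpolation.
Baselines adapted from imbalanced classification (e.g., reweighting, SMOTE variants) often underperform on high-dimensional data with continuous targets, whereas LDS/FDS deliver robust gains.
The DIR benchmarks show that imbalance behaves differently in regression than in classification, which motivates smoothing methods designed for continuous targets.
With LDS and FDS, generalization to zero-shot regions improves.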
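
The feature-side calibration behind these findings can be sketched in the same spirit. This is a simplified illustration, not the authors' code. It uses per-dimension variances instead of a full covariance matrix, computes statistics from a single batch rather than momentum-based running estimates, and restricts the kernel to populated bins. The function name `fds_calibrate` and the `sigma` and `eps` defaults are hypothetical.

```python
import numpy as np

def fds_calibrate(features, bin_idx, n_bins, sigma=2.0, eps=1e-6):
    """Whiten features with their own bin statistics, re-color with smoothed ones.

    Simplifying assumptions: diagonal variance, single-batch statistics,
    and at least one sample somewhere in the batch.
    """
    features = np.asarray(features, dtype=float)
    bin_idx = np.asarray(bin_idx)
    dim = features.shape[1]

    # Per-bin feature mean and per-dimension variance.
    mean = np.zeros((n_bins, dim))
    var = np.ones((n_bins, dim))
    populated = np.zeros(n_bins, dtype=bool)
    for b in range(n_bins):
        rows = features[bin_idx == b]
        if len(rows) > 0:
            populated[b] = True
            mean[b] = rows.mean(axis=0)
            var[b] = rows.var(axis=0)

    # Gaussian kernel between target bins; empty bins contribute nothing,
    # and each row is normalised so the weights sum to 1.
    bins = np.arange(n_bins)
    kernel = np.exp(-0.5 * ((bins[:, None] - bins[None, :]) / sigma) ** 2)
    kernel = kernel * populated[None, :]
    kernel /= kernel.sum(axis=1, keepdims=True)

    # Smoothed statistics borrow information from neighbouring bins.
    smooth_mean = kernel @ mean
    smooth_var = kernel @ var

    # Whitening with the raw bin statistics, re-coloring with the smoothed ones.
    z = (features - mean[bin_idx]) / np.sqrt(var[bin_idx] + eps)
    return z * np.sqrt(smooth_var[bin_idx] + eps) + smooth_mean[bin_idx]
```

In this sketch, features from a sparsely populated bin are pulled toward the statistics of nearby bins, which is the intuition for why calibration helps most where training data is scarce.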


This review was generated by AI and reviewed by a human editor.