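[Paper Summary] Large-Scale Long-Tailed Recognition in an Open World
Proposes Open Long-Tailed Recognition (OLTR) and a unified framework that combines dynamic meta-embedding with modulated attention. The framework handles knowledge sharing between head and tail classes, few-shot generalization, and open-set novelty, and it is validated on large-scale open long-tailed benchmarks.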
Real-world data often have a long-tailed and open-ended distribution. A practical recognition system must classify among majority and minority classes, generalize from a few known instances, and acknowledge novelty upon a never-seen instance. We define Open Long-Tailed Recognition (OLTR) as learning from such naturally distributed data and optimizing the classification accuracy over a balanced test set which includes head, tail, and open classes. OLTR must handle imbalanced classification, few-shot learning, and open-set recognition in one integrated algorithm, whereas existing classification approaches focus only on one aspect and perform poorly over the entire class spectrum. The key challenges are how to share visual knowledge between head and tail classes and how to reduce confusion between tail and open classes. We develop an integrated OLTR algorithm that maps an image to a feature space such that visual concepts can easily relate to each other based on a learned metric that respects the closed-world classification while acknowledging the novelty of the open world. Our so-called dynamic meta-embedding combines a direct image feature and an associated memory feature, with the feature norm indicating the familiarity to known classes. On three large-scale OLTR datasets we curate from object-centric ImageNet, scene-centric Places, and face-centric MS1M data, our method consistently outperforms the state-of-the-art. Our code, datasets, and models enable future OLTR research and are publicly available at https://liuziwei7.github.io/projects/LongTail.html.
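The abstract describes evaluating accuracy on a balanced test set that mixes head, tail, and open classes. A small grouped-accuracy sketch makes this concrete. It is an illustration under stated assumptions, not the paper's evaluation code. Several details are assumptions: the many/medium/few-shot cut-offs (more than 100 and fewer than 20 training images), the hypothetical label `OPEN = -1`, and the confidence threshold used to reject an input as open. The paper's own open-set metric may also differ from the simple rejection accuracy counted here.

```python
from collections import defaultdict

OPEN = -1  # hypothetical label for test samples from open (never-seen) classes


def shot_bucket(n_train, many=100, few=20):
    """Assign a known class to a frequency group (cut-offs are assumptions)."""
    if n_train > many:
        return "many"
    if n_train < few:
        return "few"
    return "medium"


def predict_with_rejection(probs, threshold):
    """Top-1 prediction, or OPEN when the top probability is below the threshold."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best if probs[best] >= threshold else OPEN


def grouped_accuracy(samples, train_counts, threshold):
    """Accuracy reported separately for many/medium/few-shot and open test samples."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for probs, label in samples:
        group = "open" if label == OPEN else shot_bucket(train_counts[label])
        totals[group] += 1
        hits[group] += int(predict_with_rejection(probs, threshold) == label)
    return {g: hits[g] / totals[g] for g in totals}


# Toy data: training-set size per known class, then (class probabilities, true label).
train_counts = {0: 500, 1: 50, 2: 5}
samples = [
    ([0.90, 0.05, 0.05], 0),     # many-shot, confident and correct
    ([0.20, 0.70, 0.10], 1),     # medium-shot, correct
    ([0.30, 0.30, 0.40], 2),     # few-shot, low confidence, so rejected as open
    ([0.40, 0.35, 0.25], OPEN),  # open-class sample, correctly rejected
]
print(grouped_accuracy(samples, train_counts, threshold=0.5))
```

The toy run prints `{'many': 1.0, 'medium': 1.0, 'few': 0.0, 'open': 1.0}`. The few-shot sample is rejected as open because its top probability falls below the threshold. This is the tail/open confusion that the abstract names as a key challenge.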
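Motivation and Goals
- Define the OLTR task, which combines a long-tailed training distribution with open-set novelty and evaluates head, tail, and open classes in a balanced way.
- Build an integrated model that shares knowledge from head to tail classes through dynamic meta-embedding and separates tail classes from open classes using the norm of the reachability-calibrated embedding.
- Curate large-scale OLTR benchmarks (ImageNet-LT, Places-LT, MS1M-LT) and show that the method outperforms state-of-the-art baselines.
- Provide end-to-end learnable components (memory-based transfer, concept selector, reachability calibration, and modulated attention) that scale to large datasets.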
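Proposed Method
- Dynamic meta-embedding combines the direct image feature v^{direct} with a memory feature. The memory feature is built from discriminative class centroids stored in a learned visual memory M.
- The memory feature is v^{memory} = o^{T} M. Here o = T_hal(v^{direct}) are coefficients hallucinated from the direct feature, and the rows of M are the class centroids {c_i}. A concept selector e = T_sel(v^{direct}) controls how much of the memory feature is added to each dimension.
- The meta-embedding is v^{meta} = (1/γ) · (v^{direct} + e ⊗ v^{memory}), where ⊗ denotes element-wise multiplication. γ is the reachability, defined as the minimum distance from v^{direct} to the memory centroids: γ = min_i ||v^{direct} − c_i||_2.
- Reachability calibration uses γ to separate open-set instances (far from the memory, large γ) from known classes (close to it, small γ). Dividing by a large γ shrinks the meta-embedding, so unfamiliar inputs end up with a small norm and low classifier confidence.
- Modulated attention (MA) applies conditional spatial attention on top of the self-attention map SA(f). This encourages head and tail classes to draw on different spatial contexts: f^{att} = f + MA(f) ⊗ SA(f).
- Classification uses a cosine classifier. The classifier weights are L2-normalized, and v^{meta} is passed through a non-linear squashing function that keeps its direction but maps its norm into [0, 1). This stabilizes the scale while preserving the norm as a familiarity signal.
- The loss combines cross-entropy with a large-margin term between v^{meta} and the memory centroids: L = Σ L_CE(v^{meta}, y) + λ L_LM(v^{meta}, {c_i}).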
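The meta-embedding, reachability, and cosine-classifier steps above can be traced end to end in a minimal pure-Python sketch. It is not the authors' implementation. In the method, T_hal and T_sel are learned modules. Here they are replaced by hypothetical stand-ins: `toy_hallucinator` is a softmax over negative centroid distances, and `toy_selector` is an element-wise tanh. The class weights are simply set to the centroids, and the `EPS` floor on γ is our own guard against division by zero. We also assume the capsule-style squashing, where the direction is kept and a norm n is mapped to n²/(1 + n²).

```python
import math

EPS = 1e-8  # floor on the reachability to avoid division by zero (our assumption)


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def dynamic_meta_embedding(v_direct, centroids, t_hal, t_sel):
    """Return (v_meta, gamma) for one direct feature."""
    # Reachability: minimum Euclidean distance from v_direct to any memory centroid.
    gamma = min(norm([a - b for a, b in zip(v_direct, c)]) for c in centroids)
    gamma = max(gamma, EPS)
    # Memory feature v_memory = o^T M: centroids mixed by hallucinated coefficients o.
    o = t_hal(v_direct)
    dim = len(v_direct)
    v_memory = [sum(o[i] * c[d] for i, c in enumerate(centroids)) for d in range(dim)]
    # Concept selector e gates the memory feature element-wise.
    e = t_sel(v_direct)
    v_meta = [(vd + ei * vm) / gamma for vd, ei, vm in zip(v_direct, e, v_memory)]
    return v_meta, gamma


def squash(v):
    """Keep the direction of v but map its norm n to n^2 / (1 + n^2), inside [0, 1)."""
    n = norm(v)
    if n == 0.0:
        return [0.0] * len(v)
    scale = (n * n) / (1.0 + n * n) / n
    return [scale * x for x in v]


def cosine_logits(v_meta, weights):
    """Dot products between the squashed embedding and L2-normalized class weights."""
    s = squash(v_meta)
    return [sum(a * b for a, b in zip(s, w)) / norm(w) for w in weights]


# Hypothetical stand-ins for the learned modules T_hal and T_sel.
def toy_hallucinator(centroids):
    def t_hal(v):
        # Softmax over negative distances to the centroids.
        scores = [-norm([a - b for a, b in zip(v, c)]) for c in centroids]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        return [x / total for x in exps]
    return t_hal


def toy_selector(v):
    return [math.tanh(x) for x in v]


centroids = [[1.0, 0.0], [0.0, 1.0]]
t_hal = toy_hallucinator(centroids)
for name, v in [("near class 0", [0.9, 0.1]), ("far from memory", [5.0, 5.0])]:
    v_meta, gamma = dynamic_meta_embedding(v, centroids, t_hal, toy_selector)
    logits = cosine_logits(v_meta, centroids)
    print(name, round(gamma, 3), [round(x, 3) for x in logits])
```

In this toy run, the input far from both centroids gets a much larger γ. As a result, its squashed embedding is shorter and its largest logit is lower than for the input near class 0. This norm-as-familiarity behavior is what reachability calibration relies on.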
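Experimental Results
Research Questions
- RQ1: How can head, tail, and open classes be recognized within one unified framework?
- RQ2: Does sharing visual knowledge between head and tail classes make tail-class recognition more robust without hurting head-class accuracy?
- RQ3: Can open-set novelty be detected and calibrated in the learned feature space, rather than only at the classifier output?
- RQ4: Do the proposed components (dynamic meta-embedding, memory, reachability calibration, and modulated attention) generalize to large-scale, real-world long-tailed data across domains (objects, scenes, and faces)?
Key Findings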
- OLTR with dynamic meta-embedding consistently improves over state-of-the-art baselines on large open long-tailed benchmarks (ImageNet-LT, Places-LT, MS1M-LT).
- The memory feature and concept selector substantially boost tail-class performance, especially in medium-shot and few-shot regimes.
- Reachability-based calibration enhances open-set discrimination, particularly for few-shot and one-shot identities, while preserving many-shot performance.
- Modulated attention improves discrimination between head and tail by guiding models to use different spatial contexts across classes.
- Experiments on MegaFace and SUN-LT show strong generalization to face and scene datasets, with notable gains on low-shot and zero-shot identities.
This summary was generated by AI and reviewed by human editors.