QUICK REVIEW

[Paper Review] SemCovNet: Towards Fair and Semantic Coverage-Aware Learning for Underrepresented Visual Concepts

Sakib Ahammed, Xia Cui|arXiv (Cornell University)|Feb 18, 2026

Domain Adaptation and Few-Shot Learning|Cited by 0

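ONE-SENTENCE SUMMARY

SemCovNet introduces semantic-descriptor-aware learning: it aligns descriptor semantics with visual features and applies CDI regularization to promote semantic fairness across descriptor groups, addressing semantic coverage imbalance.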

ABSTRACT

Modern vision models increasingly rely on rich semantic representations that extend beyond class labels to include descriptive concepts and contextual attributes. However, existing datasets exhibit Semantic Coverage Imbalance (SCI), a previously overlooked bias arising from the long-tailed semantic representations. Unlike class imbalance, SCI occurs at the semantic level, affecting how models learn and reason about rare yet meaningful semantics. To mitigate SCI, we propose Semantic Coverage-Aware Network (SemCovNet), a novel model that explicitly learns to correct semantic coverage disparities. SemCovNet integrates a Semantic Descriptor Map (SDM) for learning semantic representations, a Descriptor Attention Modulation (DAM) module that dynamically weights visual and concept features, and a Descriptor-Visual Alignment (DVA) loss that aligns visual features with descriptor semantics. We quantify semantic fairness using a Coverage Disparity Index (CDI), which measures the alignment between coverage and error. Extensive experiments across multiple datasets demonstrate that SemCovNet enhances model reliability and substantially reduces CDI, achieving fairer and more equitable performance. This work establishes SCI as a measurable and correctable bias, providing a foundation for advancing semantic fairness and interpretable vision learning.

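RESEARCH MOTIVATION AND OBJECTIVES

Define Semantic Coverage Imbalance (SCI) as bias in how semantic descriptors are represented, both within and across classes.
Propose SemCovNet, which comprises a Semantic Descriptor Map (SDM), Descriptor Attention Modulation (DAM), and Descriptor–Visual Alignment (DVA).
Introduce the Coverage Disparity Index (CDI) as both a metric and a regularization term for semantic fairness.
Demonstrate lower CDI and improved reliability on dermatology and medical imaging datasets.
Show that descriptor-level fairness gains hold even when the class distribution is balanced.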
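
The summary describes CDI only as a measure of alignment between descriptor coverage and error; it does not give the formula. As a minimal sketch, assuming a proxy that is the absolute Pearson correlation between per-descriptor coverage and per-descriptor error rate, the idea can be illustrated as follows. The function name `coverage_disparity_proxy` and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def coverage_disparity_proxy(coverage, error):
    """Hypothetical CDI proxy (an assumption, not the paper's definition):
    absolute Pearson correlation between per-descriptor coverage
    and per-descriptor error rate. 0 means error is unrelated to coverage."""
    if len(coverage) != len(error) or len(coverage) < 2:
        raise ValueError("need two equal-length sequences with at least 2 entries")
    n = len(coverage)
    mean_c = sum(coverage) / n
    mean_e = sum(error) / n
    cov = sum((c - mean_c) * (e - mean_e) for c, e in zip(coverage, error))
    var_c = sum((c - mean_c) ** 2 for c in coverage)
    var_e = sum((e - mean_e) ** 2 for e in error)
    if var_c == 0 or var_e == 0:
        # No variation in coverage or error: no measurable coupling.
        return 0.0
    return abs(cov / math.sqrt(var_c * var_e))


# Toy data: four descriptors, from frequent (head) to rare (tail).
coverage = [0.50, 0.30, 0.15, 0.05]
biased_error = [0.05, 0.10, 0.20, 0.40]      # rare descriptors fail more often
uniform_error = [0.125, 0.125, 0.125, 0.125]  # error independent of coverage

biased_score = coverage_disparity_proxy(coverage, biased_error)
uniform_score = coverage_disparity_proxy(coverage, uniform_error)
```

Under this assumed proxy, a model whose errors concentrate on rare descriptors scores close to 1, while a model with uniform error across descriptors scores 0.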

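PROPOSED METHOD

Build a Semantic Descriptor Map (SDM) that fuses descriptor priors with visual features to produce descriptor-specific spatial attention maps.
Refine descriptor representations in a closed loop using cross-attention between descriptor tokens and image patch tokens.
Apply Descriptor Attention Modulation (DAM), which injects descriptor priors into visual features through channel and spatial gating with uncertainty-aware modulation.
Introduce Descriptor–Visual Alignment (DVA), which uses a contrastive loss to align visual features with descriptor embeddings.
Regularize training with CDI to decorrelate descriptor coverage from error and promote semantic fairness.
Train with a joint objective that combines the classification loss, descriptor reconstruction loss, DVA contrastive loss, and CDI regularization.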
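
The alignment loss and the joint objective listed above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the summary says only that DVA uses "a contrastive loss", so an InfoNCE-style loss is assumed here, and the weights `w_rec`, `w_dva`, `w_cdi` and the temperature are hypothetical hyperparameters.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(visual, descriptor, temperature=0.1):
    """InfoNCE-style contrastive loss (assumed form of the DVA loss).
    The i-th visual vector is the positive pair of the i-th descriptor vector;
    all other descriptor vectors in the batch act as negatives."""
    vis = [_normalize(v) for v in visual]
    des = [_normalize(d) for d in descriptor]
    total = 0.0
    for i, v in enumerate(vis):
        # Cosine similarity to every descriptor, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(v, d)) / temperature for d in des]
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(vis)


def joint_objective(l_cls, l_rec, l_dva, l_cdi, w_rec=1.0, w_dva=1.0, w_cdi=1.0):
    """Weighted sum of the four loss terms; the weights are hypothetical."""
    return l_cls + w_rec * l_rec + w_dva * l_dva + w_cdi * l_cdi


visual_feats = [[1.0, 0.0], [0.0, 1.0]]
matched_desc = [[2.0, 0.0], [0.0, 3.0]]     # same directions as the visual features
mismatched_desc = [[0.0, 3.0], [2.0, 0.0]]  # pairs swapped

aligned_loss = info_nce(visual_feats, matched_desc)
misaligned_loss = info_nce(visual_feats, mismatched_desc)
total_loss = joint_objective(1.0, 0.5, 0.25, 0.125, w_rec=2.0, w_dva=4.0, w_cdi=8.0)
```

In this toy batch the matched pairs give a loss near zero and the swapped pairs a large loss, which is the behavior a contrastive alignment term is meant to have.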

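EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: What is Semantic Coverage Imbalance (SCI), and how does it affect the learning of underrepresented descriptors?
RQ2: Can a descriptor-aware architecture reduce coverage–error mismatch and improve semantic fairness?
RQ3: Does CDI regularization yield more uniform performance across semantic coverage groups on different datasets?
RQ4: How do SDM, DAM, and DVA contribute to descriptor–visual alignment and model reliability?
RQ5: Is SemCovNet robust under both imbalanced and balanced class distributions and across modalities?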

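KEY FINDINGS

SemCovNet achieves lower CDI, indicating reduced coverage–error mismatch across semantic groups.
On MILK10k, SemCovNet improves Sens.@95%Spec and Macro-F1 while maintaining calibration (ECE), and achieves the best CDI relative to the baselines.
On ISIC-DICM-17K (balanced), SemCovNet still outperforms the baselines in descriptor-level fairness and sensitivity.
Ablation studies show that Hybrid_SDM with gated fusion offers the best trade-off between accuracy and fairness.
Combining SDM with DVA (SDM+DVA) significantly improves descriptor localization and tail performance relative to vision-only baselines.
During training, CDI regularization drives CDI toward near zero, indicating effective fairness optimization.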
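
The findings report calibration via ECE (expected calibration error). For readers unfamiliar with the metric, the sketch below shows the common equal-width binning formulation; the bin count of 10 and the function name are assumptions, since the summary does not state the evaluation settings used in the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins: the sample-weighted average of
    |accuracy - mean confidence| over the non-empty bins."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need two non-empty sequences of equal length")
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, bool(hit)))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


# Predictions made at 75% confidence that are right 3 times out of 4.
calibrated = expected_calibration_error([0.75] * 4, [True, True, True, False])
# Predictions made at 100% confidence that are right only half the time.
overconfident = expected_calibration_error([1.0, 1.0], [True, False])
```

A lower value means the predicted confidences track the observed accuracy more closely.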


This review was generated by AI and reviewed by human editors.