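[Paper Walkthrough] Region Mutual Information Loss for Semantic Segmentation
The RMI loss improves semantic segmentation by modeling region-level dependencies between pixels, yielding consistent gains on PASCAL VOC 2012 and CamVid with no additional inference overhead.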
Semantic segmentation is a fundamental problem in computer vision. It is considered as a pixel-wise classification problem in practice, and most segmentation models use a pixel-wise loss as their optimization criterion. However, the pixel-wise loss ignores the dependencies between pixels in an image. Several ways to exploit the relationship between pixels have been investigated, e.g., conditional random fields (CRF) and pixel affinity based methods. Nevertheless, these methods usually require additional model branches, large extra memories, or more inference time. In this paper, we develop a region mutual information (RMI) loss to model the dependencies among pixels more simply and efficiently. In contrast to the pixel-wise loss which treats the pixels as independent samples, RMI uses one pixel and its neighbour pixels to represent this pixel. Then for each pixel in an image, we get a multi-dimensional point that encodes the relationship between pixels, and the image is cast into a multi-dimensional distribution of these high-dimensional points. The prediction and ground truth thus can achieve high order consistency through maximizing the mutual information (MI) between their multi-dimensional distributions. Moreover, as the actual value of the MI is hard to calculate, we derive a lower bound of the MI and maximize the lower bound to maximize the real value of the MI. RMI only requires a few extra computational resources in the training stage, and there is no overhead during testing. Experimental results demonstrate that RMI can achieve substantial and consistent improvements in performance on PASCAL VOC 2012 and CamVid datasets. The code is available at https://github.com/ZJULearning/RMI.
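The abstract states that a lower bound of the MI is maximized in place of the MI itself. One standard route to such a bound is sketched below. It is an illustration under stated assumptions rather than a reproduction of the paper's derivation: $Y$ and $P$ are taken to be the $d$-dimensional region vectors of the ground truth and the prediction, treated as continuous variables, and $\Sigma_{Y|P}$ is taken to be the residual covariance of the best linear predictor of $Y$ from $P$.

```latex
\begin{align}
I(Y;P) &= H(Y) - H(Y \mid P), \\
H(Y \mid P) &\le \tfrac{1}{2}\log\!\big((2\pi e)^{d}\,\det \Sigma_{Y|P}\big),
\qquad
\Sigma_{Y|P} = \Sigma_{Y} - \mathrm{Cov}(Y,P)\,\Sigma_{P}^{-1}\,\mathrm{Cov}(Y,P)^{\top}, \\
I(Y;P) &\ge H(Y) - \tfrac{d}{2}\log(2\pi e) - \tfrac{1}{2}\log\det \Sigma_{Y|P}.
\end{align}
```

The second line uses the fact that a Gaussian has the largest differential entropy among distributions with a given covariance. Because $H(Y)$ and the constant term do not depend on the model, maximizing $-\tfrac{1}{2}\log\det\Sigma_{Y|P}$ raises the bound.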
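Research Motivation and Goals
- Improve segmentation by introducing pixel dependencies that a pixel-wise loss ignores.
- Propose a region-based mutual information loss that enforces high-order consistency between the prediction and the ground truth.
- Keep RMI training efficient, with minimal memory overhead and no additional inference cost.
- Allow easy integration into existing segmentation frameworks without modifying the base model.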
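Proposed Method
- Represent each pixel by the pixels in a surrounding region (e.g., 3x3), forming a high-dimensional point.
- Represent an image as the distribution of these high-dimensional points, for both the prediction and the ground truth.
- Derive a tractable lower bound of the mutual information I(Y;P) and maximize it during training.
- Approximate the variance of the posterior Y|P using a second-order independence assumption and a closed-form covariance expression.
- Normalize and stabilize the computation of the MI lower bound using a tractable matrix M and Cholesky decomposition.
- Combine RMI with the standard cross-entropy in a joint loss with a balancing parameter.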
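The steps listed above can be sketched in NumPy for a single class map. This is a minimal illustration, not the authors' implementation (which is linked in the abstract). The function names `region_vectors`, `rmi_lower_bound`, and `joint_loss`, the jitter `eps`, the weight `lam`, the per-point covariance normalization, and the use of a binary cross-entropy are all assumptions made here. The sketch also assumes the bound takes the form -1/2 log det(M), with M = Σ_Y − Cov(Y,P) Σ_P^{-1} Cov(Y,P)^T.

```python
import numpy as np


def region_vectors(img, radius=1):
    """Turn each interior pixel into a vector of its (2r+1)x(2r+1) region.

    img: 2D array of shape (H, W). Returns an array of shape (d, N) with
    d = (2r+1)**2 and N = (H - 2r) * (W - 2r); border pixels are skipped.
    """
    k = 2 * radius + 1
    h, w = img.shape
    out_h, out_w = h - k + 1, w - k + 1
    rows = []
    for dy in range(k):
        for dx in range(k):
            # One shifted view of the image per position inside the region.
            rows.append(img[dy:dy + out_h, dx:dx + out_w].reshape(-1))
    return np.stack(rows, axis=0)


def rmi_lower_bound(y, p, radius=1, eps=5e-4):
    """Assumed bound: -0.5 * log det(M); eps is a hypothetical jitter."""
    Y = region_vectors(y, radius)
    P = region_vectors(p, radius)
    Y = Y - Y.mean(axis=1, keepdims=True)
    P = P - P.mean(axis=1, keepdims=True)
    d, n = Y.shape
    jitter = eps * np.eye(d)
    cov_yy = Y @ Y.T / n
    cov_pp = P @ P.T / n
    cov_yp = Y @ P.T / n
    # M = cov_yy - cov_yp @ inv(cov_pp) @ cov_yp.T, computed with a solve.
    M = cov_yy - cov_yp @ np.linalg.solve(cov_pp + jitter, cov_yp.T)
    M = 0.5 * (M + M.T)  # remove round-off asymmetry before factorizing
    # log det via Cholesky: det(M) = prod(diag(L))**2 for M = L @ L.T.
    L = np.linalg.cholesky(M + jitter)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * log_det


def binary_cross_entropy(y, p, tiny=1e-7):
    p = np.clip(p, tiny, 1.0 - tiny)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def joint_loss(y, p, lam=0.5, radius=1):
    """Cross-entropy plus the negated bound, balanced by lam (assumed form)."""
    return lam * binary_cross_entropy(y, p) - (1.0 - lam) * rmi_lower_bound(y, p, radius)
```

Here `y` is a ground-truth mask and `p` a probability map of the same shape; `joint_loss(y, p)` returns a scalar to minimize. A real training setup would compute this per class and per image inside an autodiff framework so that gradients flow through the Cholesky factorization.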
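Experimental Results
Research Questions
- RQ1: Can a region-based mutual information objective outperform pixel-wise losses in segmentation accuracy?
- RQ2: How can a lower bound of the mutual information be computed in a way that is practical for deep learning training?
- RQ3: What are the trade-offs of downsampling and region size in RMI, in terms of performance and resource usage?
- RQ4: Does RMI generalize across different segmentation backbones and datasets?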
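Main Findings
- RMI yields significant and consistent mIoU improvements over the DeepLabv3 and DeepLabv3+ baselines on the VOC 2012 validation and test sets.
- RMI outperforms CRF post-processing and the affinity field loss, with no additional inference cost.
- RMI also provides significant gains on CamVid, indicating that it applies across datasets.
- A downsampling strategy with average pooling and a moderate region size effectively balances performance and memory usage.
- Ablation studies show that a larger region size and a smaller downsampling factor generally improve performance but increase computation.
- Per-class results show improved segmentation for several categories, reflecting better capture of boundaries and details.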
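The trade-off between downsampling factor and region size can be made concrete with a small sketch. The helper names `avg_pool` and `region_cost` are hypothetical, and the cost figures are a rough proxy (number of region vectors times their dimension), not measurements from the paper.

```python
import numpy as np


def avg_pool(img, factor):
    """Downsample a 2D map by averaging non-overlapping factor x factor blocks."""
    h, w = img.shape
    h2, w2 = h // factor, w // factor
    cropped = img[:h2 * factor, :w2 * factor]  # drop rows/cols that do not fit
    return cropped.reshape(h2, factor, w2, factor).mean(axis=(1, 3))


def region_cost(height, width, factor, region_size):
    """Rough proxy for the work done per class map after downsampling."""
    h2, w2 = height // factor, width // factor
    n_points = (h2 - region_size + 1) * (w2 - region_size + 1)
    dim = region_size ** 2  # a 3x3 region gives 9-dimensional points
    return {"points": n_points, "dim": dim, "stored_values": n_points * dim}


if __name__ == "__main__":
    for factor in (2, 4):
        for region_size in (3, 5):
            print(factor, region_size, region_cost(512, 512, factor, region_size))
```

Under this proxy, halving the downsampling factor roughly quadruples the number of region vectors, while growing the region from 3x3 to 5x5 raises the dimension from 9 to 25 and the covariance matrices from 9x9 to 25x25.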
This walkthrough was generated by AI and reviewed by a human editor.