QUICK REVIEW

[Paper Review] ACNet: Attention Based Network to Exploit Complementary Features for RGBD Semantic Segmentation

Xinxin Hu, Kailun Yang|arXiv (Cornell University)|May 24, 2019

Advanced Neural Network Applications | References: 17 | Cited by: 25

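One-Sentence Summary

ACNet is a multi-branch attention network whose channel-attention-based Attention Complementary Module (ACM) selectively fuses RGB and depth features, giving dynamic, context-aware feature aggregation. With a ResNet-50 backbone it reaches 48.3% mIoU on the NYUDv2 dataset, state of the art at the time and 0.6 percentage points above the previous best method.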

ABSTRACT

Compared to RGB semantic segmentation, RGBD semantic segmentation can achieve better performance by taking depth information into consideration. However, it is still problematic for contemporary segmenters to effectively exploit RGBD information since the feature distributions of RGB and depth (D) images vary significantly in different scenes. In this paper, we propose an Attention Complementary Network (ACNet) that selectively gathers features from RGB and depth branches. The main contributions lie in the Attention Complementary Module (ACM) and the architecture with three parallel branches. More precisely, ACM is a channel attention-based module that extracts weighted features from RGB and depth branches. The architecture preserves the inference of the original RGB and depth branches, and enables the fusion branch at the same time. Based on the above structures, ACNet is capable of exploiting more high-quality features from different channels. We evaluate our model on SUN-RGBD and NYUDv2 datasets, and prove that our model outperforms state-of-the-art methods. In particular, a mIoU score of 48.3% on NYUDv2 test set is achieved with ResNet50. We will release our source code based on PyTorch and the trained segmentation model at https://github.com/anheidelonghu/ACNet.

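Motivation and Objectives

Address the uneven and inconsistent distribution of information between RGB and depth features in indoor scenes.
Overcome the over-fusion or under-fusion seen in existing RGBD segmentation networks, without corrupting the representations of the original branches.
Design a mechanism that uses channel attention to adaptively select and fuse the most informative features from the RGB and depth branches.
Keep independent RGB and depth inference paths while achieving effective fusion through a multi-branch architecture.
Improve segmentation accuracy on standard RGBD benchmarks by exploiting the complementary information in the two modalities.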

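Proposed Method

Use a three-branch architecture: two separate ResNet encoders process the RGB and depth inputs, and a third branch carries the fused features.
Introduce the Attention Complementary Module (ACM), which computes channel-wise attention weights by global average pooling followed by a 1×1 convolution and a Sigmoid activation.
Multiply each channel of the input feature map by its attention weight (a channel-wise product, described in the source as an outer product) to emphasize informative channels.
Merge the weighted RGB and depth features into the fusion branch by element-wise addition, giving context-aware, dynamic feature integration.
Keep separate RGB and depth feature streams throughout the network to avoid the information loss caused by early fusion.
Train the network on NYUDv2 and SUN-RGBD with cross-entropy loss, a learning-rate schedule, and data augmentation.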
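
The ACM and fusion steps listed above can be sketched in plain Python. The authors' released code is PyTorch-based; this dependency-free version is only an illustrative sketch under stated assumptions. Feature maps are nested lists indexed [channel][row][col], the 1×1 convolution is applied to the pooled per-channel vector as a matrix-vector product, and the names acm, fuse, weight, and bias are hypothetical rather than taken from the authors' repository. The example uses all-zero weights so that every attention weight is exactly 0.5.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def global_average_pool(x: FeatureMap) -> List[float]:
    """Collapse each H x W channel to its mean: one value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def conv1x1(v: List[float], weight: List[List[float]], bias: List[float]) -> List[float]:
    """On a C x 1 x 1 input, a 1x1 convolution is a matrix-vector product."""
    return [sum(w * vi for w, vi in zip(w_row, v)) + b
            for w_row, b in zip(weight, bias)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def acm(x: FeatureMap, weight: List[List[float]], bias: List[float]) -> FeatureMap:
    """Hypothetical ACM sketch: pool -> 1x1 conv -> sigmoid -> rescale channels."""
    attention = [sigmoid(z) for z in conv1x1(global_average_pool(x), weight, bias)]
    # Every pixel of channel c is multiplied by that channel's attention weight.
    return [[[a * val for val in row] for row in ch]
            for a, ch in zip(attention, x)]


def fuse(rgb: FeatureMap, depth: FeatureMap) -> FeatureMap:
    """Element-wise addition of two feature maps with identical shapes."""
    return [[[r + d for r, d in zip(r_row, d_row)]
             for r_row, d_row in zip(r_ch, d_ch)]
            for r_ch, d_ch in zip(rgb, depth)]


# Toy 2-channel, 2x2 feature maps (made-up values, not real network activations).
rgb = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [2.0, 2.0]]]
depth = [[[0.0, 4.0], [4.0, 0.0]], [[8.0, 8.0], [8.0, 8.0]]]
zero_w = [[0.0, 0.0], [0.0, 0.0]]  # assumption: untrained weights for the demo
zero_b = [0.0, 0.0]

fused = fuse(acm(rgb, zero_w, zero_b), acm(depth, zero_w, zero_b))
print(fused)
```

In a trained network the 1×1 convolution weights are learned separately for each branch and stage, so the attention weights differ per channel instead of all being 0.5 as in this toy run.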

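Experimental Results

Research Questions

RQ1: How can selective fusion be achieved when the information content and distribution of RGB and depth features vary from scene to scene?
RQ2: Can a learnable attention mechanism identify and prioritize the more informative features of the RGB and depth branches at different network depths?
RQ3: Does late fusion that keeps independent RGB and depth inference paths improve segmentation performance compared with early or intermediate fusion?
RQ4: To what extent does the proposed ACM reduce redundancy across feature-map channels and homogenize the feature distribution?
RQ5: Can the proposed architecture reach state-of-the-art performance on standard RGBD benchmarks with a comparatively lightweight backbone such as ResNet-50?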

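Key Findings

ACNet with ResNet-50 reaches 48.3% mIoU on the NYUDv2 test set, a new state of the art at the time and 0.6 percentage points above the previous best method.
On SUN-RGBD, ACNet with ResNet-50 reaches 48.1% mIoU, on par with CFN (RefineNet-152) while using a lighter backbone.
In the ablation study, the variant without ACM (Model-1) drops to 44.3% mIoU, indicating that attention-driven feature selection is critical to performance.
The multi-branch architecture also contributes substantially: when only ACM is removed and the architecture is kept, mIoU is 46.8%, so ACM itself accounts for a gain of 1.5 percentage points (48.3% versus 46.8%).
Visualization and weight analysis show that RGB features dominate in the lower layers (Conv and Layer1), whereas depth features become more informative in the higher layers (Layer2–4), which supports the dynamic modality selection mechanism.
The standard deviation of the attention weights decreases from the Conv layer to Layer3, suggesting that the feature distribution becomes more homogeneous, and then increases at Layer4, reflecting selective rejection of redundant features.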
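
The scores above are all mean intersection-over-union (mIoU) values. As a reminder of what the metric measures, here is a minimal sketch that computes it from a confusion matrix. The function name mean_iou and the choice to skip classes that appear in neither the predictions nor the ground truth are assumptions made for illustration; the benchmarks' official evaluation scripts may handle such cases differently.

```python
from typing import List


def mean_iou(confusion: List[List[int]]) -> float:
    """Mean IoU from a square confusion matrix.

    confusion[i][j] counts pixels whose ground-truth class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]                                # correctly labelled c
        fn = sum(confusion[c]) - tp                         # class c missed
        fp = sum(confusion[r][c] for r in range(n)) - tp    # wrongly labelled c
        denom = tp + fp + fn
        if denom > 0:  # assumption: ignore classes absent from both sides
            ious.append(tp / denom)
    if not ious:
        raise ValueError("confusion matrix contains no pixels")
    return sum(ious) / len(ious)


# Toy two-class example (made-up counts, not data from the paper).
toy_confusion = [[3, 1],
                 [1, 5]]
print(f"{mean_iou(toy_confusion):.4f}")
```

For the toy matrix, the per-class IoUs are 3/5 and 5/7, and the mIoU is their average.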


This review was generated by AI and reviewed by a human editor.