[论文解读] RGBT Salient Object Detection: A Large-scale Dataset and Benchmark
介绍 VT5000,一个规模较大的对齐 RGB-T 数据集,包含 5000 对图像用于显著性对象检测;并提出 ADFNet,一种基于注意力的多模态融合网络,在 VT5000 及两个公开数据集 VT821/VT1000 上优于先前方法。
Salient object detection in complex scenes and environments is a challenging research topic. Most works focus on RGB-based salient object detection, which limits its performance of real-life applications when confronted with adverse conditions such as dark environments and complex backgrounds. Taking advantage of RGB and thermal infrared images becomes a new research direction for detecting salient object in complex scenes recently, as thermal infrared spectrum imaging provides the complementary information and has been applied to many computer vision tasks. However, current research for RGBT salient object detection is limited by the lack of a large-scale dataset and comprehensive benchmark. This work contributes such a RGBT image dataset named VT5000, including 5000 spatially aligned RGBT image pairs with ground truth annotations. VT5000 has 11 challenges collected in different scenes and environments for exploring the robustness of algorithms. With this dataset, we propose a powerful baseline approach, which extracts multi-level features within each modality and aggregates these features of all modalities with the attention mechanism, for accurate RGBT salient object detection. Extensive experiments show that the proposed baseline approach outperforms the state-of-the-art methods on VT5000 dataset and other two public datasets. In addition, we carry out a comprehensive analysis of different algorithms of RGBT salient object detection on VT5000 dataset, and then make several valuable conclusions and provide some potential research directions for RGBT salient object detection.
研究动机与目标
- 创建一个大型、多样且可自由获取的 RGB-T 数据集(VT5000),包含真值掩码及用于挑战的11个注释。
- 提出一个端到端的 CNN 基线(ADFNet),使用 RGB 与热成像分支并进行基于注意力的融合。
- 通过边缘感知损失提升边界精度,并通过多层特征融合与全局上下文模块细化显著性。
- 分析并比较 VT5000 及公开数据集上的 RGB-T SOD 方法,为未来研究提供指导。
提出的方法
- 开发一个两流的 VGG16 基础骨干网络,分别提取 RGB 与热成像特征。
- 在融合前应用卷积块注意力模块(CBAM)对通道和空间特征进行加权。
- 在多个层次对模态特征进行融合,以保留低层和高层信息。
- 引入金字塔池化模块(PPM),在多尺度提供全局上下文引导。
- 在融合后使用特征聚合模块(FAM)整合多尺度特征。
- 使用交叉熵损失和边缘基的细化损失来锐化对象边界。

实验结果
研究问题
- RQ1要训练健壮的深度网络,RGB-T SOD 数据集应有多大、多么多样?
- RQ2基于注意力的多模态融合能否优于单模态或简单融合基线来提升 RGB-T 显著对象检测?
- RQ3多层融合与全局上下文模块是否提升了 RGB-T SOD 的定位和边界界定?
- RQ4提出方法在 VT5000 与现有的 VT821/VT1000 数据集上的对比性能如何?
主要发现
| 挑战 | PoolNet | BASNet | CPD | PFA | R3Net | RAS | PiCANet | EGNet | MTMR | SCGL | ADFNet |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BSO | 0.800 | 0.858 | 0.872 | 0.802 | 0.831 | 0.768 | 0.804 | 0.873 | 0.667 | 0.754 | 0.880 |
| CB | 0.725 | 0.808 | 0.845 | 0.748 | 0.794 | 0.669 | 0.796 | 0.838 | 0.575 | 0.703 | 0.854 |
| CIB | 0.740 | 0.822 | 0.860 | 0.742 | 0.822 | 0.688 | 0.790 | 0.854 | 0.582 | 0.694 | 0.860 |
| IC | 0.721 | 0.775 | 0.812 | 0.735 | 0.745 | 0.672 | 0.752 | 0.818 | 0.564 | 0.681 | 0.835 |
| LI | 0.757 | 0.832 | 0.840 | 0.749 | 0.790 | 0.707 | 0.783 | 0.848 | 0.695 | 0.742 | 0.868 |
| MSO | 0.706 | 0.794 | 0.826 | 0.729 | 0.774 | 0.655 | 0.777 | 0.815 | 0.620 | 0.710 | 0.837 |
| OF | 0.762 | 0.816 | 0.821 | 0.754 | 0.759 | 0.738 | 0.758 | 0.817 | 0.707 | 0.738 | 0.837 |
| SA | 0.727 | 0.762 | 0.825 | 0.726 | 0.728 | 0.673 | 0.748 | 0.791 | 0.653 | 0.665 | 0.835 |
| SSO | 0.658 | 0.718 | 0.767 | 0.695 | 0.663 | 0.535 | 0.676 | 0.701 | 0.698 | 0.753 | 0.806 |
| TC | 0.720 | 0.791 | 0.811 | 0.762 | 0.729 | 0.711 | 0.745 | 0.791 | 0.570 | 0.675 | 0.841 |
| BW | 0.750 | 0.768 | 0.795 | 0.671 | 0.753 | 0.701 | 0.773 | 0.774 | 0.606 | 0.643 | 0.804 |
| RGB | 0.733 | 0.785 | 0.804 | 0.731 | 0.736 | 0.690 | 0.743 | 0.785 | 0.670 | 0.671 | 0.817 |
| T | 0.719 | 0.787 | 0.802 | 0.755 | 0.719 | 0.699 | 0.736 | 0.776 | 0.564 | 0.664 | 0.833 |
- VT5000 提供 5000 对对齐的 RGB-T 图像及 11 个注释挑战,便于对 RGB-T SOD 方法进行健壮评估。
- 所提出的 ADFNet 在 VT5000 以及两个公开数据集(VT821 与 VT1000)上持续优于最先进方法。
- 基于 CBAM 的注意力和多层融合有效地利用 RGB 与热成像的互补线索进行显著性检测。
- PPM 和 FAM 模块分别提升全局上下文感知能力与多尺度特征整合。
- 边缘损失有助于在显著性图中产生更清晰的边界。
- 对 VT5000 的全面分析提供可操作的见解和未来 RGB-T SOD 的研究方向。

更好的研究,从现在开始
从论文设计到论文写作,大幅缩短您的研究时间。
无需绑定信用卡
本解读由 AI 生成,并经人工编辑审核。