[Paper Digest] CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection
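CoADNet is an end-to-end framework for CoSOD that models inter-image relationships under online intra-saliency guidance. It combines a two-stage aggregate-and-distribute structure (group-attentional semantic aggregation followed by gated group distribution) with a group consistency preserving decoder to produce accurate, consistent co-saliency maps.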
Co-Salient Object Detection (CoSOD) aims at discovering salient objects that repeatedly appear in a given query group containing two or more relevant images. One challenging issue is how to effectively capture co-saliency cues by modeling and exploiting inter-image relationships. In this paper, we present an end-to-end collaborative aggregation-and-distribution network (CoADNet) to capture both salient and repetitive visual patterns from multiple images. First, we integrate saliency priors into the backbone features to suppress the redundant background information through an online intra-saliency guidance structure. After that, we design a two-stage aggregate-and-distribute architecture to explore group-wise semantic interactions and produce the co-saliency features. In the first stage, we propose a group-attentional semantic aggregation module that models inter-image relationships to generate the group-wise semantic representations. In the second stage, we propose a gated group distribution module that adaptively distributes the learned group semantics to different individuals in a dynamic gating mechanism. Finally, we develop a group consistency preserving decoder tailored for the CoSOD task, which maintains group constraints during feature decoding to predict more consistent full-resolution co-saliency maps. The proposed CoADNet is evaluated on four prevailing CoSOD benchmark datasets, which demonstrates the remarkable performance improvement over ten state-of-the-art competitors.
Motivation and Objectives
- Motivate and address the challenge of modeling inter-image relationships in CoSOD.
- Propose an online intra-saliency guidance module to inject learnable saliency priors.
- Design a two-stage aggregate-and-distribute architecture to capture group semantics and distribute them adaptively.
- Introduce a group consistency preserving decoder to maintain inter-image consistency in full-resolution maps.
- Demonstrate state-of-the-art performance on multiple CoSOD benchmarks with ablations.
Proposed Method
- Introduce Online Intra-Saliency Guidance (OIaSG) to fuse saliency priors with backbone features via an intra-saliency head (IaSH) and a trainable fusion mechanism.
- Develop Group-Attentional Semantic Aggregation (GASA) to build order-insensitive, long-range inter-image relationships using block-wise channel shuffling, atrous multi-scale context, and self-attention-based global dependencies.
- Propose Gated Group Distribution (GGD) to dynamically combine intra-image features with group semantics through a gating mechanism guided by a squeeze-and-excitation-based estimator.
- Implement a Group Consistency Preserving Decoder (GCPD) with cascaded decoding units that preserve inter-image constraints during upsampling to produce full-resolution co-saliency maps.
- Train end-to-end with a multi-task loss that combines co-saliency and single-image saliency supervision.
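The aggregate-then-distribute idea in the list above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: mean pooling over the image axis stands in for GASA (the paper uses block-wise channel shuffling, atrous context, and self-attention), and the gate is a single linear layer over globally pooled features, loosely in the spirit of a squeeze-and-excitation estimator. The function names `aggregate_group` and `gated_distribution` and the parameters `w` and `b` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def aggregate_group(features):
    """Order-insensitive group semantics from per-image features.

    features: array of shape (N, C, H, W), one feature map per image.
    ASSUMPTION: a plain mean over images replaces the GASA module.
    """
    return features.mean(axis=0)  # shape (C, H, W)


def gated_distribution(features, group, w, b):
    """Blend each image's features with the shared group semantics.

    w has shape (C, 2C) and b has shape (C,); both are hypothetical
    parameters of a toy gate estimator, not taken from the paper.
    """
    group_b = np.broadcast_to(group, features.shape)     # (N, C, H, W)
    joint = np.concatenate([features, group_b], axis=1)  # (N, 2C, H, W)
    squeezed = joint.mean(axis=(2, 3))                   # global average pool
    gate = sigmoid(squeezed @ w.T + b)                   # (N, C), values in (0, 1)
    gate = gate[:, :, None, None]                        # broadcast over H, W
    # Convex combination: gate near 1 favors group semantics,
    # gate near 0 keeps the image's own features.
    return gate * group_b + (1.0 - gate) * features


rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8, 5, 5))  # 4 images, 8 channels, 5x5 maps
w = rng.standard_normal((8, 16)) * 0.1
b = np.zeros(8)

group = aggregate_group(feats)
out = gated_distribution(feats, group, w, b)
```

Because the gate is computed per image, two images in the same group can draw on the shared semantics to different degrees, which is the adaptive behavior the GGD bullet describes. Reordering the images leaves `group` unchanged in this sketch, a simple way to obtain the order insensitivity that GASA targets.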
Experimental Results
Research Questions
- RQ1: How can inter-image relationships be modeled in a way that is robust to image order and spatial variation?
- RQ2: Can saliency priors be learned online to guide CoSOD without explicit category labels?
- RQ3: Does a two-stage aggregation-and-distribution pipeline improve co-saliency feature learning compared to prior approaches?
- RQ4: Will a decoder that preserves group consistency yield more coherent multi-image co-saliency maps?
- RQ5: How does CoADNet perform against state-of-the-art methods across standard CoSOD benchmarks?
Key Findings
- CoADNet variants achieve state-of-the-art results across Cosal2015, CoSOD3k, MSRC, and iCoseg datasets with backbones VGG16, ResNet-50, and Dilated ResNet-50.
- Ablation studies show significant gains from OIaSG, GASA, GGD, and GCPD components, with cumulative improvements in F-measure, MAE, and S-measure.
- CoADNet-DR (with Dilated ResNet-50) attains top performance, e.g., F-measure 0.8874, MAE 0.0599, S-measure 0.8705 on Cosal2015; similarly strong gains on other datasets.
- The method consistently improves co-saliency localization and background suppression, and maintains inter-image consistency through decoding.
- The CoADNet-V, -R, and -DR variants have comparable model sizes (around 120 MB), suggesting that the performance gains do not come from added model capacity.
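For readers unfamiliar with two of the metrics cited above, the following is a minimal sketch of MAE and F-measure for a single predicted map. It assumes a fixed binarization threshold of 0.5 and the β² = 0.3 weighting that is conventional in saliency evaluation; the exact protocol behind the reported numbers (for example, how the threshold is chosen) is not stated in this digest and may differ. S-measure is not covered. The function names are illustrative.

```python
import numpy as np


def mae(pred, gt):
    """Mean absolute error between a saliency map and its ground truth.

    Both arrays hold values in [0, 1]; lower is better.
    """
    return float(np.mean(np.abs(pred - gt)))


def f_measure(pred, gt, threshold=0.5, beta2=0.3):
    """Weighted harmonic mean of precision and recall; higher is better.

    ASSUMPTION: a fixed threshold binarizes the prediction.
    """
    binary = pred >= threshold
    mask = gt > 0.5
    tp = np.logical_and(binary, mask).sum()
    precision = tp / max(binary.sum(), 1)  # guard against empty prediction
    recall = tp / max(mask.sum(), 1)       # guard against empty ground truth
    denom = beta2 * precision + recall
    if denom == 0:
        return 0.0
    return float((1 + beta2) * precision * recall / denom)


# Toy 2x2 example: the top row is the salient object.
gt = np.array([[1.0, 1.0], [0.0, 0.0]])
pred = np.array([[0.9, 0.4], [0.6, 0.1]])

score_mae = mae(pred, gt)      # approximately 0.35
score_f = f_measure(pred, gt)  # approximately 0.5
```

In the toy example, one of the two predicted foreground pixels is correct and one of the two true foreground pixels is found, so precision and recall are both 0.5 and the F-measure is 0.5 as well.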
This digest was generated by AI and reviewed by human editors.