QUICK REVIEW

[Paper Review] Semantic Edge Detection with Diverse Deep Supervision

Yun Liu, Ming‐Ming Cheng|arXiv (Cornell University)|Apr 9, 2018

Advanced Image and Video Retrieval Techniques | References: 67 | Cited by: 40

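One-Sentence Summary

DDS introduces information converters that enable diverse deep supervision within a single backbone network for semantic edge detection, achieving state-of-the-art results on SBD and Cityscapes.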

ABSTRACT

Semantic edge detection (SED), which aims at jointly extracting edges as well as their category information, has far-reaching applications in domains such as semantic segmentation, object proposal generation, and object recognition. SED naturally requires achieving two distinct supervision targets: locating fine detailed edges and identifying high-level semantics. Our motivation comes from the hypothesis that such distinct targets prevent state-of-the-art SED methods from effectively using deep supervision to improve results. To this end, we propose a novel fully convolutional neural network using diverse deep supervision (DDS) within a multi-task framework where bottom layers aim at generating category-agnostic edges, while top layers are responsible for the detection of category-aware semantic edges. To overcome the hypothesized supervision challenge, a novel information converter unit is introduced, whose effectiveness has been extensively evaluated on SBD and Cityscapes datasets.

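Motivation and Objectives

Analyze why existing SED methods fail to benefit from deep supervision because of conflicting supervision targets.
Propose the DDS architecture, which uses information converters to apply different supervision to the bottom and top layers indirectly.
Show that bottom-side supervision improves localization once it is buffered by the converters and fused with the top semantic edges.
Evaluate DDS on SBD and Cityscapes to demonstrate state-of-the-art performance and to ablate the design choices.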

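Proposed Method

Uses a ResNet-based backbone in which Side-1 through Side-4 produce binary, category-agnostic edge maps, each passing through an information converter.
Introduces an information converter unit that buffers the bottom-layer features and makes it possible to apply two separate losses to the bottom (category-agnostic) and top (semantic) targets.
Computes semantic edges at Side-5 and fuses them with the bottom edge outputs by stacking the edge activation maps and applying a K-grouped 1x1 convolution, producing the final semantic edges.
Trains with a multi-task loss that combines L_side^(m) (m = 1, ..., 4) with L_fuse on the final semantic edge map, using reweighted cross-entropy as in Eqs. (3)-(6) of the paper.
Provides an unweighted loss variant (DDS-U) and a SEAL-aligned variant (DDS-R) to explore supervision strategies.
Follows a CASENet-inspired architecture: a ResNet backbone with dilated convolution and bilinear upsampling, pretrained on COCO and then fine-tuned on SBD/Cityscapes.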
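The fusion and loss steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the review does not reproduce Eqs. (3)-(6), so the sketch assumes a class-balanced weighting in which beta is the fraction of edge pixels, assumes every class group consists of that class's Side-5 activation map plus all bottom edge maps, and sums the loss terms with unit weights. The function names `reweighted_bce`, `grouped_fuse`, and `multitask_loss` are hypothetical, and images are flattened to lists of pixels to keep the example dependency-free.

```python
import math


def reweighted_bce(probs, labels, eps=1e-7):
    """Class-balanced binary cross-entropy over a flat list of pixels.

    Assumption: beta is the fraction of edge pixels, so the (rare) edge
    term is weighted by 1 - beta and the non-edge term by beta.
    """
    beta = sum(labels) / len(labels)
    loss = 0.0
    for p, g in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        loss -= beta * (1 - g) * math.log(1.0 - p)
        loss -= (1.0 - beta) * g * math.log(p)
    return loss


def grouped_fuse(class_maps, bottom_maps, weights, biases):
    """K-grouped 1x1 convolution over stacked activation maps.

    class_maps:  K per-class Side-5 maps, each a flat list of pixels.
    bottom_maps: bottom-side edge maps shared by every class group.
    weights[k]:  one weight per channel in group k (class map first).
    """
    fused = []
    for k, class_map in enumerate(class_maps):
        group = [class_map] + bottom_maps  # stack class k with bottom maps
        fused.append([
            biases[k] + sum(w * ch[i] for w, ch in zip(weights[k], group))
            for i in range(len(class_map))
        ])
    return fused


def multitask_loss(side_probs, edge_labels, fused_probs, semantic_labels):
    """Sum of bottom-side edge losses and per-class fused losses.

    Assumption: all terms are added with unit weight.
    """
    loss = sum(reweighted_bce(p, edge_labels) for p in side_probs)
    loss += sum(reweighted_bce(p, g)
                for p, g in zip(fused_probs, semantic_labels))
    return loss


# Toy example: 2 classes, 1 bottom edge map, 2 pixels.
fused = grouped_fuse(
    class_maps=[[1.0, 0.0], [0.0, 2.0]],
    bottom_maps=[[1.0, 1.0]],
    weights=[[1.0, 0.5], [2.0, -1.0]],
    biases=[0.0, 1.0],
)
print(fused)
```

Because each group has its own weights, a 1x1 convolution with K groups lets every class learn how much to trust the shared bottom edge maps without mixing information across classes. In this sketch a class with no edge pixels in the labels contributes zero loss, since beta is then 0; a real implementation may handle that case differently.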

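Experimental Results

Research Questions

RQ1: Can the different supervision targets (category-agnostic vs. semantic edges) be separated effectively within a single backbone network without causing optimization conflicts?
RQ2: Does buffering through information converters give SED the benefit of diverse deep supervision?
RQ3: Do the bottom-side edges improve semantic edge localization once they are combined with the top semantic edges through a dedicated fusion mechanism?
RQ4: How does DDS perform on the standard SED benchmarks (SBD and Cityscapes) compared with CASENet and other baselines, including ablations?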

Key Findings

Method	aer.	bike	bird	boat	bot.	bus	car	cat	cha.	cow	tab.	dog	hor.	mot.	per.	pot.	she.	sofa	train	tv	mean
Softmax	74.0	64.1	64.8	52.5	52.1	73.2	68.1	73.2	43.1	56.2	37.3	67.4	68.4	67.6	76.7	42.7	64.3	37.5	64.6	56.3	60.2
Basic	82.5	74.2	80.2	62.3	68.0	80.8	74.3	82.9	52.9	73.1	46.1	79.6	78.9	76.0	80.4	52.4	75.4	48.6	75.8	68.0	70.6
DSN	81.6	75.6	78.4	61.3	67.6	82.3	74.6	82.6	52.4	71.9	45.9	79.2	78.3	76.2	80.1	51.9	74.9	48.0	76.5	66.8	70.3
CASENet+S4	84.1	76.4	80.7	63.7	70.3	81.3	73.4	79.4	56.9	70.7	47.6	77.5	81.0	74.5	79.9	54.5	74.8	48.3	72.6	69.4	70.9
DDS ∖ Convt	83.3	77.1	81.7	63.6	70.6	81.2	73.9	79.5	56.8	71.9	48.0	78.3	81.2	75.2	79.7	54.3	76.8	48.9	75.1	68.7	71.3
DDS ∖ Convt †	83.6	75.4	78.9	59.9	69.7	79.7	71.9	77.2	54.7	72.0	42.8	75.5	77.1	71.9	79.1	53.4	76.4	46.9	72.6	66.9	69.3
DDS ∖ DeSup	82.5	77.4	81.5	62.4	70.8	81.6	73.8	80.5	56.9	72.4	46.6	77.9	80.1	73.4	79.9	54.8	76.6	47.5	73.3	67.8	70.9
CASENet	83.3	76.0	80.7	63.4	69.2	81.3	74.9	83.2	54.3	74.8	46.4	80.3	80.2	76.6	80.8	53.3	77.2	50.1	75.9	66.8	71.4
DDS-R	85.4	78.3	83.3	65.6	71.4	83.0	75.5	81.3	59.1	75.7	50.7	80.2	82.7	77.0	81.6	58.2	79.5	50.2	76.5	71.2	73.3
DDS-U	87.2	79.7	84.7	68.3	73.0	83.7	76.7	82.3	60.4	79.4	50.9	81.2	83.6	78.3	82.0	60.1	82.7	51.2	78.0	72.7	74.8
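As a quick consistency check on the table, the short script below recomputes the mean column from the per-category scores for three rows and compares DDS-U with CASENet category by category. The numbers are copied from the table above; the variable and function names are illustrative only.

```python
CATEGORIES = ["aer.", "bike", "bird", "boat", "bot.", "bus", "car", "cat",
              "cha.", "cow", "tab.", "dog", "hor.", "mot.", "per.", "pot.",
              "she.", "sofa", "train", "tv"]

# Per-category F-measure, copied from the table rows.
SCORES = {
    "CASENet": [83.3, 76.0, 80.7, 63.4, 69.2, 81.3, 74.9, 83.2, 54.3, 74.8,
                46.4, 80.3, 80.2, 76.6, 80.8, 53.3, 77.2, 50.1, 75.9, 66.8],
    "DDS-R": [85.4, 78.3, 83.3, 65.6, 71.4, 83.0, 75.5, 81.3, 59.1, 75.7,
              50.7, 80.2, 82.7, 77.0, 81.6, 58.2, 79.5, 50.2, 76.5, 71.2],
    "DDS-U": [87.2, 79.7, 84.7, 68.3, 73.0, 83.7, 76.7, 82.3, 60.4, 79.4,
              50.9, 81.2, 83.6, 78.3, 82.0, 60.1, 82.7, 51.2, 78.0, 72.7],
}


def mean_score(method):
    """Unweighted mean over the 20 categories, rounded to one decimal."""
    values = SCORES[method]
    return round(sum(values) / len(values), 1)


def categories_not_improved(method, baseline):
    """Categories where `method` does not exceed `baseline`."""
    return [c for c, m, b in zip(CATEGORIES, SCORES[method], SCORES[baseline])
            if m <= b]


for name in SCORES:
    print(name, mean_score(name))
print(categories_not_improved("DDS-U", "CASENet"))
```

With these numbers the recomputed means match the table's mean column (71.4, 73.3, 74.8), and DDS-U is ahead of CASENet in 19 of the 20 categories, the exception being cat.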

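DDS achieves state-of-the-art performance on SBD: DDS-U reaches a mean F-measure of 74.8, above the CASENet (71.4) and DSN (70.3) baselines.
Ablations indicate that the information converters and bottom-side supervision substantially improve results; the DDS-R and DDS-U variants outperform CASENet and the other baselines.
Under the original protocol, DDS-R and DDS-U reach mean F-measures of 73.3 and 74.8 on the SBD benchmark, respectively, outperforming prior methods.
Once buffered by the information converters and fused with the top semantic maps, the bottom-side contributions yield smoother and more precise semantic edges.
Across all ablations, a simpler converter design built from residual blocks provides most of the gain, suggesting that the buffering hypothesis matters more than strict architectural details.
DDS shows consistent improvements on Cityscapes, indicating that it generalizes to semantic edge detection in urban scenes.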
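All scores above are F-measures, that is, the harmonic mean of precision and recall. The sketch below shows only that final formula, given counts of matched and unmatched edge pixels for one category. How predicted pixels are matched to ground-truth edges (distance tolerance, thresholding) is part of the benchmark protocol and is not covered here; the function name and the counts are illustrative assumptions.

```python
def f_measure(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall from pixel counts."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical counts for one category.
print(f_measure(true_pos=80, false_pos=20, false_neg=40))
```

Because the harmonic mean is pulled toward the lower of its two inputs, a detector cannot score well by producing thick edges (high recall, low precision) or by keeping only its most confident pixels (high precision, low recall).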


This review was generated by AI and checked by a human editor.