QUICK REVIEW

[Paper Review] Stacked Deconvolutional Network for Semantic Segmentation

Jun Fu, Jing Liu|arXiv (Cornell University)|Aug 16, 2017

Advanced Neural Network Applications|References: 34|Citations: 83

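One-Line Summary

This paper proposes the Stacked Deconvolutional Network (SDN), which stacks shallow deconvolutional units with dense intra-unit and inter-unit connections and hierarchical supervision, achieving state-of-the-art semantic segmentation without CRF post-processing.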

ABSTRACT

Recent progress in semantic segmentation has been driven by improving the spatial resolution under Fully Convolutional Networks (FCNs). To address this problem, we propose a Stacked Deconvolutional Network (SDN) for semantic segmentation. In SDN, multiple shallow deconvolutional networks, which are called as SDN units, are stacked one by one to integrate contextual information and guarantee the fine recovery of localization information. Meanwhile, inter-unit and intra-unit connections are designed to assist network training and enhance feature fusion since the connections improve the flow of information and gradient propagation throughout the network. Besides, hierarchical supervision is applied during the upsampling process of each SDN unit, which guarantees the discrimination of feature representations and benefits the network optimization. We carry out comprehensive experiments and achieve the new state-of-the-art results on three datasets, including PASCAL VOC 2012, CamVid, GATECH. In particular, our best model without CRF post-processing achieves an intersection-over-union score of 86.6% in the test set.

Motivation and Objectives

Improve spatial resolution and boundary delineation in semantic segmentation under FCN frameworks.
Propose a scalable, trainable architecture by stacking shallow deconvolutional units to capture multi-scale context.
Facilitate optimization via intra-unit and inter-unit dense connections and hierarchical supervision.
Demonstrate state-of-the-art performance on PASCAL VOC 2012, CamVid, and GATECH datasets.
Show that the best model reaches an intersection-over-union score of 86.6% on the test set without CRF post-processing.

Proposed Method

Introduce SDN units: encoder-decoder blocks with downsampling and upsampling paths.
Use DenseNet-inspired dense connections within downsampling blocks to encourage feature reuse.
Incorporate intra-unit dense connections and inter-unit skip connections to improve gradient flow and multi-scale feature fusion.
Apply hierarchical supervision at multiple upsampling stages to strengthen discrimination and optimization.
Fuse score maps across units and scales to enhance boundary localization during upsampling.
Leverage pre-trained DenseNet-161 as the first encoder, with subsequent units built from downsampling/upsampling blocks and compressions.
Train end-to-end with data augmentation and a poly learning rate policy; at test time, use the highest-resolution output of the last unit.
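
Two of the training-side ideas above, the poly learning rate policy and hierarchical supervision, can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the default `power=0.9`, and the per-stage weights are all hypothetical, and the summary does not state which values the paper uses. The loss is shown for a single pixel to keep the example dependency-free.

```python
import math


def poly_lr(base_lr, step, max_steps, power=0.9):
    """Poly policy: scale base_lr by (1 - step / max_steps) ** power.

    power=0.9 is an assumed default, not a value taken from the paper.
    """
    if not 0 <= step <= max_steps:
        raise ValueError("step must lie in [0, max_steps]")
    return base_lr * (1.0 - step / max_steps) ** power


def pixel_cross_entropy(probs, label):
    """Cross-entropy for one pixel, given class probabilities and the true label."""
    return -math.log(probs[label])


def hierarchical_loss(stage_probs, label, stage_weights):
    """Weighted sum of per-stage losses for one pixel.

    stage_probs holds one probability vector per supervised upsampling
    stage; stage_weights (hypothetical) sets how much each stage counts.
    """
    if len(stage_probs) != len(stage_weights):
        raise ValueError("need exactly one weight per supervised stage")
    return sum(
        weight * pixel_cross_entropy(probs, label)
        for probs, weight in zip(stage_probs, stage_weights)
    )
```

Calling `poly_lr(base_lr, step, max_steps)` once per iteration gives a rate that starts at `base_lr` and decays to zero at `max_steps`. Supervising every upsampling stage means each intermediate score map contributes its own term to the total loss, so gradients reach the earlier units directly rather than only through the final output.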

Experimental Results

Research Questions

RQ1: Can stacking shallow deconvolutional units with dense intra-/inter-unit connections improve boundary localization and segmentation accuracy over single deep deconvolutional networks?
RQ2: Does hierarchical supervision at multiple upsampling stages and score-map fusion lead to better optimization and discrimination of pixel-wise predictions?
RQ3: What is the impact of pretraining on a large classifier network (DenseNet-161) and staged upsampling on segmentation performance across standard benchmarks?
RQ4: How does SDN perform on PASCAL VOC 2012, CamVid, and GATECH compared to state-of-the-art methods?
RQ5: What is the effect of varying the number of stacked units and supervision configurations on Mean IoU?

Key Findings

Model	Depth	Parameters (M)	Mean IoU (%)
SDN_M1	169	84.9	78.2
SDN_M2	185	161.7	79.2
SDN_M3	201	238.5	79.9
SDN_M1+	185	161.7	78.6
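
The row-to-row differences in the table can be checked with simple arithmetic. The sketch below copies the four rows as given and computes the change in depth, parameter count, and Mean IoU between two models. The `rows` dictionary and the `delta` helper are illustrative names, not part of the paper.

```python
# Rows copied from the table: (depth, parameters in millions, Mean IoU in %).
rows = {
    "SDN_M1": (169, 84.9, 78.2),
    "SDN_M2": (185, 161.7, 79.2),
    "SDN_M3": (201, 238.5, 79.9),
    "SDN_M1+": (185, 161.7, 78.6),
}


def delta(model_a, model_b):
    """Return (depth, parameters, Mean IoU) of model_b minus model_a.

    Values are rounded to one decimal to hide floating-point noise.
    """
    return tuple(
        round(b - a, 1) for a, b in zip(rows[model_a], rows[model_b])
    )
```

On these figures, each additional unit adds 16 layers and about 76.8M parameters, for Mean IoU gains of 1.0 and then 0.7 points. SDN_M2 has the same depth and parameter count as SDN_M1+ in the table while scoring 0.6 points higher.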

SDN with three stacked units achieves higher Mean IoU than fewer units on PASCAL VOC 2012 validation.
Hierarchical supervision and intra-/inter-unit dense connections improve training stability and boundary detail, contributing to performance gains.
Score-map fusion and additional upsampling blocks yield measurable improvements in Mean IoU.
Pretraining SDN-M2* on MS-COCO further boosts performance, surpassing notable baselines (e.g., DeepLabv3) in Mean IoU.
The SDN family achieves state-of-the-art results on PASCAL VOC 2012, CamVid, and GATECH benchmarks, with notable gains without CRF post-processing.
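
Mean IoU, the metric reported throughout these findings, averages the per-class ratio of intersection to union between predicted and ground-truth pixels. The following is a minimal sketch over flat label lists, assuming a confusion-matrix formulation. The function name, the `ignore_label=255` default, and the choice to skip classes that never occur are assumptions made for illustration; benchmark evaluation scripts may differ in these details.

```python
def mean_iou(pred, target, num_classes, ignore_label=255):
    """Mean intersection-over-union from flat lists of class indices.

    Pixels whose target equals ignore_label are skipped (assumed
    convention). Classes absent from both pred and target are left
    out of the average.
    """
    # confusion[t][p] counts pixels with true class t predicted as p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        true_pos = confusion[c][c]
        false_pos = sum(confusion[r][c] for r in range(num_classes)) - true_pos
        false_neg = sum(confusion[c]) - true_pos
        union = true_pos + false_pos + false_neg
        if union > 0:
            ious.append(true_pos / union)

    if not ious:
        raise ValueError("no valid pixels to evaluate")
    return sum(ious) / len(ious)
```

For example, with `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.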

This review was generated by AI and checked by a human editor.