QUICK REVIEW

[Paper Review] CGNet: A Light-weight Context Guided Network for Semantic Segmentation

Tianyi Wu, Sheng Tang|arXiv (Cornell University)|Nov 20, 2018

Advanced Neural Network Applications|38 references|83 citations

TL;DR

The paper introduces a light-weight Context Guided (CG) block and builds the CGNet network on it, reaching 64.8% mIoU on Cityscapes with under 0.5M parameters, a model size suitable for mobile deployment.

ABSTRACT

The demand of applying semantic segmentation model on mobile devices has been increasing rapidly. Current state-of-the-art networks have enormous amount of parameters hence unsuitable for mobile devices, while other small memory footprint models follow the spirit of classification network and ignore the inherent characteristic of semantic segmentation. To tackle this problem, we propose a novel Context Guided Network (CGNet), which is a light-weight and efficient network for semantic segmentation. We first propose the Context Guided (CG) block, which learns the joint feature of both local feature and surrounding context, and further improves the joint feature with the global context. Based on the CG block, we develop CGNet which captures contextual information in all stages of the network and is specially tailored for increasing segmentation accuracy. CGNet is also elaborately designed to reduce the number of parameters and save memory footprint. Under an equivalent number of parameters, the proposed CGNet significantly outperforms existing segmentation networks. Extensive experiments on Cityscapes and CamVid datasets verify the effectiveness of the proposed approach. Specifically, without any post-processing and multi-scale testing, the proposed CGNet achieves 64.8% mean IoU on Cityscapes with less than 0.5 M parameters. The source code for the complete system can be found at https://github.com/wutianyiRosun/CGNet.

Motivation & Objective

Enable semantic segmentation on mobile devices with limited memory and computational budgets.
Design a network that preserves spatial details while exploiting local, surrounding, and global context.
Propose a light-weight backbone (CGNet) with a Context Guided (CG) block that learns joint local-surrounding-global features.
Reduce parameter count and memory footprint while maintaining high segmentation accuracy.

Proposed method

Introduce the CG block, which consists of a local feature extractor, a surrounding context extractor, a joint feature extractor, and a global context extractor.
Use dilated (atrous) convolutions for surrounding context and a global context pathway to reweight joint features.
Apply residual connections (local and global residual learning) to improve information flow.
Construct CGNet with three down-sampling stages (1/2, 1/4, 1/8 resolution) and channel-wise convolutions to save parameters.
Incorporate input injection to feed downsampled input into later stages to strengthen feature propagation.
Train and evaluate on Cityscapes and CamVid without post-processing or multi-scale testing; compare with small-footprint and high-accuracy models.
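
The parameter savings and resolution schedule described above can be made concrete with a little arithmetic. The following is a minimal sketch, not the authors' code: it assumes that "channel-wise convolution" means one k x k filter per channel (a depthwise convolution), and the channel width of 64 and the 1024 x 2048 input size are illustrative values chosen here, not figures taken from the paper. All function names are hypothetical.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a dense k x k convolution (no bias):
    every output channel has a k x k filter for every input channel."""
    return k * k * c_in * c_out


def channelwise_conv_params(channels, k=3):
    """Weights in a channel-wise (depthwise) k x k convolution (no bias):
    a single k x k filter per channel, with no cross-channel mixing."""
    return k * k * channels


def effective_kernel_size(k, dilation):
    """Side length of the window a dilated k x k kernel covers.
    A larger dilation widens the surrounding context without adding weights."""
    return dilation * (k - 1) + 1


def stage_resolutions(height, width, num_stages=3):
    """Feature-map sizes after each stride-2 down-sampling stage
    (1/2, 1/4, 1/8 of the input for three stages)."""
    return [(height // 2 ** s, width // 2 ** s) for s in range(1, num_stages + 1)]


if __name__ == "__main__":
    channels = 64  # illustrative width, not a value from the paper
    dense = standard_conv_params(channels, channels)
    depthwise = channelwise_conv_params(channels)
    print("dense 3x3 weights:       ", dense)
    print("channel-wise 3x3 weights:", depthwise)
    print("reduction factor:        ", dense // depthwise)
    print("3x3 kernel, dilation 2 covers",
          effective_kernel_size(3, 2), "pixels per side")
    print("stage resolutions for a 1024 x 2048 input:",
          stage_resolutions(1024, 2048))
```

Under these assumptions the channel-wise variant needs a factor of `channels` fewer weights than the dense one. Stopping at 1/8 resolution also keeps the final feature maps far larger than a classification-style backbone would, which down-samples more aggressively.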

Experimental results

Research questions

RQ1: How can semantic segmentation be made efficient for mobile devices without sacrificing accuracy?
RQ2: Can a block that jointly models local features, surrounding context, and global context improve segmentation performance over traditional encoder-decoder designs?
RQ3: What is the impact of using a light-weight, context-guided block across all stages on Cityscapes and CamVid datasets?

Key findings

CGNet achieves 64.8% mean IoU on Cityscapes test set with less than 0.5M parameters.
CGNet outperforms other small-footprint models at equivalent parameter counts (e.g., ENet, ESPNet) on Cityscapes.
Global context and surrounding-context components significantly boost performance; ablations show notable gains when using full surrounding context and global context modules.
CGNet uses only three down-sampling stages (1/8 resolution) and channel-wise convolutions to minimize parameters and memory usage.
On Cityscapes, the CGNet_M3N21 variant attains 64.8% mIoU with roughly 0.5M parameters and competitive runtime; on CamVid, it achieves 65.6% mIoU at the same model size.
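
The findings above are reported as mean IoU (mIoU). As a reminder of what the metric measures, here is a minimal sketch that computes it from a confusion matrix. The function name and the toy two-class matrix are illustrative assumptions, and official benchmark scripts may differ in details such as how ignored labels are handled.

```python
def mean_iou(confusion):
    """Mean intersection-over-union from a square confusion matrix.

    confusion[i][j] counts pixels whose ground-truth class is i and whose
    predicted class is j. Per-class IoU is TP / (TP + FP + FN).
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp  # class-c pixels predicted as another class
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp  # other pixels predicted as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both ground truth and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy two-class example (made-up counts, not results from the paper).
    toy = [
        [8, 2],  # ground truth class 0: 8 correct, 2 predicted as class 1
        [1, 9],  # ground truth class 1: 1 predicted as class 0, 9 correct
    ]
    print(f"mIoU = {mean_iou(toy):.4f}")
```

Because each class contributes equally to the average regardless of how many pixels it covers, small or rare classes weigh as much as large ones. This is one reason preserving spatial detail matters for the metric.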

This review was created by AI and reviewed by human editors.