QUICK REVIEW

[Paper Review] FcaNet: Frequency Channel Attention Networks

Zequn Qin, Pengyi Zhang|arXiv (Cornell University)|Dec 22, 2020

Advanced Neural Network Applications|45 references|37 citations

TL;DR

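FcaNet generalizes channel attention by compressing each channel with multiple 2D DCT frequency components instead of global average pooling (GAP) alone. It shows that GAP is the lowest-frequency special case of the 2D DCT, and it reports state-of-the-art results among channel attention methods on ImageNet and COCO with the same parameter count and computational cost as SENet.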

ABSTRACT

Attention mechanism, especially channel attention, has gained great success in the computer vision field. Many works focus on how to design efficient channel attention mechanisms while ignoring a fundamental problem, i.e., channel attention mechanism uses scalar to represent channel, which is difficult due to massive information loss. In this work, we start from a different view and regard the channel representation problem as a compression process using frequency analysis. Based on the frequency analysis, we mathematically prove that the conventional global average pooling is a special case of the feature decomposition in the frequency domain. With the proof, we naturally generalize the compression of the channel attention mechanism in the frequency domain and propose our method with multi-spectral channel attention, termed as FcaNet. FcaNet is simple but effective. We can change a few lines of code in the calculation to implement our method within existing channel attention methods. Moreover, the proposed method achieves state-of-the-art results compared with other channel attention methods on image classification, object detection, and instance segmentation tasks. Our method could consistently outperform the baseline SENet, with the same number of parameters and the same computational cost. Our code and models are publicly available at https://github.com/cfzd/FcaNet.
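
The special-case claim can be checked directly from the definition of the transform. As a sketch, write the 2D DCT of a single H x W channel x with the normalization constants dropped (a simplification assumed here; the symbols f, x, H, and W are this note's notation):

```latex
f_{h,w} = \sum_{i=0}^{H-1} \sum_{j=0}^{W-1} x_{i,j}
          \cos\!\left(\frac{\pi h}{H}\left(i + \tfrac{1}{2}\right)\right)
          \cos\!\left(\frac{\pi w}{W}\left(j + \tfrac{1}{2}\right)\right)

f_{0,0} = \sum_{i=0}^{H-1} \sum_{j=0}^{W-1} x_{i,j}
        = HW \cdot \mathrm{GAP}(x)
```

Setting h = w = 0 makes both cosines equal to 1, so the lowest-frequency coefficient is GAP up to the constant factor HW. Every other index (h, w) captures information that GAP discards.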

Motivation & Objective

Reframe channel attention as a channel compression problem.
Generalize channel attention from GAP to multiple frequency components using DCT.
Propose a multi-spectral channel attention (MSCA) framework with flexible frequency selection criteria.
Demonstrate that MSCA yields improved performance on image classification, object detection, and instance segmentation while keeping the same parameter count and compute as SENet.

Proposed method

Represent each channel by a scalar through a frequency-based compression using 2D DCT.
Show that global average pooling (GAP) corresponds to the lowest-frequency DCT component (a special case).
Split channels into parts, assign a DCT frequency component to each, and concatenate the results to form a multi-spectral compression vector (Freq).
Compute the attention vector as sigmoid(fc(Freq)) and use it to reweight the channels.
Propose three frequency selection criteria: LF (low frequency), TS (two-step selection), NAS (neural architecture search).
Maintain identical parameter count and negligible overhead compared to SENet by using precomputed DCT basis functions.
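
The steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation (their code is at the repository linked in the abstract): the function names are hypothetical, the DCT basis is left unnormalized, and the gating uses a two-layer bottleneck with ReLU in the style of SENet, whereas the list above only specifies sigmoid(fc(Freq)).

```python
from __future__ import annotations

import numpy as np


def dct_basis(u: int, v: int, height: int, width: int) -> np.ndarray:
    """Unnormalized 2D DCT-II basis function for frequency index (u, v)."""
    rows = np.cos(np.pi * u * (np.arange(height) + 0.5) / height)
    cols = np.cos(np.pi * v * (np.arange(width) + 0.5) / width)
    return np.outer(rows, cols)  # shape (height, width)


def multi_spectral_compress(x: np.ndarray, freqs: list[tuple[int, int]]) -> np.ndarray:
    """Compress x of shape (C, H, W) into a length-C vector.

    Channels are split into len(freqs) equal groups, and every channel in
    group g is projected onto the DCT basis for freqs[g].
    """
    channels, height, width = x.shape
    assert channels % len(freqs) == 0, "channels must split evenly across frequencies"
    group = channels // len(freqs)
    # One fixed (non-learned) weight map per channel, computed once up front.
    weights = np.concatenate(
        [np.repeat(dct_basis(u, v, height, width)[None], group, axis=0) for u, v in freqs]
    )
    return (x * weights).sum(axis=(1, 2))


def channel_attention(
    x: np.ndarray, freqs: list[tuple[int, int]], w1: np.ndarray, w2: np.ndarray
) -> np.ndarray:
    """SE-style gating applied to the multi-spectral vector (assumed design)."""
    freq = multi_spectral_compress(x, freqs)
    hidden = np.maximum(w1 @ freq, 0.0)            # ReLU bottleneck
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid, one value per channel
    return x * scale[:, None, None]
```

Calling `multi_spectral_compress(x, [(0, 0)])` reduces to GAP scaled by H * W, which is the special case noted in the list. The DCT weight maps are constants rather than parameters, so the only learnable tensors in this sketch are `w1` and `w2`, the same as in an SE block.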

Experimental results

Research questions

RQ1: Can channel attention be effectively reformulated as a frequency-domain compression problem?
RQ2: Does incorporating multiple DCT frequency components improve channel-wise feature representations over GAP-based approaches?
RQ3: How do different frequency component selection strategies (LF, TS, NAS) affect performance across vision tasks?
RQ4: Can the proposed MSCA framework improve ImageNet classification and COCO detection/segmentation with the same computational budget as SENet?

Key findings

Multi-spectral channel attention (MSCA) consistently outperforms GAP-based SENet across classification and detection tasks.
Using multiple DCT frequency components yields better feature compression and higher accuracy than single-component GAP.
Low-frequency components are generally effective, but including a broader set of frequencies yields notable gains (especially with 2 or 16 components in certain setups).
Three selection schemes (LF, TS, NAS) provide flexible options for selecting frequency components, with TS offering practical Top-K based selection and NAS enabling learned component choices.
MSCA maintains the same parameter count and negligible computational overhead compared to SENet, while achieving state-of-the-art results on ImageNet and COCO benchmarks.
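
To make the LF and TS selection schemes concrete, here is a small pure-Python sketch. It rests on assumptions that this review does not state: the function names are hypothetical, the candidate frequencies live on a 7 x 7 grid by default, "low frequency" is ordered by u + v, and the TS step assumes each candidate frequency has already been given a score of its own (for example, validation accuracy when used alone). The NAS scheme is not sketched.

```python
from __future__ import annotations


def lowest_frequencies(k: int, grid: int = 7) -> list[tuple[int, int]]:
    """LF-style choice: the k indices (u, v) with the smallest u + v.

    Ordering by u + v, with ties broken by u, is an assumption of this sketch.
    """
    indices = [(u, v) for u in range(grid) for v in range(grid)]
    indices.sort(key=lambda uv: (uv[0] + uv[1], uv[0]))
    return indices[:k]


def top_k_frequencies(scores: dict[tuple[int, int], float], k: int) -> list[tuple[int, int]]:
    """TS-style choice: keep the k indices with the highest individual score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Either function returns a list of (u, v) pairs that can be handed to a compression step that assigns one frequency to each channel group. Any scores passed to `top_k_frequencies` would have to come from real per-frequency experiments; none are reproduced here.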

This review was created by AI and reviewed by human editors.