QUICK REVIEW

[Paper Review] CEDNet: A Cascade Encoder-Decoder Network for Dense Prediction

Gang Zhang, Ziyi Li | arXiv (Cornell University) | 2023. 02. 13.

Advanced Neural Network Applications | Citations: 16

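One-Line Summary

CEDNet introduces cascade encoder-decoder stages that fuse multi-scale features within each stage, so that high-level signals guide feature fusion early in the network, and it achieves strong gains on object detection, instance segmentation, and semantic segmentation.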

ABSTRACT

Multi-scale features are essential for dense prediction tasks, such as object detection, instance segmentation, and semantic segmentation. The prevailing methods usually utilize a classification backbone to extract multi-scale features and then fuse these features using a lightweight module (e.g., the fusion module in FPN and BiFPN, two typical object detection methods). However, as these methods allocate most computational resources to the classification backbone, the multi-scale feature fusion in these methods is delayed, which may lead to inadequate feature fusion. While some methods perform feature fusion from early stages, they either fail to fully leverage high-level features to guide low-level feature learning or have complex structures, resulting in sub-optimal performance. We propose a streamlined cascade encoder-decoder network, dubbed CEDNet, tailored for dense prediction tasks. All stages in CEDNet share the same encoder-decoder structure and perform multi-scale feature fusion within the decoder. A hallmark of CEDNet is its ability to incorporate high-level features from early stages to guide low-level feature learning in subsequent stages, thereby enhancing the effectiveness of multi-scale feature fusion. We explored three well-known encoder-decoder structures: Hourglass, UNet, and FPN. When integrated into CEDNet, they performed much better than traditional methods that use a pre-designed classification backbone combined with a lightweight fusion module. Extensive experiments on object detection, instance segmentation, and semantic segmentation demonstrated the effectiveness of our method. The code is available at https://github.com/zhanggang001/CEDNet.

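Research Motivation and Goals

Motivates improving multi-scale feature fusion for dense prediction tasks beyond what conventional backbones offer.
Proposes a cascade encoder-decoder architecture in which all stages share a unified encoder-decoder structure.
Fuses high-level features early so that they guide low-level feature learning across stages.
Evaluates several encoder-decoder implementations and shows gains over common FPN-based backbones on multiple tasks.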

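Proposed Method

CEDNet is a multi-stage cascade network: a stem is followed by m cascade stages, and multi-scale feature fusion is performed in every stage.
Three encoder-decoder styles (Hourglass, UNet, FPN) are adopted and all are shown to perform well; the FPN style is chosen as the default for further analysis.
The core building block is the CED block (a token mixer plus an MLP for channel interaction); an LR CED block with a 7x7 dilated depthwise convolution is optionally introduced for long-range context.
Because every stage shares the same encoder-decoder structure, high-level features from early stages guide low-level feature learning in later stages.
Variants (CEDNet-NeXt-T/S/B) that differ in channel dimension, number of blocks, and number of stages are evaluated.
The models are fine-tuned extensively for object detection and instance segmentation on COCO and for semantic segmentation on ADE20K.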
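The cascade structure described above can be sketched as a shape walk-through that tracks only feature-map resolutions, not learned weights. This is a minimal illustration rather than the authors' implementation: the names `FeatureMap`, `fpn_style_stage`, and `cednet_stage_shapes` are hypothetical, and the stem stride of 8, the three scales per stage, and the choice to feed the finest decoder output into the next stage are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureMap:
    stride: int   # downsampling factor relative to the input image
    height: int
    width: int


def downsample(f, factor=2):
    return FeatureMap(f.stride * factor, f.height // factor, f.width // factor)


def upsample(f, factor=2):
    return FeatureMap(f.stride // factor, f.height * factor, f.width * factor)


def fpn_style_stage(x, num_scales=3):
    """One encoder-decoder stage: the encoder goes down, the decoder goes back up."""
    # Encoder (bottom-up): progressively coarser features.
    enc = [x]
    for _ in range(num_scales - 1):
        enc.append(downsample(enc[-1]))
    # Decoder (top-down): each upsampled map is fused with the encoder
    # feature of the same resolution, so the shapes must match.
    dec = [enc[-1]]
    for lateral in reversed(enc[:-1]):
        up = upsample(dec[-1])
        assert up == lateral, "fusion needs matching resolutions"
        dec.append(up)
    return dec[::-1]  # finest scale first


def cednet_stage_shapes(image_hw, stem_stride=8, num_stages=3, num_scales=3):
    """Return the multi-scale outputs of every cascade stage (shapes only)."""
    h, w = image_hw
    coarsest = stem_stride * 2 ** (num_scales - 1)
    if h % coarsest or w % coarsest:
        raise ValueError(f"image size must be divisible by {coarsest}")
    x = FeatureMap(stem_stride, h // stem_stride, w // stem_stride)  # stem output
    outputs = []
    for _ in range(num_stages):
        pyramid = fpn_style_stage(x, num_scales)
        outputs.append(pyramid)
        # Assumption: the finest decoder output, which already carries fused
        # high-level context, is the input to the next stage's encoder.
        x = pyramid[0]
    return outputs


if __name__ == "__main__":
    for i, pyramid in enumerate(cednet_stage_shapes((512, 512)), start=1):
        print(f"stage {i}:", [(f.stride, f.height, f.width) for f in pyramid])
```

Under these assumptions every stage produces the same pyramid of resolutions, which is what lets later encoders start from features that have already been fused with coarser, higher-level ones.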

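Experimental Results

Research Questions

RQ1: Can a cascade encoder-decoder design with early multi-scale feature fusion outperform conventional backbones with lightweight fusion modules on dense prediction tasks?
RQ2: Which encoder-decoder style (Hourglass, UNet, or FPN) offers the best trade-off between accuracy and speed within CEDNet?
RQ3: Does introducing the LR CED block improve performance with little added cost?
RQ4: How does the timing of early fusion across stages affect detection performance?
RQ5: Do different token mixers (depthwise convolution, window attention, etc.) affect CEDNet's gains across tasks?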

Key Results

On COCO object detection and instance segmentation, CEDNet variants outperform ConvNeXt backbones paired with FPN, NAS-FPN, or BiFPN by a clear margin.
On COCO val2017, CEDNet-NeXt-T achieves APb 49.3, AP50 69.1, and AP75 53.7, and CEDNet-NeXt-S achieves APb 50.3, AP50 70.2, and AP75 55.2.
CEDNet variants also improve mIoU by 0.8–2.2 points over ConvNeXt baselines on ADE20K with multi-scale testing.
On COCO, CEDNet-NeXt-T improves box AP by 2.2–2.9 points and mask AP by 1.7–2.8 points depending on the detector (Deformable DETR, RetinaNet, Mask R-CNN, Cascade Mask R-CNN); CEDNet-NeXt-S retains the gains.
Experiments show that earlier fusion yields better AP, and the LR CED block adds roughly 0.4 points of box AP at almost no parameter cost.
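To see why a 7x7 dilated depthwise convolution can widen context at little parameter cost, the arithmetic below compares it with a dense convolution of the same kernel size. It is a back-of-the-envelope sketch: the channel width of 96 and the dilation rate of 3 are hypothetical values chosen for illustration and are not taken from the review, and the helper names are made up for this example.

```python
def depthwise_conv_params(channels, kernel_size, bias=True):
    """A depthwise conv holds one k x k filter per channel (plus optional bias)."""
    return channels * kernel_size * kernel_size + (channels if bias else 0)


def dense_conv_params(channels, kernel_size, bias=True):
    """A dense conv with equal input/output width holds C x C filters of size k x k."""
    return channels * channels * kernel_size * kernel_size + (channels if bias else 0)


def effective_kernel_size(kernel_size, dilation):
    """Spatial extent covered by a dilated kernel; dilation adds no parameters."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


if __name__ == "__main__":
    channels = 96   # hypothetical channel width
    dilation = 3    # hypothetical dilation rate
    print("depthwise 7x7 params:", depthwise_conv_params(channels, 7))
    print("dense 7x7 params:    ", dense_conv_params(channels, 7))
    print("effective extent:    ", effective_kernel_size(7, dilation))
```

With these illustrative numbers the depthwise layer has 4,800 parameters against 451,680 for the dense one, while the dilated kernel spans a 19x19 window, which is consistent with the reported small-cost gain from the LR CED block.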

This review was generated by AI and reviewed by a human editor.