QUICK REVIEW

[Paper Review] Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation

Huiyu Wang, Yukun Zhu|arXiv (Cornell University)|2020. 03. 17.

Advanced Neural Network Applications | References: 105 | Citations: 66

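One-Line Summary

The paper introduces position-sensitive axial-attention to build stand-alone axial-attention models that achieve state-of-the-art panoptic segmentation results on COCO, Mapillary Vistas, and Cityscapes, with substantial efficiency gains over earlier stand-alone self-attention methods.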

ABSTRACT

Convolution exploits locality for efficiency at a cost of missing long range context. Self-attention has been adopted to augment CNNs with non-local interactions. Recent works prove it possible to stack self-attention layers to obtain a fully attentional network by restricting the attention to a local region. In this paper, we attempt to remove this constraint by factorizing 2D self-attention into two 1D self-attentions. This reduces computation complexity and allows performing attention within a larger or even global region. In companion, we also propose a position-sensitive self-attention design. Combining both yields our position-sensitive axial-attention layer, a novel building block that one could stack to form axial-attention models for image classification and dense prediction. We demonstrate the effectiveness of our model on four large-scale datasets. In particular, our model outperforms all existing stand-alone self-attention models on ImageNet. Our Axial-DeepLab improves 2.8% PQ over bottom-up state-of-the-art on COCO test-dev. This previous state-of-the-art is attained by our small variant that is 3.8x parameter-efficient and 27x computation-efficient. Axial-DeepLab also achieves state-of-the-art results on Mapillary Vistas and Cityscapes.

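Research Motivation and Goals

Model long-range context efficiently, without the locality constraint of conventional convolution.
Propose position-sensitive axial-attention, which allows large or global receptive fields in stand-alone models.
Demonstrate Axial-ResNet and Axial-DeepLab as backbones for classification and panoptic segmentation.
Show state-of-the-art performance on COCO, Mapillary Vistas, and Cityscapes together with improved efficiency.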

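Proposed Method

Factorize 2D self-attention into sequential 1D attentions along the height axis and the width axis (axial-attention).
Introduce position-sensitive self-attention with query-, key-, and value-dependent positional terms (r^q, r^k, r^v).
Build Axial-ResNet by replacing the 3x3 convolution in each ResNet block with axial-attention layers.
Convert Axial-ResNet into Axial-DeepLab for segmentation by adjusting strides and removing ASPP.
Train and evaluate on ImageNet for classification; evaluate panoptic, instance, and semantic segmentation on COCO, Mapillary Vistas, and Cityscapes.
Control the range of axial-attention with a span m; setting m to the input size gives a global receptive field. Two consecutive axial-attention layers are applied, one along the height axis and one along the width axis.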
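The method steps above can be sketched in NumPy as follows. This is a minimal single-head illustration under stated assumptions, not the authors' implementation: the function names (`axial_attention_1d`, `axial_block`, `init_params`), the random weights, and the choice of a global span (m equal to the axis length) are all made up for the example, and the multi-head projections, normalization, and residual connection of a full block are left out. The logits combine the content term q·k with the query- and key-dependent positional terms r^q and r^k, and the value-dependent term r^v is added to the aggregated values.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention_1d(x, wq, wk, wv, rq, rk, rv):
    """Position-sensitive attention along axis 1 of x (global span).

    x:      (rows, n, c_in); each row is an independent sequence of length n
    wq, wk: (c_in, d_qk) projections; wv: (c_in, d_v)
    rq, rk: (2n-1, d_qk) and rv: (2n-1, d_v) relative-position embeddings,
            indexed by the offset (p - o) + (n - 1)
    """
    n = x.shape[1]
    q, k, v = x @ wq, x @ wk, x @ wv
    # idx[o, p] = p - o + n - 1 selects the embedding for each offset
    idx = np.arange(n)[None, :] - np.arange(n)[:, None] + n - 1
    rq_op, rk_op, rv_op = rq[idx], rk[idx], rv[idx]      # each (n, n, d)
    logits = (np.einsum("bod,bpd->bop", q, k)            # q_o . k_p
              + np.einsum("bod,opd->bop", q, rq_op)      # q_o . r^q_{p-o}
              + np.einsum("bpd,opd->bop", k, rk_op))     # k_p . r^k_{p-o}
    attn = softmax(logits, axis=-1)                      # normalize over p
    return (np.einsum("bop,bpd->bod", attn, v)           # sum_p a_op v_p
            + np.einsum("bop,opd->bod", attn, rv_op))    # sum_p a_op r^v_{p-o}

def axial_block(x, params_h, params_w):
    """x: (H, W, C). Height-axis attention followed by width-axis attention."""
    # Height axis: every column becomes a sequence of length H.
    y = axial_attention_1d(x.transpose(1, 0, 2), *params_h).transpose(1, 0, 2)
    # Width axis: every row is a sequence of length W.
    return axial_attention_1d(y, *params_w)

def init_params(rng, n, c_in, d_qk, d_v, scale=0.1):
    """Random stand-ins for learned weights (illustration only)."""
    shapes = [(c_in, d_qk), (c_in, d_qk), (c_in, d_v),
              (2 * n - 1, d_qk), (2 * n - 1, d_qk), (2 * n - 1, d_v)]
    return tuple(rng.normal(0.0, scale, s) for s in shapes)

rng = np.random.default_rng(0)
H, W, C = 6, 8, 16
x = rng.normal(size=(H, W, C))
out = axial_block(x,
                  init_params(rng, H, C, 8, C),   # height-axis parameters
                  init_params(rng, W, C, 8, C))   # width-axis parameters
print(out.shape)  # (6, 8, 16)
```

With the span equal to the axis length, as here, a change to a single input pixel reaches every output position after the height and width passes, which is the global receptive field described above. A restricted span m would mask or slice the logits so that each position only attends to offsets within its local 1×m window.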

Experimental Results

Research Questions

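RQ1: Can stand-alone axial-attention achieve a global receptive field at lower complexity than full 2D self-attention?
RQ2: Does position-sensitive axial-attention improve segmentation performance over earlier stand-alone attention methods?
RQ3: How do Axial-ResNet and Axial-DeepLab compare with the bottom-up state of the art on panoptic, instance, and semantic segmentation benchmarks?
RQ4: How much do the axial-attention span and the model size affect accuracy and efficiency across datasets?
RQ5: Is it possible to obtain competitive segmentation performance by replacing conventional convolution in the backbone with axial-attention, without ASPP?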
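RQ1 comes down to how many query-key pairs each scheme has to score. As a rough illustration only (channel dimensions, heads, and constant factors are ignored, and the helper name `attention_pair_counts` is hypothetical), full 2D self-attention on an h×w feature map scores (hw)² pairs, while one height-axis pass plus one width-axis pass with span m scores about hw·m pairs each:

```python
def attention_pair_counts(h, w, m=None):
    """Query-key pairs scored on an h x w map; m=None means a global span."""
    full_2d = (h * w) ** 2                  # every pixel attends to every pixel
    m_h = h if m is None else min(m, h)     # span along the height axis
    m_w = w if m is None else min(m, w)     # span along the width axis
    axial = h * w * m_h + h * w * m_w       # height pass + width pass
    return full_2d, axial

full, axial = attention_pair_counts(64, 64)
print(full, axial, full / axial)  # 16777216 524288 32.0
```

On a 64×64 map with a global span, the axial factorization scores 32 times fewer pairs in this count, and the gap widens as the feature map grows.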

Key Results

Method	Backbone	MS	Params	M-Adds	PQ	PQ Th	PQ St
DeeperLab	Xception-71		—	—	33.8	—	—
SSAP	ResNet-101	✓	—	—	36.5	—	—
Panoptic-DeepLab (Xception-71)	Xception-71		46.7M	274.0B	39.7	43.9	33.2
Panoptic-DeepLab (Xception-71)	Xception-71	✓	46.7M	3081.4B	41.2	44.9	35.7
Axial-DeepLab-S	Axial-ResNet-S		12.1M	110.4B	41.8	46.1	35.2
Axial-DeepLab-M	Axial-ResNet-M		25.9M	209.9B	42.9	47.6	35.8
Axial-DeepLab-L	Axial-ResNet-L		44.9M	343.9B	43.4	48.5	35.6
Axial-DeepLab-L	Axial-ResNet-L	✓	44.9M	3867.7B	43.9	48.6	36.8

With multi-scale inference, Axial-DeepLab-L reaches 43.9 PQ on COCO val, 2.7 PQ ahead of multi-scale Panoptic-DeepLab (41.2 PQ).
Single-scale Axial-DeepLab-S outperforms DeeperLab by 8.0 PQ on COCO val, and outperforms multi-scale SSAP and single-scale Panoptic-DeepLab by 5.3 and 2.1 PQ, respectively.
Axial-DeepLab-L with MS reaches 44.2 PQ on COCO test-dev, achieving state-of-the-art among bottom-up methods and closing the gap to top-down approaches.
On Mapillary Vistas validation, Axial-DeepLab-L outperforms the previous state of the art in both single-scale and multi-scale settings.
On Cityscapes validation, Axial-DeepLab-XL with Mapillary Vistas (MV) pretraining attains 68.5 PQ and 44.2 AP.
Cityscapes validation shows Axial-DeepLab variants outperform ResNet-50 baselines, with larger models and MS further improving PQ and mIoU.
Across experiments, axial-attention with position-sensitivity yields consistent gains in PQ, AP, and mIoU compared to prior stand-alone attention methods.
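The PQ gaps and efficiency factors quoted above can be recomputed from the table. The snippet below is only an arithmetic check on values copied from that table; the dictionary layout and the helper names `pq_gap` and `cost_ratio` are assumptions for the example. It compares single-scale Axial-DeepLab-S against multi-scale Panoptic-DeepLab, the pairing behind the abstract's "3.8x parameter-efficient and 27x computation-efficient" claim.

```python
# Values copied from the results table: (Params in M, M-Adds in B, PQ).
# None marks entries the table leaves blank.
table = {
    "DeeperLab":             (None, None, 33.8),
    "SSAP (MS)":             (None, None, 36.5),
    "Panoptic-DeepLab":      (46.7, 274.0, 39.7),
    "Panoptic-DeepLab (MS)": (46.7, 3081.4, 41.2),
    "Axial-DeepLab-S":       (12.1, 110.4, 41.8),
    "Axial-DeepLab-L (MS)":  (44.9, 3867.7, 43.9),
}
PARAMS, MADDS, PQ = 0, 1, 2

def pq_gap(model, baseline):
    """PQ difference between two table rows, rounded to one decimal."""
    return round(table[model][PQ] - table[baseline][PQ], 1)

def cost_ratio(baseline, model, column):
    """How many times larger the baseline's cost is than the model's."""
    return table[baseline][column] / table[model][column]

print(pq_gap("Axial-DeepLab-S", "DeeperLab"))                   # 8.0
print(pq_gap("Axial-DeepLab-S", "SSAP (MS)"))                   # 5.3
print(pq_gap("Axial-DeepLab-S", "Panoptic-DeepLab"))            # 2.1
print(pq_gap("Axial-DeepLab-L (MS)", "Panoptic-DeepLab (MS)"))  # 2.7
print(f'{cost_ratio("Panoptic-DeepLab (MS)", "Axial-DeepLab-S", PARAMS):.2f}')  # 3.86
print(f'{cost_ratio("Panoptic-DeepLab (MS)", "Axial-DeepLab-S", MADDS):.2f}')   # 27.91
```

The ratios of about 3.86 and 27.91 agree with the abstract's figures once truncated, and they show that most of the computation saving comes from Axial-DeepLab-S not needing multi-scale inference to pass the earlier 41.2 PQ result.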

This review was generated by AI and checked by a human editor.