QUICK REVIEW

[Paper Review] Hybrid Task Cascade for Instance Segmentation

Kai Chen, Jiangmiao Pang | arXiv (Cornell University) | January 22, 2019

Advanced Neural Network Applications | References: 43 | Citations: 175

One-Line Summary

HTC interleaves detection and segmentation in a multi-stage cascade, and adds a mask feature flow and a semantic context branch, to improve mask AP on COCO.

ABSTRACT

Cascade is a classic yet powerful architecture that has boosted performance on various tasks. However, how to introduce cascade to instance segmentation remains an open question. A simple combination of Cascade R-CNN and Mask R-CNN only brings limited gain. In exploring a more effective approach, we find that the key to a successful instance segmentation cascade is to fully leverage the reciprocal relationship between detection and segmentation. In this work, we propose a new framework, Hybrid Task Cascade (HTC), which differs in two important aspects: (1) instead of performing cascaded refinement on these two tasks separately, it interweaves them for a joint multi-stage processing; (2) it adopts a fully convolutional branch to provide spatial context, which can help distinguishing hard foreground from cluttered background. Overall, this framework can learn more discriminative features progressively while integrating complementary features together in each stage. Without bells and whistles, a single HTC obtains 38.4% and 1.5% improvement over a strong Cascade Mask R-CNN baseline on MSCOCO dataset. Moreover, our overall system achieves 48.6 mask AP on the test-challenge split, ranking 1st in the COCO 2018 Challenge Object Detection Task. Code is available at: https://github.com/open-mmlab/mmdetection.

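Motivation and Objectives

Improve instance segmentation by using a cascade with strong information flow between tasks.
Propose Hybrid Task Cascade (HTC), which interleaves detection and segmentation at every stage.
Investigate the benefit of mask information flow and of spatial context from a semantic branch.
Demonstrate end-to-end trainability and state-of-the-art performance on COCO test-dev and test-challenge.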

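Proposed Method

Introduces a three-stage cascade in which box regression and mask prediction are progressively refined within a joint multi-task pipeline.
Adds direct connections between the mask branches of consecutive stages to enable mask information flow.
Introduces a fully convolutional semantic segmentation branch that provides spatial context and whose features are fused with those of the box and mask branches.
Fuses the semantic features with the RoI features via RoIAlign to improve both box and mask prediction.
Trains with a multi-task loss summed over stages and tasks, using balancing coefficients alpha_t and beta.
Can be extended with stronger backbones and training tricks (DCN, SyncBN, multi-scale training, ensembling) for further gains.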
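
The data flow and loss described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation (see the mmdetection repository for that). The names `htc_forward`, `htc_loss`, `roi_pool`, `box_heads`, and `mask_heads` are hypothetical, the heads are plain callables standing in for network modules, and the default weights `alphas=(1.0, 0.5, 0.25)` and `beta=1.0` are an assumption, since this review names only the coefficients alpha_t and beta.

```python
from typing import Any, Callable, List, Sequence, Tuple


def htc_forward(
    feat: Any,
    sem_feat: Any,
    boxes: Any,
    roi_pool: Callable[[Any, Any], Any],
    box_heads: Sequence[Callable[[Any, Any], Any]],
    mask_heads: Sequence[Callable[[Any, Any], Tuple[Any, Any]]],
) -> List[Tuple[Any, Any]]:
    """Interleaved cascade sketch: at each stage, refine boxes, then predict masks.

    `roi_pool` is a hypothetical stand-in for RoIAlign. Pooled values are assumed
    to support element-wise `+` (numbers or arrays, not Python lists).
    """
    prev_mask_feat = None  # no mask feature exists before the first stage
    outputs = []
    for box_head, mask_head in zip(box_heads, mask_heads):
        # Box branch: fuse backbone and semantic features pooled with the current boxes.
        x_box = roi_pool(feat, boxes) + roi_pool(sem_feat, boxes)
        boxes = box_head(x_box, boxes)
        # Mask branch: pool again with the refined boxes (the interleaving step).
        x_mask = roi_pool(feat, boxes) + roi_pool(sem_feat, boxes)
        # Mask information flow: the head also receives the previous stage's mask feature.
        mask, prev_mask_feat = mask_head(x_mask, prev_mask_feat)
        outputs.append((boxes, mask))
    return outputs


def htc_loss(
    bbox_losses: Sequence[float],
    mask_losses: Sequence[float],
    seg_loss: float,
    alphas: Sequence[float] = (1.0, 0.5, 0.25),  # assumed per-stage weights alpha_t
    beta: float = 1.0,                           # assumed semantic-branch weight
) -> float:
    """Weighted sum of per-stage box and mask losses plus the semantic loss."""
    stage_terms = sum(
        a * (lb + lm) for a, lb, lm in zip(alphas, bbox_losses, mask_losses)
    )
    return stage_terms + beta * seg_loss
```

The point the sketch makes is the ordering inside the loop: the mask head of each stage sees features pooled with that stage's refined boxes, rather than running as a separate cascade alongside the box heads.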

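Experimental Results

Research Questions

RQ1 Can a cascaded multi-task architecture improve both bounding-box and mask prediction in instance segmentation?
RQ2 Does mask information flow between stages actually improve mask refinement?
RQ3 Does adding a semantic segmentation branch improve foreground-background discrimination?
RQ4 How do these design choices affect COCO mask AP and overall performance on test-dev and test-challenge?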

Key Results

Method	Backbone	Box AP	Mask AP	AP50	AP75	AP_S	AP_M	AP_L	Runtime (fps)
Mask R-CNN	ResNet-50-FPN	39.1	35.6	57.6	38.1	18.7	38.3	46.6	5.3
Cascade Mask R-CNN	ResNet-50-FPN	42.7	36.9	58.6	39.7	19.6	39.3	48.8	3.0
HTC (ours)	ResNet-50-FPN	43.6	38.4	60.0	41.5	20.4	40.7	51.2	2.5
HTC (ours)	ResNet-101-FPN	45.3	39.7	61.8	43.1	21.0	42.2	53.5	2.4
HTC (ours)	ResNeXt-101-FPN	47.1	41.2	63.9	44.7	22.8	43.9	54.6	2.1

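HTC yields higher mask AP than the Mask R-CNN and Cascade Mask R-CNN baselines across backbones.
With ResNet-50-FPN, ResNet-101-FPN, and ResNeXt-101-FPN backbones, HTC consistently improves mask AP over the baseline, by up to about 1.5 points (38.4 vs. 36.9 for Cascade Mask R-CNN with ResNet-50-FPN).
Interleaved execution yields a small gain, and mask information flow adds a further improvement (roughly 0.6–1.5 AP).
The semantic segmentation branch supplies complementary context and brings an additional gain (about 0.6 AP).
With a strong backbone and additional enhancements, HTC reaches 49.0 mask AP on COCO test-dev and 48.6 mask AP on test-challenge.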
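
To make the comparison concrete, the short script below recomputes HTC's gains from the ResNet-50-FPN rows of the results table. The numbers are copied from that table. The `RESULTS` dictionary and the `gain` helper are illustrative names, not part of any library.

```python
# Values copied from the ResNet-50-FPN rows of the results table.
RESULTS = {
    "Mask R-CNN": {"box_ap": 39.1, "mask_ap": 35.6, "fps": 5.3},
    "Cascade Mask R-CNN": {"box_ap": 42.7, "mask_ap": 36.9, "fps": 3.0},
    "HTC": {"box_ap": 43.6, "mask_ap": 38.4, "fps": 2.5},
}


def gain(metric: str, method: str, baseline: str) -> float:
    """Difference between method and baseline on one metric, to one decimal place."""
    return round(RESULTS[method][metric] - RESULTS[baseline][metric], 1)


if __name__ == "__main__":
    for baseline in ("Mask R-CNN", "Cascade Mask R-CNN"):
        print(
            f"HTC vs {baseline}: "
            f"box AP {gain('box_ap', 'HTC', baseline):+.1f}, "
            f"mask AP {gain('mask_ap', 'HTC', baseline):+.1f}, "
            f"fps {gain('fps', 'HTC', baseline):+.1f}"
        )
```

Against Cascade Mask R-CNN with the same backbone, this gives +0.9 box AP and +1.5 mask AP at a cost of 0.5 fps. Against Mask R-CNN the gains are +4.5 box AP and +2.8 mask AP at a cost of 2.8 fps.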

This review was generated by AI and checked by a human editor.