QUICK REVIEW

[Paper Review] SOLO: Segmenting Objects by Locations

Xinlong Wang, Tao Kong | arXiv (Cornell University) | 2019-12-10

Advanced Neural Network Applications | References: 30 | Citations: 37

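One-line summary

SOLO recasts instance segmentation as two pixel-level classification tasks in which each grid cell is assigned an object's center location and size, enabling box-free, one-shot mask prediction. It reaches accuracy comparable to Mask R-CNN and outperforms earlier single-shot methods.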

ABSTRACT

We present a new, embarrassingly simple approach to instance segmentation in images. Compared to many other dense prediction tasks, e.g., semantic segmentation, it is the arbitrary number of instances that have made instance segmentation much more challenging. In order to predict a mask for each instance, mainstream approaches either follow the 'detect-then-segment' strategy as used by Mask R-CNN, or predict category masks first then use clustering techniques to group pixels into individual instances. We view the task of instance segmentation from a completely new perspective by introducing the notion of "instance categories", which assigns categories to each pixel within an instance according to the instance's location and size, thus nicely converting instance mask segmentation into a classification-solvable problem. Now instance segmentation is decomposed into two classification tasks. We demonstrate a much simpler and flexible instance segmentation framework with strong performance, achieving on par accuracy with Mask R-CNN and outperforming recent single-shot instance segmenters in accuracy. We hope that this very simple and strong framework can serve as a baseline for many instance-level recognition tasks besides instance segmentation.

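Research motivation and goals

Rethink how object instances are distinguished, using their location and size.
Introduce instance categories based on center location and feature pyramid level.
Develop an end-to-end, single-shot framework that outputs masks and classes without post-processing.
Use CoordConv to inject spatial information into the CNN.
Demonstrate strong performance on COCO compared with existing methods.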

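Proposed method

Divide the image into an S x S grid; each cell predicts the semantic category and the instance mask of the object whose center falls inside that cell.
Handle objects of different scales with an FPN (feature pyramid network), assigning them to different feature levels.
Attach two prediction heads (category and mask) to each FPN level, with weights shared across levels; the mask is conditioned on the grid location.
Introduce CoordConv, which concatenates pixel coordinates to the input features so that predictions can vary spatially.
Train with the loss L = L_cate + λ L_mask, where L_mask is a Dice loss that stabilizes mask optimization.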
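
Three of the steps above (mapping an object center to a grid cell, appending coordinate channels in the CoordConv style, and the Dice loss) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the coordinates are assumed to be normalized to [-1, 1], and the Dice loss uses the squared-sum denominator, which is one common form.

```python
import numpy as np

def center_to_cell(cx, cy, width, height, S):
    """Map an object center (in pixels) to the (row, col) of an S x S grid."""
    col = min(int(cx / width * S), S - 1)
    row = min(int(cy / height * S), S - 1)
    return row, col

def coord_channels(h, w):
    """Two channels holding x and y coordinates, assumed normalized to [-1, 1]."""
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, h),
                         np.linspace(-1.0, 1.0, w),
                         indexing="ij")
    return np.stack([xs, ys], axis=0)  # shape (2, h, w)

def add_coords(features):
    """Concatenate the coordinate channels to a (C, H, W) feature map."""
    _, h, w = features.shape
    return np.concatenate([features, coord_channels(h, w)], axis=0)

def dice_loss(pred, target, eps=1e-6):
    """1 - Dice coefficient between a soft mask and a binary target mask."""
    inter = np.sum(pred * target)
    denom = np.sum(pred ** 2) + np.sum(target ** 2)
    return 1.0 - 2.0 * inter / (denom + eps)

# Usage: an object centered at (320, 240) in a 640 x 480 image, with S = 5.
cell = center_to_cell(320, 240, 640, 480, 5)
feats = add_coords(np.zeros((3, 4, 6)))  # 3 feature channels + 2 coordinate channels
target = np.zeros((4, 6))
target[1:3, 2:5] = 1.0
loss_perfect = dice_loss(target, target)         # close to 0
loss_disjoint = dice_loss(1.0 - target, target)  # no overlap, so 1.0
```

The total loss in the last step would then be a weighted sum of a classification term and this mask term; the value of λ and the form of L_cate are not given in this review, so they are left out of the sketch.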

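Experimental results

Research questions

RQ1: Can instance segmentation be performed directly in a single shot, without bounding-box proposals or per-pixel clustering?
RQ2: Does encoding instance location and object size into instance categories improve the accuracy of per-pixel mask prediction?
RQ3: How do grid size, FPN levels, and CoordConv affect segmentation accuracy?
RQ4: Where does SOLO stand on COCO relative to state-of-the-art two-stage and one-stage methods?
RQ5: What are the potential efficiency benefits of the decoupled version of SOLO?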

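Key results

SOLO achieves 37.8% mask AP with ResNet-101-FPN, on par with Mask R-CNN (37.8% vs. 37.8% in the paper's table; note that the table reports adjusted values).
SOLO outperforms earlier single-shot methods and approaches or surpasses two-stage methods on COCO test-dev.
Decoupled SOLO (X and Y branches) achieves 40.5 AP with DCN-101-FPN while reducing memory usage.
CoordConv improves AP substantially over standard convolution, with a gain of up to about 3.6 points.
Dice loss gives the best mask AP and training stability among the loss functions tested.
With a larger grid and a multi-level FPN, SOLO reaches 35.8 AP on COCO val, showing that it scales across object sizes.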
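
The memory saving attributed to Decoupled SOLO can be illustrated with a small sketch. It assumes, based on a reading of the decoupled design rather than anything stated in this review, that the mask for grid cell (i, j) is the element-wise product of the j-th X-branch map and the i-th Y-branch map after a sigmoid, so the head needs 2S output channels instead of S². All names here are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decoupled_mask(x_branch, y_branch, i, j):
    """Soft mask for grid cell (row i, col j) from two S-channel outputs.

    Assumption: the mask is the element-wise product of the j-th X map
    and the i-th Y map, each passed through a sigmoid.
    """
    return sigmoid(x_branch[j]) * sigmoid(y_branch[i])

S, H, W = 12, 8, 8
rng = np.random.default_rng(0)
x_branch = rng.normal(size=(S, H, W))  # stand-in for the X-branch output
y_branch = rng.normal(size=(S, H, W))  # stand-in for the Y-branch output

mask = decoupled_mask(x_branch, y_branch, i=3, j=5)

channels_vanilla = S * S    # one mask channel per grid cell
channels_decoupled = 2 * S  # S channels for X plus S channels for Y
```

With S = 12 this reduces the mask output from 144 channels to 24, which is the kind of saving the decoupled variant targets; the actual grid sizes used in the paper are not given in this review.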


This review was generated by AI and checked by a human editor.