QUICK REVIEW

[Paper Review] One-Shot Object Detection with Co-Attention and Co-Excitation

Ting-I Hsieh, Yi-Chen Lo | arXiv (Cornell University) | 2019-11-28

Advanced Neural Network Applications | Citations: 116

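One-Line Summary

CoAE is a two-stage detector that combines non-local co-attention, squeeze-and-co-excitation, and a margin-based ranking loss to detect all instances of a class in a target image from a single query patch, including classes never seen in training.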

ABSTRACT

This paper aims to tackle the challenging problem of one-shot object detection. Given a query image patch whose class label is not included in the training data, the goal of the task is to detect all instances of the same class in a target image. To this end, we develop a novel co-attention and co-excitation (CoAE) framework that makes contributions in three key technical aspects. First, we propose to use the non-local operation to explore the co-attention embodied in each query-target pair and yield region proposals accounting for the one-shot situation. Second, we formulate a squeeze-and-co-excitation scheme that can adaptively emphasize correlated feature channels to help uncover relevant proposals and eventually the target objects. Third, we design a margin-based ranking loss for implicitly learning a metric to predict the similarity of a region proposal to the underlying query, no matter its class label is seen or unseen in training. The resulting model is therefore a two-stage detector that yields a strong baseline on both VOC and MS-COCO under one-shot setting of detecting objects from both seen and never-seen classes. Codes are available at https://github.com/timy90022/One-Shot-Object-Detection.

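Research Motivation and Goals

Address the challenge of detecting objects of an unseen class in a target image, given a query patch of that class.
Combine query and target information to generate better region proposals for one-shot detection.
Develop a metric-learning-style mechanism that ranks proposals by their similarity to the query, so that no unseen-class labels are needed at test time.
Learn a robust similarity metric that enables class-agnostic detection of unseen objects without further training on the unseen classes.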

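Proposed Method

Extend Faster R-CNN with non-local mutual attention (co-attention) so that region proposals are generated from combined query and target image features.
Introduce squeeze-and-co-excitation (SCE) to adaptively reweight the feature channels of the query and the target for better matching (global average pooling followed by two FC/MLP layers, similar to an SE block).
After co-excitation, compute the query feature q from F(p) and the region feature r from F(I), then learn a similarity metric between proposals and the query with a two-layer MLP and a margin-based ranking loss.
Train with foreground/background labels based on IoU > 0.5, using the margin-based ranking loss L_MR (m^+ = 0.7, m^- = 0.3) together with Faster R-CNN's L_CE and L_Reg losses.
For backbone initialization, use a reduced ImageNet pre-training set (725 classes) chosen so that the backbone never sees COCO/VOC classes, and compare it with full 1000-class pre-training.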
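The labeling rule and margins listed above can be illustrated with a small, dependency-free Python sketch. This review does not give the exact form of L_MR, so the code assumes a simple two-sided hinge: foreground proposals are penalized when their similarity score falls below m^+, and background proposals when their score rises above m^-. Any further terms in the paper's loss are omitted. The function names (`iou`, `label_proposals`, `margin_ranking_loss`) and the box format `(x1, y1, x2, y2)` are hypothetical and are not taken from the authors' repository.

```python
# Hypothetical sketch, not the authors' implementation.
# Assumption: similarity scores lie in [0, 1] and L_MR is a two-sided hinge.

M_POS = 0.7          # m^+ from the review: foreground scores should reach this
M_NEG = 0.3          # m^- from the review: background scores should stay below this
IOU_THRESHOLD = 0.5  # proposals with IoU > 0.5 are labeled foreground


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_proposals(proposals, gt_boxes):
    """Return 1 (foreground) if a proposal overlaps any ground-truth box
    with IoU > 0.5, otherwise 0 (background)."""
    return [
        1 if any(iou(p, g) > IOU_THRESHOLD for g in gt_boxes) else 0
        for p in proposals
    ]


def margin_ranking_loss(scores, labels):
    """Sum of hinge penalties over proposals (assumed simplified form)."""
    total = 0.0
    for s, y in zip(scores, labels):
        if y == 1:
            total += max(M_POS - s, 0.0)  # foreground scored too low
        else:
            total += max(s - M_NEG, 0.0)  # background scored too high
    return total


# Usage: one ground-truth box, three proposals with made-up similarity scores.
gt = [(0, 0, 10, 10)]
proposals = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
labels = label_proposals(proposals, gt)
print(labels)  # [1, 0, 0]
print(margin_ranking_loss([0.9, 0.5, 0.1], labels))  # about 0.2
```

In the example, only the second proposal contributes to the loss: it is background (IoU of roughly 0.14 with the ground truth) but receives a score of 0.5, which exceeds m^- by 0.2. In a full training setup this term would be added to L_CE and L_Reg, as the list above describes.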

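Experimental Results

Research Questions

RQ1: Can co-attention between the query patch and the target image improve region proposal quality for one-shot detection?
RQ2: Does the squeeze-and-co-excitation mechanism help emphasize correlated channels for detecting unseen classes?
RQ3: Can the margin-based ranking loss implicitly learn a robust similarity metric between proposals and the query for both seen and unseen classes?

Key Results

The CoAE framework provides a strong baseline for one-shot detection on the VOC and COCO datasets.
Both the non-local (co-attention) and SCE (co-excitation) components improve performance substantially, and their gains accumulate when used together.
The margin-based ranking loss learns a desirable ranking of proposals and yields an additional, fairly modest improvement.
On COCO, Ours (1k) improves AP50 over the SiamMask baseline, suggesting good generalization to unseen classes.
Visualizations show that the non-local proposals focus on target regions influenced by the query, and that co-excitation reveals meaningful class-dependent weight distributions (e.g., animals vs. vehicles).
The method remains robust on unseen-class detection and improves over the baselines on VOC and COCO.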

This review was generated by AI and checked by a human editor.