QUICK REVIEW

[Paper Review] See Better Before Looking Closer: Weakly Supervised Data Augmentation Network for Fine-Grained Visual Classification

Tao Hu, Honggang Qi|arXiv (Cornell University)|2019. 01. 26.

Advanced Neural Network Applications | References: 41 | Citations: 100

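One-Line Summary

This paper presents WS-DAN, a weakly supervised, attention-based data augmentation framework that crops and drops attention regions to improve fine-grained visual classification, achieving state-of-the-art results on multiple FGVC datasets.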

ABSTRACT

Data augmentation is usually adopted to increase the amount of training data, prevent overfitting and improve the performance of deep models. However, in practice, random data augmentation, such as random image cropping, is low-efficiency and might introduce many uncontrolled background noises. In this paper, we propose Weakly Supervised Data Augmentation Network (WS-DAN) to explore the potential of data augmentation. Specifically, for each training image, we first generate attention maps to represent the object's discriminative parts by weakly supervised learning. Next, we augment the image guided by these attention maps, including attention cropping and attention dropping. The proposed WS-DAN improves the classification accuracy in two folds. In the first stage, images can be seen better since more discriminative parts' features will be extracted. In the second stage, attention regions provide accurate location of object, which ensures our model to look at the object closer and further improve the performance. Comprehensive experiments in common fine-grained visual classification datasets show that our WS-DAN surpasses the state-of-the-art methods, which demonstrates its effectiveness.

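Motivation and Objectives

Motivates improving FGVC performance with data augmentation that respects the object's spatial structure, rather than random cropping.
Proposes a weakly supervised attention learning module that locates discriminative object parts using only image-level annotations.
Introduces attention-guided data augmentation (attention cropping and attention dropping) to strengthen local feature extraction.
Enables accurate object localization and refinement, moving from a coarse estimate to a fine prediction.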

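Proposed Method

Extracts feature maps and, through weakly supervised learning, generates multiple attention maps that represent object parts.
Applies Bilinear Attention Pooling to fuse the part features into a robust local feature representation.
Imposes an attention regularization loss to stabilize part representations across multiple instances of the same part.
Performs attention-guided data augmentation, cropping around high-attention regions and dropping attention regions, which forces the model to explore multiple parts.
At test time, computes a coarse prediction from the raw image, localizes the object via the attention maps, zooms in on the object region, and fuses the coarse and fine predictions.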
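
The pooling and augmentation steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: feature maps, attention maps, and the image are nested lists at the same spatial resolution (a real pipeline would upsample the attention map and resize the crop back to the network input size), and the function names, the 0.5 thresholds, and the zero fill value are all hypothetical choices made for this example.

```python
# Illustrative sketch only; names, thresholds, and fill value are assumptions.

def bilinear_attention_pooling(features, attentions):
    """Pool a C x H x W feature stack under each of M attention maps.

    Returns an M x C matrix: row k is the spatial average of the features
    weighted element-wise by attention map k (one "part" feature).
    """
    n_cells = len(features[0]) * len(features[0][0])
    parts = []
    for att in attentions:
        part = []
        for chan in features:
            total = sum(a * f
                        for att_row, chan_row in zip(att, chan)
                        for a, f in zip(att_row, chan_row))
            part.append(total / n_cells)
        parts.append(part)
    return parts


def normalize(att):
    """Min-max scale a 2-D attention map (list of rows) to [0, 1]."""
    flat = [v for row in att for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    return [[(v - lo) / span for v in row] for row in att]


def crop_box(att, threshold=0.5):
    """Inclusive (top, left, bottom, right) box around cells above threshold."""
    norm = normalize(att)
    hits = [(r, c) for r, row in enumerate(norm)
            for c, v in enumerate(row) if v > threshold]
    if not hits:  # nothing salient: fall back to the whole map
        return 0, 0, len(att) - 1, len(att[0]) - 1
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return min(rows), min(cols), max(rows), max(cols)


def attention_crop(image, att, threshold=0.5):
    """Keep only the image region inside the high-attention bounding box."""
    top, left, bottom, right = crop_box(att, threshold)
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def attention_drop(image, att, threshold=0.5, fill=0):
    """Erase high-attention pixels so other parts must carry the signal."""
    norm = normalize(att)
    return [[fill if a > threshold else px for px, a in zip(img_row, att_row)]
            for img_row, att_row in zip(image, norm)]


# Toy 4 x 4 example: attention peaks in the central 2 x 2 block.
att = [[0.0, 0.1, 0.0, 0.0],
       [0.1, 0.9, 0.8, 0.0],
       [0.0, 0.7, 1.0, 0.1],
       [0.0, 0.0, 0.1, 0.0]]
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
print(attention_crop(image, att))  # [[6, 7], [10, 11]]
print(attention_drop(image, att))
```

Cropping keeps the salient block so the network sees it in more detail, while dropping erases the same block so that training has to rely on other parts. At test time, the same bounding-box step could in principle be reused to zoom in on the object before the second, finer prediction.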

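Experimental Results

Research Questions

RQ1: Without part-level annotations, can weakly supervised attention maps accurately localize discriminative object parts for FGVC?
RQ2: Does attention-guided augmentation (cropping and dropping) improve both recognition accuracy and localization performance?
RQ3: How does varying the number of attention maps affect FGVC accuracy?
RQ4: Does the coarse-to-fine localization/refinement strategy outperform single-stage classification on FGVC?

Key Results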

Dataset	Method	Accuracy (%)
CUB-200-2011 (testing)	WS-DAN	89.4
FGVC-Aircraft (testing)	WS-DAN	93.0
Stanford Cars (testing)	WS-DAN	94.5
Stanford Dogs (testing)	WS-DAN	92.2

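WS-DAN achieves state-of-the-art accuracy on four FGVC datasets (CUB-200-2011, FGVC-Aircraft, Stanford Cars, Stanford Dogs).
Attention-guided augmentation outperforms random augmentation in both accuracy and object localization IoU.
Increasing the number of attention maps improves accuracy up to roughly 32 maps, where it saturates at about 89.4% on CUB-200-2011.
Combining attention learning, attention cropping, attention dropping, and localization/refinement yields the largest gain (e.g., 89.4% on CUB-200-2011).
WS-DAN's localization error is substantially lower than the baselines on CUB-200-2011 and Stanford Dogs (e.g., 18.3% and 19.2%, respectively).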
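
For readers unfamiliar with the localization metrics mentioned above, the following is a minimal sketch of how box IoU and a localization error rate are commonly computed. It assumes axis-aligned boxes given as (x1, y1, x2, y2) and counts a prediction as a miss when its IoU with the ground truth falls below 0.5; that threshold is a common convention assumed here, and some protocols additionally require the class prediction to be correct, which this sketch does not model. The function names are hypothetical.

```python
# Illustrative sketch only; the 0.5 IoU threshold is an assumed convention.

def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def localization_error(predicted, truth, iou_threshold=0.5):
    """Percentage of predicted boxes whose IoU with the truth is too low."""
    misses = sum(1 for p, t in zip(predicted, truth)
                 if box_iou(p, t) < iou_threshold)
    return 100.0 * misses / len(predicted)


# One exact match and one weak overlap (IoU = 1/7): half the boxes miss.
predicted = [(0, 0, 2, 2), (0, 0, 2, 2)]
truth = [(0, 0, 2, 2), (1, 1, 3, 3)]
print(localization_error(predicted, truth))  # 50.0
```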


This review was generated by AI and reviewed by a human editor.