QUICK REVIEW

[Paper Review] Learning Robust Representations by Projecting Superficial Statistics Out

Haohan Wang, Zexue He|arXiv (Cornell University)|2019. 03. 02.

Domain Adaptation and Few-Shot Learning|Citations: 93

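One-Line Summary

This paper introduces the Neural Gray-Level Co-occurrence Matrix (NGLCM) to capture texture and uses HEX to project texture-related signals out of the learned representation, improving domain generalization without any target-domain data.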

ABSTRACT

Despite impressive performance as evaluated on i.i.d. holdout data, deep neural networks depend heavily on superficial statistics of the training data and are liable to break under distribution shift. For example, subtle changes to the background or texture of an image can break a seemingly powerful classifier. Building on previous work on domain generalization, we hope to produce a classifier that will generalize to previously unseen domains, even when domain identifiers are not available during training. This setting is challenging because the model may extract many distribution-specific (superficial) signals together with distribution-agnostic (semantic) signals. To overcome this challenge, we incorporate the gray-level co-occurrence matrix (GLCM) to extract patterns that our prior knowledge suggests are superficial: they are sensitive to the texture but unable to capture the gestalt of an image. Then we introduce two techniques for improving our networks' out-of-sample performance. The first method is built on the reverse gradient method that pushes our model to learn representations from which the GLCM representation is not predictable. The second method is built on the independence introduced by projecting the model's representation onto the subspace orthogonal to GLCM representation's. We test our method on the battery of standard domain generalization data sets and, interestingly, achieve comparable or better performance as compared to other domain generalization methods that explicitly require samples from the target distribution for training.

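Research Motivation and Goals

Sets the goal of training classifiers that generalize to unseen domains while relying less on superficial statistics such as texture and background.
Develops a differentiable, texture-only feature extractor (NGLCM) and a method (HEX) for discarding texture information during training.
Demonstrates effectiveness on synthetic and standard domain generalization (DG) benchmarks without using target-domain samples during training.
Evaluates how HEX compares with existing DG methods across a range of datasets.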

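Proposed Method

Introduces the Neural Gray-Level Co-occurrence Matrix (NGLCM), a differentiable block that captures texture information while avoiding semantic content.
Defines it as G = s(a; φ_a) s^T(b; φ_b), where s is a truncated, differentiable threshold function that maps image pixels to a texture representation.
Proposes two strategies for removing texture information: (i) adversarially training a predictor to recover the GLCM features from h(X; θ) and backpropagating the reversed gradient so that the representation fools it (ADV/ADVE), and (ii) projecting F_A onto the orthogonal complement of F_G to obtain F_L (HEX).
Uses a two-branch architecture that combines the raw representation h(X; θ) with the texture representation g(X; φ) to produce predictions, and uses the transformed representation F_L at evaluation time.
Compares HEX/ADV against DG baselines (DANN, InfoDropout, and others) on MNIST variants, facial-expression data with synthetic distracting backgrounds, rotated MNIST, and PACS.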
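
The co-occurrence formula and the projection step above can be illustrated with a minimal NumPy sketch. It rests on several assumptions that do not come from this review: the function names `glcm_outer_product` and `project_out` are hypothetical, the co-occurrence matrix uses a hard one-hot encoding (the classical, non-learned GLCM) rather than the learnable truncated threshold function s, and F_A and F_G are treated as (batch, features) matrices, with a pseudoinverse standing in for the matrix inverse for numerical safety. It is meant as an illustration of the two ideas, not as the authors' implementation.

```python
import numpy as np


def glcm_outer_product(image, levels, offset=(0, 1)):
    """Classical GLCM in the outer-product form G = S(a) S(b)^T.

    Assumption: S is a hard one-hot encoding of integer gray levels, not the
    learnable truncated threshold function of NGLCM. `offset` is (dy, dx)
    and must be non-negative.
    """
    dy, dx = offset
    h, w = image.shape
    a = image[: h - dy, : w - dx].ravel()  # reference pixels
    b = image[dy:, dx:].ravel()            # their neighbors at the offset
    one_hot = np.eye(levels)
    s_a = one_hot[a].T                     # shape (levels, n_pairs)
    s_b = one_hot[b].T                     # shape (levels, n_pairs)
    return s_a @ s_b.T                     # shape (levels, levels), pair counts


def project_out(f_a, f_g):
    """Project the columns of f_a onto the orthogonal complement of col(f_g).

    Computes F_L = (I - F_G (F_G^T F_G)^+ F_G^T) F_A, where ^+ is the
    Moore-Penrose pseudoinverse (an assumption made here for stability).
    """
    n = f_g.shape[0]
    projector = f_g @ np.linalg.pinv(f_g.T @ f_g) @ f_g.T
    return (np.eye(n) - projector) @ f_a


rng = np.random.default_rng(0)

# Toy 8x8 image with 4 gray levels; horizontal neighbor offset.
image = rng.integers(0, 4, size=(8, 8))
G = glcm_outer_product(image, levels=4)
print(G.sum())  # number of horizontal pixel pairs: 8 * 7

# Toy batch of 16 samples with 5 output features per branch.
f_a = rng.normal(size=(16, 5))
f_g = rng.normal(size=(16, 5))
f_l = project_out(f_a, f_g)
print(np.abs(f_g.T @ f_l).max())  # close to zero: f_l is orthogonal to f_g
```

After the projection, every column of the result is orthogonal to every column of the texture-branch output, which is the sense in which the texture signal is "projected out".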

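Experimental Results

Research Questions

RQ1: Can a model be steered to rely less on superficial statistics even without unlabeled target-domain data?
RQ2: Do a differentiable texture-based representation (NGLCM) and projection-based invariance (HEX) improve out-of-domain performance over existing DG methods?
RQ3: How do HEX and NGLCM perform on synthetic and real domain-shift benchmarks (PACS and MNIST-rotation)?
RQ4: What trade-offs and stability issues arise when NGLCM/HEX is trained jointly with the main classifier, and which training heuristics address them?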

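Key Results

NGLCM mainly captures texture information and is ineffective for semantic digit recognition, confirming its texture-focused role.
HEX improved robustness to a variety of distribution shifts and was competitive with or better than DG methods that require target-domain samples (DANN, Fusion, and others) on several benchmarks.
In the MNIST-rotation experiments, HEX generally showed strong average performance, at times approaching or exceeding state-of-the-art domain generalization methods.
On PACS, HEX came close to the Fusion method while using far fewer parameters, and performed especially well on the Art and Cartoon domains.
Across the synthetic distracting-background tasks, ADV and HEX delivered consistent improvements as the strength of the domain correlation increased, and HEX offered benefits complementary to the adversarial approach.
The authors discuss limitations, including NGLCM's failure to remove semantic information completely and potential training instability, and note that training heuristics mitigated them.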


This review was generated by AI and reviewed by a human editor.