QUICK REVIEW

[Paper Review] CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets

Zachary Novack, Julian McAuley | arXiv (Cornell University) | 2023-02-06

Domain Adaptation and Few-Shot Learning | Citations: 14

One-Line Summary

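CHiLS improves zero-shot CLIP accuracy by predicting over a set of subclasses derived from a hierarchical label structure, mapping the prediction back to the original superclass, and using a reweighting step to combine subclass and superclass evidence.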

ABSTRACT

Open vocabulary models (e.g. CLIP) have shown strong performance on zero-shot classification through their ability to generate embeddings for each class based on their (natural language) names. Prior work has focused on improving the accuracy of these models through prompt engineering or by incorporating a small amount of labeled downstream data (via finetuning). However, there has been little focus on improving the richness of the class names themselves, which can pose issues when class labels are coarsely-defined and are uninformative. We propose Classification with Hierarchical Label Sets (or CHiLS), an alternative strategy for zero-shot classification specifically designed for datasets with implicit semantic hierarchies. CHiLS proceeds in three steps: (i) for each class, produce a set of subclasses, using either existing label hierarchies or by querying GPT-3; (ii) perform the standard zero-shot CLIP procedure as though these subclasses were the labels of interest; (iii) map the predicted subclass back to its parent to produce the final prediction. Across numerous datasets with underlying hierarchical structure, CHiLS leads to improved accuracy in situations both with and without ground-truth hierarchical information. CHiLS is simple to implement within existing zero-shot pipelines and requires no additional training cost. Code is available at: https://github.com/acmi-lab/CHILS.

Motivation and Objectives

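Explain why a richer hierarchy of class names can improve open-vocabulary zero-shot classification.
Propose a hierarchy-based method (CHiLS) that converts each class into a set of subclasses for zero-shot inference.
Show that CHiLS delivers consistent gains on datasets both with and without a true hierarchy.
Demonstrate that GPT-3, with no additional training, can generate effective subclass hierarchies when no hierarchy is available.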

Proposed Method

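For each superclass c_i, generate a subclass set S_{c_i} = {s_{c_i,1}, ..., s_{c_i,m_i}}, either from an existing hierarchy or by prompting GPT-3.
Run standard CLIP zero-shot prediction over the union of all subclass labels (C_sub = ⋃_i S_{c_i}) to obtain subclass probabilities.
Compute superclass probabilities over the original class set C, then combine the two by multiplying each subclass probability by the score of its parent superclass.
Map the highest-scoring subclass back to its superclass with the inverse mapping G^{-1} to produce the final prediction.
To guard against imperfect hierarchies, optionally apply the reweighting step, which uses superclass confidence to modulate the subclass probabilities and improves robustness.
Experiment with both true hierarchies (where available) and GPT-3-generated hierarchies, and evaluate performance with and without the reweighting step.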
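
The prediction steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `chils_predict`, the array `sub_to_super` (standing in for the mapping G^{-1}), and the toy logits are all hypothetical. It also assumes that the "scores" are softmax probabilities over precomputed image-text similarity logits; in a real pipeline those logits would come from CLIP.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def chils_predict(sub_logits, super_logits, sub_to_super, reweight=True):
    """Return the predicted superclass index for each image.

    sub_logits:   (n_images, n_subclasses) similarity scores over C_sub
    super_logits: (n_images, n_superclasses) similarity scores over C
    sub_to_super: (n_subclasses,) parent index of each subclass
                  (hypothetical array encoding of the mapping G^{-1})
    """
    sub_to_super = np.asarray(sub_to_super)
    sub_probs = softmax(np.asarray(sub_logits, dtype=float))
    if reweight:
        # Assumption: superclass "score" = softmax probability over C.
        super_probs = softmax(np.asarray(super_logits, dtype=float))
        # Scale each subclass probability by its parent's probability.
        sub_probs = sub_probs * super_probs[:, sub_to_super]
    # Pick the top subclass, then map it back to its parent.
    best_sub = sub_probs.argmax(axis=-1)
    return sub_to_super[best_sub]


# Toy data (made up): 2 superclasses, each with 2 subclasses, 1 image.
sub_to_super = [0, 0, 1, 1]
sub_logits = np.array([[2.0, 0.1, 1.9, 0.0]])
super_logits = np.array([[0.0, 3.0]])

without_rw = chils_predict(sub_logits, super_logits, sub_to_super, reweight=False)
with_rw = chils_predict(sub_logits, super_logits, sub_to_super, reweight=True)
```

In this toy case the top subclass belongs to superclass 0 when only subclass probabilities are used, but strong superclass evidence for class 1 flips the final prediction to superclass 1 once reweighting is applied. That is the behavior the reweighting step is meant to provide when the subclass set is noisy.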

Figure 1: (Left) Standard CLIP Pipeline for Zero-Shot Classification. For inference, a standard CLIP takes in input a set of classes and an image where we want to make a prediction and makes a prediction from that set of classes. (Right) Our proposed method CHiLS for leveraging hierarchical class i

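Experimental Results

Research Questions

RQ1: Can leveraging hierarchical label structure improve zero-shot CLIP performance on datasets with coarse or ill-defined classes?
RQ2: How does performance differ between settings where the true hierarchy is available and settings where it must be generated, and how important is the reweighting step?
RQ3: How do different subclass set sizes and noise in the hierarchy information affect CHiLS performance?
RQ4: Do the benefits of CHiLS extend across different backbones and across datasets with varying semantic granularity?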

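Key Results

Across 16 datasets, CHiLS consistently improves zero-shot accuracy over the baseline superclass approach, whether it uses existing hierarchies or GPT-3-generated ones.
When the true hierarchy is accessible, CHiLS can gain roughly 15–30 percentage points over the baseline on several datasets.
GPT-3-generated subclass mappings provide solid gains over the baseline even when no true hierarchy exists.
The reweighting step is critical to performance when hierarchy information is unknown or noisy, but becomes less necessary when a perfect true hierarchy is provided.
CHiLS is stable across multiple CLIP backbones and relatively insensitive to moderate changes in the subclass label set size.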

Figure 2: Selected examples of behavior differences between the standard and CHiLS performance across two different datasets. (Upper left): CHiLS is correct, standard prediction is not. (Lower left): Both correct. (Upper right): Both wrong. (Lower Right): standard prediction is correct, CHiLS is no

This review was generated by AI and reviewed by a human editor.