QUICK REVIEW

[Paper Review] Post-hoc Concept Bottleneck Models

Mert Yüksekgönül, Maggie Haitian Wang | arXiv (Cornell University) | 2022-05-31

Data Stream Mining Techniques | Citations: 36

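One-Line Summary

This paper introduces Post-hoc Concept Bottleneck Models (PCBMs), which convert any pretrained model into an interpretable concept bottleneck. The concept subspace is learned from annotated data or derived from multimodal (text) descriptions, and a residual variant is used to match the original model's accuracy. A user study further demonstrates global model editing through concept-level feedback.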

ABSTRACT

Concept Bottleneck Models (CBMs) map the inputs onto a set of interpretable concepts ("the bottleneck") and use the concepts to make predictions. A concept bottleneck enhances interpretability since it can be investigated to understand what concepts the model "sees" in an input and which of these concepts are deemed important. However, CBMs are restrictive in practice as they require dense concept annotations in the training data to learn the bottleneck. Moreover, CBMs often do not match the accuracy of an unrestricted neural network, reducing the incentive to deploy them in practice. In this work, we address these limitations of CBMs by introducing Post-hoc Concept Bottleneck models (PCBMs). We show that we can turn any neural network into a PCBM without sacrificing model performance while still retaining the interpretability benefits. When concept annotations are not available on the training data, we show that PCBM can transfer concepts from other datasets or from natural language descriptions of concepts via multimodal models. A key benefit of PCBM is that it enables users to quickly debug and update the model to reduce spurious correlations and improve generalization to new distributions. PCBM allows for global model edits, which can be more efficient than previous works on local interventions that fix a specific prediction. Through a model-editing user study, we show that editing PCBMs via concept-level feedback can provide significant performance gains without using data from the target domain or model retraining.

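Motivation and Objectives

Address the limitations of conventional Concept Bottleneck Models (CBMs): the need for dense concept annotations and the potential loss of accuracy.
Propose a data-efficient way to convert a pretrained model into a PCBM without retraining from scratch.
Build the concept bottleneck from concepts transferred from other datasets or from natural language descriptions.
Introduce a residual-modeling variant (PCBM-h) that recovers the original model's performance when the concept bank is insufficient.
Demonstrate global model editing through concept-level feedback and evaluate its usability in a user study.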

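Proposed Method

Define a concept subspace C whose concept vectors are either learned as Concept Activation Vectors (CAVs) from a concept library drawn from other datasets or derived from multimodal (text) descriptions.
Project the backbone embedding onto the concept subspace to obtain the concept representation f_C(x).
Train an interpretable predictor g, such as a sparse linear model with elastic-net regularization, to predict the label from f_C(x).
When the concepts are insufficient, add a residual predictor r on the original embedding to recover the original accuracy (PCBM-h).
Optionally use the text encoder of a multimodal model (such as CLIP) to derive concept vectors from natural language descriptions or ConceptNet relations when building C.
Provide a global model-editing framework that adjusts concept weights without target-domain data (optionally applying pruning or normalization procedures).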
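
The first three steps above (concept vectors, projection f_C(x), sparse linear predictor g) can be illustrated with a minimal NumPy sketch on synthetic data. Everything here is an assumption made for illustration and not the authors' implementation: the function names (`concept_vectors`, `project`, `fit_sparse_linear`) are hypothetical, the concept direction is a difference of class means standing in for a CAV learned by a linear classifier, the elastic-net model is fit with plain proximal gradient descent, and the "backbone embeddings" are random vectors rather than outputs of a real network.

```python
import numpy as np

rng = np.random.default_rng(0)

def concept_vectors(pos_banks, neg_banks):
    """One unit-norm direction per concept (assumption: mean difference, not an SVM-based CAV)."""
    C = np.stack([p.mean(axis=0) - n.mean(axis=0) for p, n in zip(pos_banks, neg_banks)])
    return C / np.linalg.norm(C, axis=1, keepdims=True)

def project(emb, C):
    """f_C(x): coefficient of each embedding along each concept vector."""
    return emb @ C.T / np.sum(C * C, axis=1)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def soft_threshold(W, t):
    """Proximal operator of the L1 penalty."""
    return np.sign(W) * np.maximum(np.abs(W) - t, 0.0)

def fit_sparse_linear(Z, y, n_classes, lam=0.01, alpha=0.9, lr=0.5, steps=500):
    """Softmax regression with penalty lam * (alpha * |W|_1 + (1 - alpha) / 2 * |W|_2^2)."""
    n, k = Z.shape
    W = np.zeros((k, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]
    for _ in range(steps):
        P = softmax(Z @ W + b)
        grad_W = Z.T @ (P - Y) / n + lam * (1 - alpha) * W  # smooth part (loss + L2)
        grad_b = (P - Y).mean(axis=0)
        W = soft_threshold(W - lr * grad_W, lr * lam * alpha)  # L1 part via prox step
        b = b - lr * grad_b
    return W, b

# Synthetic stand-in for backbone embeddings: d-dim vectors driven by k latent concepts.
d, k = 16, 4
dirs = np.linalg.qr(rng.normal(size=(d, k)))[0].T  # k orthonormal "true" concept directions

# Concept bank: positive and negative example embeddings for each concept.
pos = [rng.normal(scale=0.5, size=(50, d)) + 2.0 * dirs[i] for i in range(k)]
neg = [rng.normal(scale=0.5, size=(50, d)) for _ in range(k)]
C = concept_vectors(pos, neg)

def make_split(n):
    s = rng.normal(size=(n, k))                    # latent concept scores
    emb = s @ dirs + 0.1 * rng.normal(size=(n, d))
    return emb, (s[:, 0] > 0).astype(int)          # label depends only on concept 0

emb_train, y_train = make_split(400)
emb_test, y_test = make_split(200)

Z_train, Z_test = project(emb_train, C), project(emb_test, C)
W, b = fit_sparse_linear(Z_train, y_train, n_classes=2)
acc = np.mean(np.argmax(Z_test @ W + b, axis=1) == y_test)
print("test accuracy:", acc)
print("per-concept weight magnitude:", np.abs(W).sum(axis=1))
```

Because the predictor is linear over concept coordinates, each row of `W` can be read as the contribution of one concept to each class. A PCBM-h style variant would keep `W` and `b` fixed and fit an additional predictor r on the raw embeddings, which this sketch omits.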

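Experimental Results

Research Questions

RQ1: Can any pretrained model be converted into a PCBM without loss of accuracy?
RQ2: Can concepts learned post hoc from other datasets or natural language descriptions form a usable concept bottleneck?
RQ3: Does residual modeling (PCBM-h) recover the original model's performance when the concept bank is insufficient?
RQ4: Is effective global model editing possible through concept-level feedback without target-domain data?
RQ5: How does concept-based editing affect model robustness under distribution shift?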

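Key Results

PCBMs achieve performance comparable to the original model on several datasets; the exception is CIFAR100, where the concept bank is insufficient.
When the concept bank is not expressive enough, PCBM-h recovers the original accuracy by adding a residual predictor.
Using CLIP-based concepts or multimodal descriptions brings accuracy close to that of the original model on some tasks, reducing reliance on labeled concept data.
Global editing with a simple concept-pruning strategy recovers, in some cases, roughly half of the gain obtained by fine-tuning on the target distribution.
A human-guided pruning workflow improves performance more than random pruning and yields substantial gains without target-domain data.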
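
The concept-pruning edit mentioned above amounts to zeroing selected entries of the linear predictor's weight matrix. The sketch below is a toy illustration under stated assumptions: the helper names (`prune_concepts`, `predict`), the concept and class names, and all weights and inputs are made up for the example and are not results or code from the paper.

```python
import numpy as np

def prune_concepts(W, concept_names, to_prune, class_idx=None):
    """Return a copy of W (concepts x classes) with the chosen concepts' weights set to zero.

    If class_idx is None the concept is removed for every class,
    otherwise only for the given class column.
    """
    W_edit = W.copy()
    rows = [concept_names.index(name) for name in to_prune]
    if class_idx is None:
        W_edit[rows, :] = 0.0
    else:
        W_edit[rows, class_idx] = 0.0
    return W_edit

def predict(Z, W, b):
    """Class prediction from concept coordinates Z using linear weights W and bias b."""
    return np.argmax(Z @ W + b, axis=1)

# Hypothetical two-concept, two-class predictor (class 0 = "dog", class 1 = "other").
concept_names = ["fur", "snow"]
W = np.array([[1.0, -1.0],    # "fur" supports "dog"
              [1.5, -1.5]])   # "snow" spuriously supports "dog" in the training domain
b = np.zeros(2)

# Hypothetical target-domain inputs where the snow correlation is reversed.
Z_target = np.array([[1.0, -1.0],    # furry, no snow  -> dog
                     [-1.0, 1.0]])   # not furry, snow -> other
y_target = np.array([0, 1])

acc_before = np.mean(predict(Z_target, W, b) == y_target)
W_pruned = prune_concepts(W, concept_names, ["snow"])
acc_after = np.mean(predict(Z_target, W_pruned, b) == y_target)
print("accuracy before pruning:", acc_before)
print("accuracy after pruning:", acc_after)
```

The edit is global in the sense that one change to `W` affects every future prediction, and it needs no target-domain data: only a judgment that a concept such as "snow" should not drive the prediction.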

This review was generated by AI and reviewed by a human editor.