QUICK REVIEW

[Paper Review] Post-hoc Concept Bottleneck Models

Mert Yüksekgönül, Maggie Haitian Wang|arXiv (Cornell University)|May 31, 2022

Data Stream Mining Techniques|36 citations

TL;DR

This paper introduces Post-hoc Concept Bottleneck Models (PCBMs), which convert any pretrained model into an interpretable concept bottleneck. Concept subspaces are learned from annotated data or from natural language descriptions via multimodal models, and a residual variant (PCBM-h) recovers the original model’s accuracy when the concepts fall short. The paper also demonstrates global model editing via concept-level feedback and evaluates it in a user study.

ABSTRACT

Concept Bottleneck Models (CBMs) map the inputs onto a set of interpretable concepts ("the bottleneck") and use the concepts to make predictions. A concept bottleneck enhances interpretability since it can be investigated to understand what concepts the model "sees" in an input and which of these concepts are deemed important. However, CBMs are restrictive in practice as they require dense concept annotations in the training data to learn the bottleneck. Moreover, CBMs often do not match the accuracy of an unrestricted neural network, reducing the incentive to deploy them in practice. In this work, we address these limitations of CBMs by introducing Post-hoc Concept Bottleneck Models (PCBMs). We show that we can turn any neural network into a PCBM without sacrificing model performance while still retaining the interpretability benefits. When concept annotations are not available on the training data, we show that PCBM can transfer concepts from other datasets or from natural language descriptions of concepts via multimodal models. A key benefit of PCBM is that it enables users to quickly debug and update the model to reduce spurious correlations and improve generalization to new distributions. PCBM allows for global model edits, which can be more efficient than previous works on local interventions that fix a specific prediction. Through a model-editing user study, we show that editing PCBMs via concept-level feedback can provide significant performance gains without using data from the target domain or model retraining.

Motivation & Objective

Address the limitations of traditional Concept Bottleneck Models (CBMs), namely the need for dense concept annotations and potential loss in accuracy.
Propose a data-efficient method to convert pretrained models into PCBMs without retraining from scratch.
Enable leveraging cross-dataset concepts or natural language descriptions to construct the concept bottleneck.
Introduce a residual modeling variant (PCBM-h) to recover the original model’s performance when the concept bank is insufficient.
Demonstrate global model editing via concept-level feedback and assess usability through user studies.

Proposed method

Define a concept subspace C using Concept Activation Vectors (CAVs) learned from a concept library across datasets or via multimodal descriptions.
Project backbone embeddings onto the concept subspace to obtain a concept-projected representation f_C(x).
Train an interpretable predictor g on f_C(x), such as a sparse linear model with elastic-net regularization, to predict labels.
When concepts are insufficient, introduce a residual predictor r on the original embeddings to recover the original accuracy (PCBM-h).
Optionally derive concept vectors from natural language: encode concept names or descriptions (for example, ones gathered from ConceptNet relations) with the text encoder of a multimodal model such as CLIP, and use the resulting vectors to build C.
Provide a framework for global model edits by adjusting concept weights (and optionally applying a pruning/normalization procedure) without needing target-domain data.
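
The forward pass described in the steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes the concept vectors are already available (from CAVs or a text encoder), assumes the projection scores each concept as <f(x), c_i> / ||c_i||^2, and assumes a linear residual predictor for PCBM-h. All function names, the toy vectors, and the exact form of the elastic-net penalty are hypothetical choices made for the example.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_onto_concepts(embedding: Vector,
                          concept_vectors: Sequence[Vector]) -> List[float]:
    """f_C(x): one score per concept, assumed here to be <f(x), c_i> / ||c_i||^2."""
    return [dot(embedding, c) / dot(c, c) for c in concept_vectors]


def linear_logits(features: Vector, weights: Sequence[Vector],
                  bias: Vector) -> List[float]:
    """One logit per class: w_k . features + b_k."""
    return [dot(w, features) + b for w, b in zip(weights, bias)]


def pcbm_logits(embedding, concept_vectors, concept_weights, bias):
    """Interpretable predictor g applied to the concept scores f_C(x)."""
    scores = project_onto_concepts(embedding, concept_vectors)
    return linear_logits(scores, concept_weights, bias)


def pcbm_h_logits(embedding, concept_vectors, concept_weights, bias,
                  residual_weights, residual_bias):
    """PCBM-h sketch: add a (here linear) residual r on the raw embedding."""
    base = pcbm_logits(embedding, concept_vectors, concept_weights, bias)
    resid = linear_logits(embedding, residual_weights, residual_bias)
    return [a + b for a, b in zip(base, resid)]


def elastic_net_penalty(weights: Sequence[Vector], alpha: float) -> float:
    """Assumed penalty form: alpha * ||W||_1 + (1 - alpha) * ||W||_2^2."""
    l1 = sum(abs(w) for row in weights for w in row)
    l2_sq = sum(w * w for row in weights for w in row)
    return alpha * l1 + (1.0 - alpha) * l2_sq


# Toy example (all numbers are made up for illustration).
embedding = [2.0, 1.0, 0.5]                 # backbone embedding f(x)
concepts = [[1.0, 0.0, 0.0],                # concept vector c_1
            [0.0, 2.0, 0.0]]                # concept vector c_2
g_weights = [[1.0, 0.0], [0.0, 2.0]]        # two classes x two concepts
g_bias = [0.0, 0.0]
r_weights = [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
r_bias = [0.0, 0.0]

print(project_onto_concepts(embedding, concepts))        # [2.0, 0.5]
print(pcbm_logits(embedding, concepts, g_weights, g_bias))  # [2.0, 1.0]
print(pcbm_h_logits(embedding, concepts, g_weights, g_bias,
                    r_weights, r_bias))                  # [2.5, 0.5]
```

In this toy setup the third embedding dimension is invisible to both concepts, so only the residual term can use it. That mirrors the role the review attributes to PCBM-h: the interpretable part stays unchanged while the residual picks up what the concept bank cannot express.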

Experimental results

Research questions

RQ1: Can any pretrained model be converted into a PCBM without sacrificing accuracy?
RQ2: How can concepts be learned post-hoc from other datasets or natural language descriptions to form a usable concept bottleneck?
RQ3: Does residual modeling (PCBM-h) recover the original model performance when the concept bank is incomplete?
RQ4: Can users perform effective global model edits via concept-level feedback without retraining or target-domain data?
RQ5: What is the impact of concept-based edits on model robustness under distribution shifts?

Key findings

PCBMs achieve comparable performance to the original model on several datasets, with a noted exception (CIFAR100) where the concept bank is insufficient.
PCBM-h can recover the original model’s accuracy when the concept bank is not expressive enough, by adding a residual predictor.
PCBMs built from CLIP-based concepts or natural language descriptions closely approach the original model’s accuracy on some tasks, reducing dependence on labeled concept data.
Global edits via simple concept pruning strategies can recover a substantial portion of the gains from fine-tuning on target distributions (about half in some cases).
A human-guided pruning workflow improves performance more than random pruning and yields substantial gains without access to target-domain data.
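
A concept-pruning edit of the kind summarized above can be illustrated as follows. This is a hedged sketch under stated assumptions: the class weights are stored as a name-to-weight mapping, pruning means setting a concept's weight to zero, and the optional normalization rescales the surviving weights so their L1 norm matches the pre-edit norm. The function name `prune_concepts`, the concept names, and the weight values are hypothetical.

```python
from typing import Dict, Iterable


def prune_concepts(class_weights: Dict[str, float],
                   to_prune: Iterable[str],
                   renormalize: bool = False) -> Dict[str, float]:
    """Return edited weights for one class with the named concepts zeroed out.

    If renormalize is True, the remaining weights are rescaled so that the
    L1 norm of the edited vector equals that of the original (an assumed
    reading of the pruning/normalization step).
    """
    pruned = set(to_prune)
    edited = {name: (0.0 if name in pruned else w)
              for name, w in class_weights.items()}
    if renormalize:
        old_norm = sum(abs(w) for w in class_weights.values())
        new_norm = sum(abs(w) for w in edited.values())
        if new_norm > 0:
            scale = old_norm / new_norm
            edited = {name: w * scale for name, w in edited.items()}
    return edited


# Hypothetical weights of the class "bed", where "dog" is a spurious concept.
bed_weights = {"dog": 1.5, "pillow": 2.0, "blanket": 0.5}

plain = prune_concepts(bed_weights, ["dog"])
rescaled = prune_concepts(bed_weights, ["dog"], renormalize=True)
print(plain)     # "dog" weight is 0.0, other weights unchanged
print(rescaled)  # "dog" weight is 0.0, others scaled up by 4.0 / 2.5
```

Because the edit changes a weight of the predictor rather than a single prediction, it applies to every input at once and needs no target-domain data, which is what makes it a global edit.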

This review was created by AI and reviewed by human editors.