[Paper Review] Concept Bottleneck Models
The paper introduces concept bottleneck models that first predict human-specified concepts from input data and then predict the final target from these concepts, enabling test-time interventions and improved interpretability without sacrificing task accuracy.
We seek to learn models that we can interact with using high-level concepts: if the model did not think there was a bone spur in the x-ray, would it still predict severe arthritis? State-of-the-art models today do not typically support the manipulation of concepts like "the existence of bone spurs", as they are trained end-to-end to go directly from raw input (e.g., pixels) to output (e.g., arthritis severity). We revisit the classic idea of first predicting concepts that are provided at training time, and then using these concepts to predict the label. By construction, we can intervene on these concept bottleneck models by editing their predicted concept values and propagating these changes to the final prediction. On x-ray grading and bird identification, concept bottleneck models achieve competitive accuracy with standard end-to-end models, while enabling interpretation in terms of high-level clinical concepts ("bone spurs") or bird attributes ("wing color"). These models also allow for richer human-model interaction: accuracy improves significantly if we can correct model mistakes on concepts at test time.
Motivation & Objective
- Motivate interactive AI systems by aligning model reasoning with high-level concepts used by practitioners (e.g., bone spurs, joint space narrowing).
- Propose a practical method to turn any end-to-end neural network into a concept bottleneck model using a bottleneck layer and an auxiliary loss.
- Demonstrate that concept bottleneck models achieve competitive task performance while enabling concept-level interventions.
- Evaluate how well concept bottlenecks align with true concepts and explore robustness to covariate shifts.
Proposed method
- Turn an existing network into a bottleneck model by resizing one of its layers to have k neurons, one per concept.
- Train with an auxiliary concept loss L_C that aligns each bottleneck neuron with its corresponding annotated concept c_j.
- Compare independent, sequential, and joint bottleneck training schemes to a standard end-to-end model.
- Evaluate on two tasks: knee x-ray grading (OAI) and bird species identification (CUB) using concept annotations c and target y.
- Assess concept accuracy, task accuracy, data efficiency, and the potential for test-time interventions by manipulating the predicted concepts ĉ.
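As a compact summary of the three training schemes listed above, write ĉ = g(x) for the k predicted concepts and ŷ = f(ĉ) for the task prediction, with L_Y the task loss, L_{C_j} the loss on the j-th concept, and λ > 0 the weight on the concept loss. This follows our reading of the paper's formulation; the notation is ours.

```latex
% Independent: fit f on the TRUE concepts c, and fit g separately on inputs x
\hat{f} = \arg\min_f \sum_i L_Y\big(f(c^{(i)}),\, y^{(i)}\big), \qquad
\hat{g} = \arg\min_g \sum_{i,j} L_{C_j}\big(g_j(x^{(i)}),\, c_j^{(i)}\big)

% Sequential: learn \hat{g} as above, then fit f on the PREDICTED concepts \hat{g}(x)
\hat{f} = \arg\min_f \sum_i L_Y\big(f(\hat{g}(x^{(i)})),\, y^{(i)}\big)

% Joint: learn f and g together, weighting the concept loss by \lambda
\hat{f}, \hat{g} = \arg\min_{f,g} \sum_i \Big[ L_Y\big(f(g(x^{(i)})),\, y^{(i)}\big)
  + \lambda \sum_j L_{C_j}\big(g_j(x^{(i)}),\, c_j^{(i)}\big) \Big]
```

Because the independent f is fitted on true concepts, replacing predicted concepts with true values at test time keeps f's inputs close to the distribution it was trained on, which is plausibly one reason independent bottlenecks respond well to interventions.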
Experimental results
Research questions
- RQ1: Can concept bottleneck models achieve competitive task performance while predicting interpretable concepts?
- RQ2: Do test-time interventions on predicted concepts improve final task accuracy, and under what conditions?
- RQ3: How do independent, sequential, and joint bottleneck training regimes trade off task accuracy, concept accuracy, and intervenability?
- RQ4: Are concept bottleneck models more robust to covariate shifts than standard end-to-end models?
Key findings
- Concept bottleneck models attain task accuracies competitive with standard end-to-end models on OAI and CUB while achieving high concept accuracies.
- Joint bottlenecks can match or exceed standard models in task performance while achieving good concept accuracy (average concept RMSE of ~0.53 on OAI; concept error of ~0.03–0.034 on CUB).
- Post-hoc concept analysis via linear probes on standard models recovers concepts less accurately than training models to align with concepts, highlighting the value of explicit concept supervision.
- Test-time interventions by replacing predicted concepts with true values can substantially improve task accuracy, especially for OAI; the effectiveness depends on the training regime and model nonlinearity.
- On CUB, interventions yield substantial gains when they target groups of related concepts, with independent bottlenecks generally more amenable to intervention than sequential or joint variants.
- Bottleneck models demonstrate robustness to background shifts (TravelingBirds), achieving lower task error than standard models under covariate shifts.
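To make test-time intervention concrete, here is a minimal sketch under stated assumptions: a toy bottleneck with a hand-set linear-plus-sigmoid concept predictor g and a linear task head f. The class `ToyConceptBottleneck`, its weights, and the `group_intervention` helper are hypothetical; they only illustrate the mechanism of overwriting predicted concepts ĉ_j with true values before f is applied, and are not the paper's code.

```python
import math
from typing import Dict, List, Optional, Sequence


def sigmoid(z: float) -> float:
    """Logistic function turning a concept logit into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


class ToyConceptBottleneck:
    """Hypothetical toy bottleneck: x -> c_hat = g(x) -> y_hat = f(c_hat).

    g: one linear unit plus sigmoid per concept (binary concepts, as in CUB).
    f: linear head over the k concept values, giving a scalar score.
    Weights are hand-set for illustration, not trained.
    """

    def __init__(self, g_weights: List[List[float]], f_weights: List[float], f_bias: float) -> None:
        self.g_weights = g_weights  # k rows, one weight vector per concept
        self.f_weights = f_weights  # one weight per concept
        self.f_bias = f_bias

    def predict_concepts(self, x: Sequence[float]) -> List[float]:
        """Return c_hat, the predicted probability of each concept."""
        return [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in self.g_weights]

    def predict(self, x: Sequence[float], interventions: Optional[Dict[int, float]] = None) -> float:
        """Predict y_hat, optionally overwriting some predicted concepts first."""
        c_hat = self.predict_concepts(x)
        # Test-time intervention: replace c_hat[j] with the true concept value,
        # then let the change propagate through f as usual.
        for j, true_value in (interventions or {}).items():
            c_hat[j] = true_value
        return self.f_bias + sum(w * c for w, c in zip(self.f_weights, c_hat))


def group_intervention(true_concepts: Sequence[float], group: Sequence[int]) -> Dict[int, float]:
    """Hypothetical helper: intervene on a whole concept group at once."""
    return {j: true_concepts[j] for j in group}


if __name__ == "__main__":
    model = ToyConceptBottleneck(
        g_weights=[[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]],
        f_weights=[1.0, 2.0, -1.0],
        f_bias=0.0,
    )
    x = [1.0, -1.0]
    true_c = [1.0, 1.0, 0.0]  # the model wrongly predicts concept 1 as absent
    print(round(model.predict(x), 3))  # 1.0
    print(round(model.predict(x, {1: true_c[1]}), 3))  # 2.964
    print(round(model.predict(x, group_intervention(true_c, [1, 2])), 3))  # 2.982
```

In this linear toy, each corrected concept shifts the score by exactly its weight times the size of the correction. With a trained and possibly nonlinear f, the effect is less predictable, consistent with the finding that intervention gains depend on the training regime and model nonlinearity.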
This review was created by AI and reviewed by human editors.