QUICK REVIEW

[Paper Review] CASL: Concept-Aligned Sparse Latents for Interpreting Diffusion Models

Zhenghao He, Kèyù Zhü|arXiv (Cornell University)|Jan 21, 2026

Generative Adversarial Networks and Image Synthesis | Cited by 0

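One-Sentence Summary

CASL proposes a supervised framework that aligns the sparse latent dimensions of diffusion models with human semantic concepts, and validates these aligned latents with CASL-Steer and a new metric, the Editing Precision Ratio (EPR).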

ABSTRACT

Internal activations of diffusion models encode rich semantic information, but interpreting such representations remains challenging. While Sparse Autoencoders (SAEs) have shown promise in disentangling latent representations, existing SAE-based methods for diffusion model understanding rely on unsupervised approaches that fail to align sparse features with human-understandable concepts. This limits their ability to provide reliable semantic control over generated images. We introduce CASL (Concept-Aligned Sparse Latents), a supervised framework that aligns sparse latent dimensions of diffusion models with semantic concepts. CASL first trains an SAE on frozen U-Net activations to obtain disentangled latent representations, and then learns a lightweight linear mapping that associates each concept with a small set of relevant latent dimensions. To validate the semantic meaning of these aligned directions, we propose CASL-Steer, a controlled latent intervention that shifts activations along the learned concept axis. Unlike editing methods, CASL-Steer is used solely as a causal probe to reveal how concept-aligned latents influence generated content. We further introduce the Editing Precision Ratio (EPR), a metric that jointly measures concept specificity and the preservation of unrelated attributes. Experiments show that our method achieves superior editing precision and interpretability compared to existing approaches. To the best of our knowledge, this is the first work to achieve supervised alignment between latent representations and semantic concepts in diffusion models.
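The abstract's first stage, an SAE trained on frozen U-Net activations, can be sketched as follows. This is a minimal NumPy sketch that assumes a common ReLU SAE with an L1 sparsity penalty. The review does not specify the paper's actual SAE variant, dimensions, or coefficients, so `SparseAutoencoder`, `l1_coeff`, and `l0_per_sample` are hypothetical names. The training loop (gradient descent on `loss`) is omitted.

```python
import numpy as np


class SparseAutoencoder:
    """Hypothetical ReLU SAE over flattened U-Net activations (not the paper's exact model)."""

    def __init__(self, d_model: int, d_latent: int, l1_coeff: float = 1e-3, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Only the SAE has parameters; the U-Net stays frozen and just supplies h.
        self.W_enc = rng.normal(0.0, 1.0 / np.sqrt(d_model), size=(d_latent, d_model))
        self.b_enc = np.zeros(d_latent)
        self.W_dec = rng.normal(0.0, 1.0 / np.sqrt(d_latent), size=(d_model, d_latent))
        self.b_dec = np.zeros(d_model)
        self.l1_coeff = l1_coeff

    def encode(self, h: np.ndarray) -> np.ndarray:
        # h: (n, d_model) -> z: (n, d_latent). ReLU keeps codes non-negative,
        # and the L1 term in `loss` pushes most of them toward zero.
        return np.maximum(0.0, (h - self.b_dec) @ self.W_enc.T + self.b_enc)

    def decode(self, z: np.ndarray) -> np.ndarray:
        # z: (n, d_latent) -> reconstructed activations: (n, d_model)
        return z @ self.W_dec.T + self.b_dec

    def loss(self, h: np.ndarray) -> tuple[float, float, float]:
        # Returns (total, reconstruction MSE, mean L1 of codes).
        z = self.encode(h)
        h_hat = self.decode(z)
        recon = float(np.mean(np.sum((h - h_hat) ** 2, axis=-1)))
        sparsity = float(np.mean(np.sum(np.abs(z), axis=-1)))
        return recon + self.l1_coeff * sparsity, recon, sparsity


def l0_per_sample(z: np.ndarray) -> float:
    # Average number of active (non-zero) latent units per activation vector.
    return float(np.mean(np.count_nonzero(z, axis=-1)))
```

Tracking `recon` and `l0_per_sample` together is one simple way to inspect the sparsity/fidelity trade-off that the review's findings refer to. Raising `l1_coeff` usually lowers L0 at the cost of higher reconstruction error.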

Motivation and Goals

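Motivate research on interpreting diffusion model internals by linking sparse latent units to human concepts.
Develop CASL, which learns concept-aligned sparse latents through supervised alignment of SAE representations.
Provide CASL-Steer as a causal probe that verifies the semantic effect of the aligned directions.
Introduce EPR, which jointly quantifies editing strength on the target concept and preservation of unrelated attributes.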
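The review describes EPR only as jointly measuring change in the target concept and preservation of unrelated attributes. The sketch below assumes one plausible reading that is not taken from the paper: the absolute change in the target attribute score divided by the mean absolute change across non-target attribute scores. The function name `editing_precision_ratio` and the `eps` guard are hypothetical, and the attribute scores could come from any attribute classifier.

```python
from statistics import mean


def editing_precision_ratio(
    target_before: float,
    target_after: float,
    others_before: list[float],
    others_after: list[float],
    eps: float = 1e-8,
) -> float:
    """Hypothetical EPR: target-attribute change over mean non-target change.

    The exact formula and the source of the attribute scores are assumptions.
    """
    if len(others_before) != len(others_after) or not others_before:
        raise ValueError("need matching, non-empty non-target score lists")
    target_change = abs(target_after - target_before)
    # Average drift of attributes the edit was not supposed to touch.
    off_target_change = mean(abs(a - b) for b, a in zip(others_before, others_after))
    # eps avoids division by zero when non-target attributes do not move at all.
    return target_change / (off_target_change + eps)
```

Under this reading, an edit that moves the target score from 0.1 to 0.9 while shifting two unrelated scores by 0.1 each gets an EPR of about 8. Higher values indicate a more precise edit.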

Proposed Method

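Train a sparse autoencoder (SAE) on frozen U-Net activations to obtain a disentangled sparse latent space Z.
Freeze the SAE encoder and learn a lightweight linear mapping, Δh = W_Δ z + b_Δ, that associates each concept with a small set of latent dimensions.
Edit the activations by adding Δh to h (h' = h + Δh), and produce the edited images via DDIM inversion for evaluation.
Align the edits with the target concept using a DiffusionCLIP-based loss plus an L1 reconstruction term.
CASL-Steer constructs top-k concept-aligned latent directions and evaluates their semantic effects as a probing mechanism.
Propose the Editing Precision Ratio (EPR), which measures the strength of the change in the target attribute relative to changes in non-target attributes.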
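The steps above (concept mapping, activation edit, and training objective) can be sketched in NumPy. The sketch makes several assumptions. Activations are flattened vectors. CLIP image and text embeddings are passed in as precomputed arrays, and no CLIP model is loaded. The CLIP term uses the directional form associated with DiffusionCLIP: 1 minus the cosine between the image-embedding change and the text-embedding change. The L1 term is taken to compare the edited image with the source image. The weights `lambda_clip` and `lambda_l1` are hypothetical, and DDIM inversion and the U-Net forward pass are not shown.

```python
import numpy as np


def concept_delta(z: np.ndarray, W_delta: np.ndarray, b_delta: np.ndarray) -> np.ndarray:
    """Δh = W_Δ z + b_Δ for SAE codes z of shape (..., d_latent)."""
    # W_delta: (d_model, d_latent); result: (..., d_model)
    return z @ W_delta.T + b_delta


def edit_activation(h: np.ndarray, z: np.ndarray, W_delta: np.ndarray, b_delta: np.ndarray) -> np.ndarray:
    # Edited activation h' = h + Δh, which the frozen U-Net would continue from.
    return h + concept_delta(z, W_delta, b_delta)


def _cosine(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def directional_clip_loss(img_src, img_edit, txt_src, txt_tgt) -> float:
    # Small when the image embedding moves in the same direction as
    # the source-to-target text embedding.
    return 1.0 - _cosine(img_edit - img_src, txt_tgt - txt_src)


def alignment_loss(x_src, x_edit, img_src, img_edit, txt_src, txt_tgt,
                   lambda_clip: float = 1.0, lambda_l1: float = 0.1) -> float:
    # The L1 term keeps the edited image close to the source image in pixel space.
    l1 = float(np.mean(np.abs(x_edit - x_src)))
    return lambda_clip * directional_clip_loss(img_src, img_edit, txt_src, txt_tgt) + lambda_l1 * l1
```

In this sketch only `W_delta` and `b_delta` would receive gradients. This matches the description of a lightweight mapping on top of a frozen SAE and a frozen U-Net.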

Experimental Results

Research Questions

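RQ1: Can sparse latent directions learned in a supervised setting align with human-defined semantic concepts in diffusion models?
RQ2: Do concept-aligned latent directions causally influence the generated content in a localized, disentangled way?
RQ3: Does the CASL framework achieve more precise semantic editing with fewer side effects than existing methods?
RQ4: How effective is the proposed EPR metric at evaluating concept alignment and editing precision?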

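Key Findings

CASL obtains concept-aligned sparse latent directions that enable targeted semantic edits with minimal side effects.
CASL-Steer serves as a causal probe, showing that edits along the aligned directions consistently affect the target attribute across datasets (CelebA-HQ, AFHQ-Dog, LSUN-Church).
CASL achieves a higher Editing Precision Ratio (EPR) than baseline editing methods, indicating greater precision and interpretability.
SVM probes show that the top-k aligned latent units alone give high concept separability, with near-perfect accuracy at 16 units per concept.
The SAE representations achieve sparsity while preserving reconstruction quality. This supports an interpretable latent basis without an excessive loss of fidelity.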
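The top-k selection behind CASL-Steer and the SVM probe findings can be illustrated as below. The review does not state how the paper ranks latent units, so this sketch makes several assumptions. Units are ranked by the L2 norm of their column in W_Δ. The steering axis is the normalized sum of those columns. Steering adds α times that axis to an activation over a sweep of α values. All function names are hypothetical. The output of `probe_features` is what one would pass to a linear classifier such as an SVM.

```python
import numpy as np


def top_k_units(W_delta: np.ndarray, k: int) -> np.ndarray:
    # Assumed importance score: L2 norm of each latent unit's column in W_Δ.
    scores = np.linalg.norm(W_delta, axis=0)
    return np.argsort(scores)[::-1][:k]


def steer_direction(W_delta: np.ndarray, units: np.ndarray) -> np.ndarray:
    # Concept axis in activation space: normalized sum of the selected columns.
    v = W_delta[:, units].sum(axis=1)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v


def casl_steer(h: np.ndarray, direction: np.ndarray, alphas: list[float]) -> np.ndarray:
    # One shifted activation h + α·v per α, stacked along a new leading axis.
    return np.stack([h + a * direction for a in alphas])


def probe_features(z: np.ndarray, units: np.ndarray) -> np.ndarray:
    # Restrict SAE codes to the selected units, e.g. 16 per concept.
    return z[..., units]
```

A sweep over α, rather than one fixed edit, fits the probe framing. The question is whether the target attribute changes steadily with α while unrelated attributes stay roughly constant.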


This review was generated by AI and reviewed by human editors.