QUICK REVIEW

[Paper Review] Mass Concept Erasure in Diffusion Models with Concept Hierarchy

Jiahang Tu, Ye Li|arXiv (Cornell University)|Jan 6, 2026

Domain Adaptation and Few-Shot Learning | Citations: 0

One-Sentence Summary

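The paper introduces a parent-child concept hierarchy that enables group-wise mass concept erasure in diffusion models, and proposes SuPLoRA, which freezes the LoRA down-projection and updates only the up-projection, to improve the balance between erasure and generation quality.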

ABSTRACT

The success of diffusion models has raised concerns about the generation of unsafe or harmful content, prompting concept erasure approaches that fine-tune modules to suppress specific concepts while preserving general generative capabilities. However, as the number of erased concepts grows, these methods often become inefficient and ineffective, since each concept requires a separate set of fine-tuned parameters and may degrade the overall generation quality. In this work, we propose a supertype-subtype concept hierarchy that organizes erased concepts into a parent-child structure. Each erased concept is treated as a child node, and semantically related concepts (e.g., macaw, and bald eagle) are grouped under a shared parent node, referred to as a supertype concept (e.g., bird). Rather than erasing concepts individually, we introduce an effective and efficient group-wise suppression method, where semantically similar concepts are grouped and erased jointly by sharing a single set of learnable parameters. During the erasure phase, standard diffusion regularization is applied to preserve denoising process in unmasked regions. To mitigate the degradation of supertype generation caused by excessive erasure of semantically related subtypes, we propose a novel method called Supertype-Preserving Low-Rank Adaptation (SuPLoRA), which encodes the supertype concept information in the frozen down-projection matrix and updates only the up-projection matrix during erasure. Theoretical analysis demonstrates the effectiveness of SuPLoRA in mitigating generation performance degradation. We construct a more challenging benchmark that requires simultaneous erasure of concepts across diverse domains, including celebrities, objects, and pornographic content.

Motivation and Objectives

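Motivates the problem of erasing many undesired concepts from a diffusion model without sacrificing general generation.
Proposes a two-level concept hierarchy that groups semantically related erased concepts under a supertype concept.
Develops a group-wise erasure mechanism that jointly erases the concepts in each group through a shared set of parameters, while applying diffusion regularization to preserve denoising in unmasked regions.
Introduces SuPLoRA, a LoRA configuration that freezes the down-projection and updates only the up-projection in order to preserve supertype generation.
Provides theoretical analysis and empirical evidence of improved efficiency and generation preservation over existing mass erasure methods.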
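The two-level hierarchy and the shared-parameter idea above can be pictured with a small data-structure sketch. This is only an illustration: the function name `build_hierarchy`, the dictionary layout, and the "dog" group are hypothetical, and only the macaw / bald eagle / bird example comes from the abstract. It assumes one adapter per erased concept for concept-wise erasure and one shared adapter per supertype for group-wise erasure.

```python
from collections import defaultdict

# Erased (subtype) concept -> supertype. The "bird" entries follow the
# abstract's example; the "dog" entries are hypothetical additions.
SUBTYPE_TO_SUPERTYPE = {
    "macaw": "bird",
    "bald eagle": "bird",
    "golden retriever": "dog",
    "poodle": "dog",
}


def build_hierarchy(mapping):
    """Group erased concepts (children) under their supertype (parent)."""
    groups = defaultdict(list)
    for subtype, supertype in mapping.items():
        groups[supertype].append(subtype)
    return dict(groups)


hierarchy = build_hierarchy(SUBTYPE_TO_SUPERTYPE)

# Assumption: concept-wise erasure trains one adapter per erased concept,
# while group-wise erasure trains one shared adapter per supertype.
concept_wise_adapters = len(SUBTYPE_TO_SUPERTYPE)
group_wise_adapters = len(hierarchy)
```

Under these assumptions the number of trainable adapters grows with the number of supertypes rather than with the number of erased concepts.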

Proposed Method

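Builds a supertype-subtype concept hierarchy in which erased concepts are grouped as child nodes under a supertype parent concept.
Uses MACE-style attention-based suppression to erase the grouped concepts jointly through a single shared parameter set, and applies diffusion regularization to unmasked regions.
Introduces SuPLoRA (Supertype-Preserving Low-Rank Adaptation), which freezes the down-projection matrix B and trains only the up-projection matrix A for each supertype, so that updates are orthogonal to the supertype gradient subspace.
Initializes B to span the orthogonal complement of the supertype subspace obtained from the input embeddings, and fuses multiple SuPLoRA modules via knowledge distillation into the final weights W*, preserving general generation.
Provides a theoretical justification that SuPLoRA mitigates degradation of the supertype concept during mass erasure.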
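The orthogonality idea behind the frozen down-projection can be illustrated with a toy linear-algebra sketch. It is not the authors' implementation. It assumes the low-rank update has the form ΔW = A·B, with B the r × d down-projection and A the d_out × r up-projection, following the naming used in this review. The helpers `orthonormalize` and `complement_rows` and the embedding values are hypothetical. If the rows of B are orthogonal to the supertype embeddings, then ΔW maps any supertype input to (numerically) zero regardless of what A learns.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors, basis=(), tol=1e-9):
    """Modified Gram-Schmidt: return new orthonormal vectors extending `basis`."""
    added = []
    for v in vectors:
        w = list(v)
        for b in list(basis) + added:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip directions that are already covered
            added.append([wi / norm for wi in w])
    return added


def complement_rows(supertype_embeddings, dim):
    """Orthonormal rows spanning the orthogonal complement of the supertype subspace."""
    span = orthonormalize(supertype_embeddings)
    identity = [[float(i == j) for j in range(dim)] for i in range(dim)]
    return orthonormalize(identity, basis=span)


def matvec(m, x):
    return [dot(row, x) for row in m]


def matmul(a, b):
    cols = list(zip(*b))
    return [[dot(row, col) for col in cols] for row in a]


# Toy supertype embeddings (hypothetical values), input dimension d = 4.
supertype_embeddings = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]

# Frozen down-projection B (r x d): rows orthogonal to the supertype subspace.
B = complement_rows(supertype_embeddings, dim=4)

# Trainable up-projection A (d_out x r); random values stand in for training.
random.seed(0)
d_out = 3
A = [[random.gauss(0.0, 1.0) for _ in B] for _ in range(d_out)]

delta_w = matmul(A, B)  # low-rank weight update, d_out x d

# Supertype inputs are left (numerically) untouched by the update...
supertype_shift = [matvec(delta_w, x) for x in supertype_embeddings]
# ...while a direction outside the supertype subspace can still be changed.
subtype_shift = matvec(delta_w, [1.0, 0.0, -1.0, 0.0])
```

In this toy setting the supertype behavior is protected by construction, because B is fixed before training, and the erasure objective only shapes A.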

Experimental Results

Research Questions

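RQ1: Can the concept hierarchy efficiently enable group-wise erasure of semantically related concepts without increasing the parameter count?
RQ2: Does preserving supertype generation require constraining updates to be orthogonal to the supertype subspace, and does SuPLoRA achieve this in practice?
RQ3: How does group-wise erasure with SuPLoRA compare with concept-wise erasure and other mass erasure methods in terms of erasure efficacy, domain specificity, MS-COCO generation, and supertype generation preservation?
RQ4: How do the proposed hierarchy and SuPLoRA affect storage and training efficiency during mass erasure?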

Key Findings

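The concept hierarchy enables group-wise erasure, improving efficiency while limiting parameter growth relative to concept-wise methods.
By freezing the down-projection and updating only the up-projection, SuPLoRA effectively preserves supertype generation, and this is supported by theoretical analysis.
Empirical results show a favorable trade-off between erasing target concepts and preserving general and supertype generation across the object, celebrity, and pornographic-content domains.
Compared with baselines such as MACE, the method reduces storage and training time while achieving strong erasure and maintaining domain specificity and MS-COCO generation.
Knowledge distillation merges the multiple SuPLoRA modules into the final model, retaining both erasure efficacy and general generation ability.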

This review was generated by AI and checked by a human editor.