QUICK REVIEW

[Paper Review] Mass Concept Erasure in Diffusion Models with Concept Hierarchy

Jiahang Tu, Ye Li | arXiv (Cornell University) | 2026-01-06

Domain Adaptation and Few-Shot Learning | Citations: 0

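One-Line Summary

This paper introduces a parent-child concept hierarchy that enables group-wise mass concept erasure in diffusion models, and proposes SuPLoRA, a LoRA variant that freezes the down-projection and updates only the up-projection, to strike a better balance between erasure and generation quality.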

ABSTRACT

The success of diffusion models has raised concerns about the generation of unsafe or harmful content, prompting concept erasure approaches that fine-tune modules to suppress specific concepts while preserving general generative capabilities. However, as the number of erased concepts grows, these methods often become inefficient and ineffective, since each concept requires a separate set of fine-tuned parameters and may degrade the overall generation quality. In this work, we propose a supertype-subtype concept hierarchy that organizes erased concepts into a parent-child structure. Each erased concept is treated as a child node, and semantically related concepts (e.g., macaw, and bald eagle) are grouped under a shared parent node, referred to as a supertype concept (e.g., bird). Rather than erasing concepts individually, we introduce an effective and efficient group-wise suppression method, where semantically similar concepts are grouped and erased jointly by sharing a single set of learnable parameters. During the erasure phase, standard diffusion regularization is applied to preserve denoising process in unmasked regions. To mitigate the degradation of supertype generation caused by excessive erasure of semantically related subtypes, we propose a novel method called Supertype-Preserving Low-Rank Adaptation (SuPLoRA), which encodes the supertype concept information in the frozen down-projection matrix and updates only the up-projection matrix during erasure. Theoretical analysis demonstrates the effectiveness of SuPLoRA in mitigating generation performance degradation. We construct a more challenging benchmark that requires simultaneous erasure of concepts across diverse domains, including celebrities, objects, and pornographic content.

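Motivation and Objectives

Motivates the problem of erasing many undesirable concepts from a diffusion model without harming general generation quality.
Proposes a two-level concept hierarchy that groups semantically related erased concepts under a supertype concept.
Develops a group-wise erasure mechanism that suppresses the grouped concepts with shared parameters while preserving denoising through diffusion regularization.
Introduces SuPLoRA, which preserves supertype generation by updating only the up-projection in a LoRA setup whose down-projection is frozen.
Presents theoretical analysis and experimental evidence of improved efficiency and generation preservation over existing mass erasure methods.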

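Proposed Method

Builds a supertype-subtype concept hierarchy in which each concept to be erased is a child node grouped under a supertype parent concept.
Uses MACE-style attention-based suppression to erase the grouped concepts jointly with a single shared set of parameters, and applies diffusion regularization to the unmasked regions.
Constructs SuPLoRA (Supertype-Preserving Low-Rank Adaptation) by freezing the down-projection matrix B and training only the up-projection matrix A for each supertype, which ensures that updates are orthogonal to the supertype gradient subspace.
Initializes B to span the orthogonal complement of the supertype subspace derived from the input embeddings, and fuses the multiple SuPLoRA modules through knowledge distillation to obtain a final W* that preserves general generation.
Provides a theoretical justification that SuPLoRA mitigates the degradation of supertype concepts during mass erasure.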
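
The frozen-B construction described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the dimensions, the random stand-in embeddings, and the helper names `supertype_basis` and `init_frozen_down_projection` are assumptions made for illustration. The sketch follows this review's naming (B is the down-projection, A the up-projection), builds B with rows in the orthogonal complement of the supertype embedding subspace, and checks that the low-rank update A B then leaves supertype inputs unchanged whatever value A takes.

```python
import numpy as np


def supertype_basis(embeddings: np.ndarray, tol: float = 1e-8) -> np.ndarray:
    """Return an orthonormal basis (d_in x k) for the span of the embedding rows."""
    # The right singular vectors with non-negligible singular values span the row space.
    _, s, vt = np.linalg.svd(embeddings, full_matrices=False)
    k = int(np.sum(s > tol * s.max()))
    return vt[:k].T


def init_frozen_down_projection(basis: np.ndarray, rank: int,
                                rng: np.random.Generator) -> np.ndarray:
    """Return B (rank x d_in) whose rows lie in the orthogonal complement of `basis`."""
    d_in, k = basis.shape
    if rank > d_in - k:
        raise ValueError("rank exceeds the dimension of the orthogonal complement")
    g = rng.standard_normal((d_in, rank))
    g -= basis @ (basis.T @ g)   # remove every component inside the supertype subspace
    q, _ = np.linalg.qr(g)       # orthonormalise; the columns stay in the complement
    return q.T


rng = np.random.default_rng(0)
d_in, d_out, rank = 16, 12, 4    # assumed toy sizes

# Hypothetical stand-ins for text embeddings of one supertype (e.g. "bird").
supertype_emb = rng.standard_normal((3, d_in))

U = supertype_basis(supertype_emb)
B = init_frozen_down_projection(U, rank, rng)   # frozen during erasure
A = rng.standard_normal((d_out, rank))          # stands in for any trained up-projection

delta_w = A @ B                                 # low-rank update added to the layer weight

x_super = supertype_emb[0]                      # input inside the supertype subspace
x_other = rng.standard_normal(d_in)             # generic input

shift_super = float(np.linalg.norm(delta_w @ x_super))
shift_other = float(np.linalg.norm(delta_w @ x_other))
print(f"output shift on supertype input: {shift_super:.2e}")
print(f"output shift on other input:     {shift_other:.2e}")
```

Because B maps every vector in the supertype subspace to zero, no choice of A can change the layer's response to those inputs, which is one way to read the preservation property this review attributes to SuPLoRA.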

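Experimental Results

Research Questions

RQ1: Can the concept hierarchy efficiently enable group erasure of semantically related concepts without a proportional increase in the number of parameters?
RQ2: Does preserving supertype generation require constraining updates to be orthogonal to the supertype subspace, and can SuPLoRA achieve this?
RQ3: How does group-wise erasure (including SuPLoRA) compare with per-concept erasure and other mass erasure methods in erasure effectiveness and in preserving domain-specific, MS-COCO, and supertype generation?
RQ4: How do the proposed hierarchy and SuPLoRA affect storage and training efficiency during mass erasure?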

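Key Findings

The concept hierarchy enables group-wise erasure, reducing parameter growth and improving efficiency relative to per-concept methods.
SuPLoRA effectively preserves supertype generation by freezing the down-projection and updating only the up-projection, and this is backed by theoretical analysis.
Empirical results across the object, celebrity, and pornography domains show a favorable trade-off between erasing the target concepts and preserving general and supertype generation.
The method achieves strong erasure performance while maintaining domain-specific accuracy and MS-COCO generation, with lower storage and training time than baselines such as MACE.
Knowledge distillation fuses the multiple SuPLoRA modules into a final model that retains both erasure effectiveness and general generation ability.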
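
The parameter-growth claim can be made concrete with a little arithmetic. The sketch below is only an illustration under assumed sizes: the layer dimensions, rank, layer count, and the "dog" group are hypothetical, and only the "bird" group with "macaw" and "bald eagle" comes from the abstract. It counts LoRA adapter parameters when one adapter is kept per erased concept versus one per supertype group, and how many of the group-wise parameters remain trainable when the down-projection is frozen.

```python
# Hypothetical two-level hierarchy: supertype -> erased subtypes.
# "bird" with "macaw" and "bald eagle" is the abstract's example; "dog" is made up.
hierarchy = {
    "bird": ["macaw", "bald eagle"],
    "dog": ["beagle", "pug", "corgi"],
}

# Assumed sizes, chosen only for illustration.
D_IN, D_OUT, RANK, N_LAYERS = 768, 320, 4, 16


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA adapter: down-projection plus up-projection."""
    return rank * d_in + d_out * rank


n_concepts = sum(len(children) for children in hierarchy.values())
n_groups = len(hierarchy)
per_adapter = lora_params(D_IN, D_OUT, RANK)

# One adapter per erased concept in every adapted layer.
per_concept_total = n_concepts * N_LAYERS * per_adapter
# One shared adapter per supertype group in every adapted layer.
group_wise_total = n_groups * N_LAYERS * per_adapter
# With the down-projection frozen, only the up-projection is trained.
group_wise_trainable = n_groups * N_LAYERS * D_OUT * RANK

print("per-concept adapters :", per_concept_total)
print("group-wise adapters  :", group_wise_total)
print("group-wise trainable :", group_wise_trainable)
```

Under these assumptions the adapter count scales with the number of supertypes rather than the number of erased concepts, so adding more subtypes to an existing group adds no adapter parameters.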


This review was generated by AI and checked by a human editor.