QUICK REVIEW

[Paper Review] Cones: Concept Neurons in Diffusion Models for Customized Generation

Zhi‐Heng Liu, Ruili Feng|arXiv (Cornell University)|Mar 9, 2023

Neural Networks and Applications | Cited by 19

One-Line Summary

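Cones identifies small clusters of concept neurons in a diffusion model that control subject-driven generation. By shutting down these neurons and concatenating the clusters of different subjects, multiple subjects can be generated and combined in a single image with high robustness and storage efficiency.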

ABSTRACT

Human brains respond to semantic features of presented stimuli with different neurons. It is then curious whether modern deep neural networks admit a similar behavior pattern. Specifically, this paper finds a small cluster of neurons in a diffusion model corresponding to a particular subject. We call those neurons the concept neurons. They can be identified by statistics of network gradients to a stimulation connected with the given subject. The concept neurons demonstrate magnetic properties in interpreting and manipulating generation results. Shutting them can directly yield the related subject contextualized in different scenes. Concatenating multiple clusters of concept neurons can vividly generate all related concepts in a single image. A few steps of further fine-tuning can enhance the multi-concept capability, which may be the first to manage to generate up to four different subjects in a single image. For large-scale applications, the concept neurons are environmentally friendly as we only need to store a sparse cluster of int index instead of dense float32 values of the parameters, which reduces storage consumption by 90% compared with previous subject-driven generation methods. Extensive qualitative and quantitative studies on diverse scenarios show the superiority of our method in interpreting and manipulating diffusion models.

Motivation and Objectives

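Investigate whether diffusion models contain subject-specific concept neurons, analogous to concept neurons in the human brain.
Propose a gradient-based method for identifying the concept neurons that control a particular subject.
Show that shutting down the concept neurons yields the target subject, even in different contexts.
Show that concatenating concept neurons enables multi-subject generation, with up to four subjects in a single image.
Demonstrate substantial storage savings (roughly 90% less storage) compared with previous subject-driven generation methods.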

Proposed Method

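Define the objective as identifying a small set of neurons in the K-V attention layers whose scaling controls the target subject.
Derive a gradient-based criterion for a neuron to be a concept neuron, using the concept-implanting loss L_con and its gradient.
Propose a self-adaptive sampling procedure that identifies concept neurons by analyzing the sign and magnitude of theta * (dL_con/dtheta).
Compute a binary mask M that marks the concept neurons, and use it to shut (zero out) the marked parameters.
Show that binary, float16, quaternary, and float32 settings achieve similar control performance, demonstrating the robustness of the concept neurons.
Show additivity: concatenating the concept neurons of multiple subjects generates the combined concepts in a single image.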
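
The gradient-based criterion in the method list can be read as a first-order Taylor expansion of the concept-implanting loss. This sketch assumes that shutting neuron i means setting its parameter θ_i to zero (e_i is the i-th unit vector) and introduces a hypothetical threshold τ; the paper's self-adaptive sampling selects neurons in its own way, so only the sign-and-magnitude logic is illustrated here.

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{con}}(\theta - \theta_i e_i)
  &\approx \mathcal{L}_{\mathrm{con}}(\theta) - \theta_i \frac{\partial \mathcal{L}_{\mathrm{con}}}{\partial \theta_i} \\
\Delta \mathcal{L}_i &\approx -\,\theta_i \frac{\partial \mathcal{L}_{\mathrm{con}}}{\partial \theta_i} \\
M_i &= \mathbb{1}\!\left[\theta_i \frac{\partial \mathcal{L}_{\mathrm{con}}}{\partial \theta_i} > \tau\right],
  \qquad \theta' = (1 - M) \odot \theta
\end{aligned}
```

Under this reading, a large positive θ_i ∂L_con/∂θ_i predicts that shutting neuron i lowers L_con, which is why both the sign and the magnitude of the product matter.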

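A minimal NumPy sketch of identifying, shutting, and concatenating concept neurons, assuming that "shutting" means zeroing a parameter. The function names (`concept_mask`, `shut`, `concatenate`, `to_indices`) and the fixed `top_frac` cutoff are hypothetical: the cutoff stands in for the paper's self-adaptive sampling, and `theta`/`grad` are random placeholders rather than real K-V attention weights and gradients of L_con.

```python
import numpy as np


def concept_scores(theta: np.ndarray, grad: np.ndarray) -> np.ndarray:
    """First-order score theta * dL_con/dtheta for every parameter."""
    return theta * grad


def concept_mask(theta: np.ndarray, grad: np.ndarray, top_frac: float = 0.01) -> np.ndarray:
    """Binary mask M marking parameters whose shutting should lower L_con the most.

    Hypothetical stand-in for self-adaptive sampling: take the top `top_frac`
    fraction of scores and keep only the positive ones.
    """
    scores = concept_scores(theta, grad).ravel()
    k = max(1, int(round(top_frac * scores.size)))
    top = np.argpartition(scores, -k)[-k:]   # indices of the k largest scores
    top = top[scores[top] > 0]               # a positive score predicts a lower loss
    mask = np.zeros(scores.size, dtype=bool)
    mask[top] = True
    return mask.reshape(theta.shape)


def shut(theta: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out the masked parameters: theta' = (1 - M) * theta."""
    return np.where(mask, 0.0, theta)


def concatenate(*masks: np.ndarray) -> np.ndarray:
    """Combine several subjects' concept neurons as the union of their masks."""
    return np.logical_or.reduce(masks)


def to_indices(mask: np.ndarray) -> np.ndarray:
    """Sparse storage: flat int32 positions of the concept neurons."""
    return np.flatnonzero(mask).astype(np.int32)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = rng.normal(size=(8, 8))    # placeholder weights
    grad_a = rng.normal(size=(8, 8))   # placeholder gradient for subject A
    grad_b = rng.normal(size=(8, 8))   # placeholder gradient for subject B
    m = concatenate(concept_mask(theta, grad_a, 0.1), concept_mask(theta, grad_b, 0.1))
    print(to_indices(m), shut(theta, m)[m])
```

Taking the union in `concatenate` mirrors the additivity claim: every subject's concept neurons are shut together. `to_indices` shows why only integer positions need to be kept once the shut values are known to be zero.
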
Experimental Results

Research Questions

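RQ1: Do diffusion models contain subject-specific concept neurons analogous to concept neurons in the human brain?
RQ2: Is the gradient-based criterion for identifying the concept neurons that control a particular subject reliable?
RQ3: Can shutting down concept neurons control generation while preserving the model's existing knowledge?
RQ4: Can the concept neurons of multiple subjects be concatenated to achieve multi-subject generation in a single image?
RQ5: What storage and robustness advantages do concept neurons offer for large-scale customized generation?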

Key Findings

Method	Text alignment	Image alignment
Single subject (V1*)
Textual Inversion	0.312	0.744
DreamBooth	0.344	0.731
Custom Diffusion	0.352	0.722
Cones (Ours)	0.361	0.725
Two subjects (V1*, V2*)
Textual Inversion	0.264	0.630
DreamBooth	0.283	0.673
Custom Diffusion	0.314	0.685
Cones (Ours)	0.337	0.698
Three subjects (V1*, V2*, V3*)
Textual Inversion	0.223	0.584
DreamBooth	0.263	0.631
Custom Diffusion	0.289	0.669
Cones (Ours)	0.301	0.685
Four subjects (V1*, V2*, V3*, V4*)
Textual Inversion	0.219	0.553
DreamBooth	0.238	0.597
Custom Diffusion	0.269	0.632
Cones (Ours)	0.285	0.653
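
To make the comparison in the table concrete, this sketch computes Cones' margin over the strongest baseline for each metric and subject count. The scores are transcribed from the table; the dictionary layout and the function name `margin_over_best_baseline` are illustrative choices, not part of the paper.

```python
# Alignment scores (text, image) transcribed from the table above.
TABLE = {
    1: {"Textual Inversion": (0.312, 0.744), "DreamBooth": (0.344, 0.731),
        "Custom Diffusion": (0.352, 0.722), "Cones": (0.361, 0.725)},
    2: {"Textual Inversion": (0.264, 0.630), "DreamBooth": (0.283, 0.673),
        "Custom Diffusion": (0.314, 0.685), "Cones": (0.337, 0.698)},
    3: {"Textual Inversion": (0.223, 0.584), "DreamBooth": (0.263, 0.631),
        "Custom Diffusion": (0.289, 0.669), "Cones": (0.301, 0.685)},
    4: {"Textual Inversion": (0.219, 0.553), "DreamBooth": (0.238, 0.597),
        "Custom Diffusion": (0.269, 0.632), "Cones": (0.285, 0.653)},
}


def margin_over_best_baseline(row: dict) -> tuple:
    """Cones minus the strongest baseline, per metric (positive = Cones ahead)."""
    baselines = [v for name, v in row.items() if name != "Cones"]
    return tuple(
        round(row["Cones"][i] - max(b[i] for b in baselines), 3) for i in range(2)
    )


for n_subjects, row in TABLE.items():
    text_m, image_m = margin_over_best_baseline(row)
    print(f"{n_subjects} subject(s): text {text_m:+.3f}, image {image_m:+.3f}")
```

Run as-is, it reports text-alignment margins of +0.009, +0.023, +0.012, and +0.016 for one to four subjects, and image-alignment margins of -0.019, +0.013, +0.016, and +0.021. Cones trails only on single-subject image alignment, and its image-alignment lead widens as subjects are added.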

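Concept neurons exist as small, sparse clusters within the K-V attention layers and govern the generation of a particular subject.
Shutting down the identified concept neurons makes the attention maps trace the outline of the target subject, enabling the subject to be generated in various contexts.
Binary (shut) concept neurons perform on par with higher-precision variants (float32/float16) and with the quaternary representation, demonstrating robustness.
Concatenating concept neurons from multiple subjects enables multi-subject generation, and a few steps of joint fine-tuning improve the quality of four-subject results.
Storage costs drop sharply: concept neurons need only about 10% of the memory of previous methods, because their sparsity lets them be stored as integer indices.
The method achieves high text alignment and competitive image alignment, and it outperforms competing methods in multi-subject scenarios, particularly as the number of subjects increases.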
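
A back-of-the-envelope sketch of the storage claim, assuming a previous method stores its customized parameters as dense float32 values while Cones stores one int32 index per concept neuron. The parameter counts in the example are hypothetical, not figures from the paper; under these assumptions the saving is simply 1 minus the ratio of concept neurons to stored parameters.

```python
FLOAT32_BYTES = 4
INT32_BYTES = 4


def dense_bytes(n_params: int) -> int:
    """Bytes to store a dense float32 copy of the customized parameters."""
    return n_params * FLOAT32_BYTES


def index_bytes(n_concept: int) -> int:
    """Bytes to store only the int32 positions of the shut concept neurons.

    No values are needed because every shut parameter is zero.
    """
    return n_concept * INT32_BYTES


def storage_saving(n_params: int, n_concept: int) -> float:
    """Fraction of storage saved by indices relative to dense float32 values."""
    return 1 - index_bytes(n_concept) / dense_bytes(n_params)


if __name__ == "__main__":
    # Hypothetical sizes chosen for illustration only.
    n_params, n_concept = 1_000_000, 50_000
    print(dense_bytes(n_params), index_bytes(n_concept), storage_saving(n_params, n_concept))
```

The design relies on the shut parameters all being zero: once the values are fixed, their positions are the only information left to store.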

This review was generated by AI and checked by a human editor.