QUICK REVIEW

[Paper Review] Cones: Concept Neurons in Diffusion Models for Customized Generation

Zhi‐Heng Liu, Ruili Feng|arXiv (Cornell University)|Mar 9, 2023

Neural Networks and Applications|Cited by 19

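One-Sentence Summary

Cones identifies small clusters of concept neurons in diffusion models that control subject-driven generation; by activating or shutting off these neurons, multiple subjects can be generated and composed in a single image, with high robustness and storage efficiency.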

ABSTRACT

Human brains respond to semantic features of presented stimuli with different neurons. It is then curious whether modern deep neural networks admit a similar behavior pattern. Specifically, this paper finds a small cluster of neurons in a diffusion model corresponding to a particular subject. We call those neurons the concept neurons. They can be identified by statistics of network gradients to a stimulation connected with the given subject. The concept neurons demonstrate magnetic properties in interpreting and manipulating generation results. Shutting them can directly yield the related subject contextualized in different scenes. Concatenating multiple clusters of concept neurons can vividly generate all related concepts in a single image. A few steps of further fine-tuning can enhance the multi-concept capability, which may be the first to manage to generate up to four different subjects in a single image. For large-scale applications, the concept neurons are environmentally friendly as we only need to store a sparse cluster of int index instead of dense float32 values of the parameters, which reduces storage consumption by 90% compared with previous subject-driven generation methods. Extensive qualitative and quantitative studies on diverse scenarios show the superiority of our method in interpreting and manipulating diffusion models.

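Research Motivation and Objectives

Investigate whether diffusion models contain subject-specific concept neurons analogous to the concept neurons of the human brain.
Propose a gradient-based method to locate the concept neurons that control a given subject.
Demonstrate that shutting off concept neurons produces the target subject across diverse scenes.
Show that concatenating concept neurons enables multi-subject generation, supporting up to four subjects in a single image.
Demonstrate significant storage savings compared with previous subject-driven generation methods.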

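Proposed Method

Frame the goal as identifying a small set of neurons in the K-V attention layers whose amplification or scaling controls the target subject.
Use the concept-implanting loss L_con and its gradient to derive a gradient-based criterion for when a neuron counts as a concept neuron.
Propose an adaptive sampling procedure that identifies concept neurons by analyzing the sign and magnitude of theta * (dL_con/dtheta).
Compute a binary concept-neuron mask M that indicates which neurons are concept neurons, and use it to deactivate non-critical parameters.
Show that the binary, float16, quaternary, and float32 settings achieve similar control performance, indicating that concept neurons are robust.
Generate composite concepts in a single image by concatenating the concept neurons of multiple subjects, demonstrating additivity.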
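The criterion and mask described above can be illustrated with a small sketch. The reasoning behind the product theta * (dL_con/dtheta) is a first-order one: scaling a parameter to (1 - eps) * theta changes the loss by roughly -eps * theta * (dL_con/dtheta), so a positive product suggests that shrinking the parameter lowers the loss. Everything below is an assumption made for illustration: the quadratic `toy_concept_loss` is a hypothetical stand-in for L_con, the fixed `threshold` replaces the adaptive sampling procedure, and the convention that 1 in the mask marks a concept neuron is chosen here, not taken from the paper.

```python
def toy_concept_loss(theta, target):
    """Hypothetical stand-in for L_con: squared distance to a target vector."""
    return sum((t - g) ** 2 for t, g in zip(theta, target))


def toy_concept_grad(theta, target):
    """Analytic gradient of the toy loss with respect to theta."""
    return [2.0 * (t - g) for t, g in zip(theta, target)]


def concept_scores(theta, grad):
    """Elementwise theta * dL/dtheta; a positive value means that shrinking
    the parameter lowers the loss to first order."""
    return [t * g for t, g in zip(theta, grad)]


def concept_mask(scores, threshold=0.0):
    """Binary mask (assumed convention): 1 marks a concept neuron."""
    return [1 if s > threshold else 0 for s in scores]


def shut_off(theta, mask):
    """Zero out the parameters marked as concept neurons."""
    return [t * (1 - m) for t, m in zip(theta, mask)]


def concat_masks(*masks):
    """Combine clusters from several subjects: a parameter is marked
    if any subject marks it (set union of the clusters)."""
    return [max(ms) for ms in zip(*masks)]


# Toy usage with made-up numbers.
theta = [0.5, -0.2, 1.0, -0.8]
target = [0.0, -0.2, 1.5, -0.1]
mask_a = concept_mask(concept_scores(theta, toy_concept_grad(theta, target)))
mask_b = [0, 0, 1, 0]  # hypothetical cluster for a second subject
print("mask for subject A:", mask_a)
print("loss before / after shut-off:",
      toy_concept_loss(theta, target),
      toy_concept_loss(shut_off(theta, mask_a), target))
print("concatenated mask:", concat_masks(mask_a, mask_b))
```

In a real diffusion model the gradient would come from automatic differentiation over the K-V attention weights rather than from a closed-form expression, and the first-order argument does not guarantee that a full shut-off always lowers the loss; it happens to in this toy case.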

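Experimental Results

Research Questions

RQ1: Do diffusion models encode subject-specific concept neurons analogous to biological concept neurons?
RQ2: Can a gradient-based criterion reliably identify a small set of concept neurons that governs a given subject?
RQ3: Is it feasible to control generation by shutting off concept neurons, and does doing so preserve prior information?
RQ4: Can concatenating the concept neurons of multiple subjects generate multiple subjects in a single image?
RQ5: What storage and robustness benefits do concept neurons bring to large-scale customized generation?

Key Findings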

Method	Text-alignment	Image-alignment
Single Subject (V1*)
Textual Inversion	0.312	0.744
DreamBooth	0.344	0.731
Custom Diffusion	0.352	0.722
Cones (Ours)	0.361	0.725
Two Subjects (V1, V2)
Textual Inversion	0.264	0.630
DreamBooth	0.283	0.673
Custom Diffusion	0.314	0.685
Cones (Ours)	0.337	0.698
Three Subjects (V1, V2, V3*)
Textual Inversion	0.223	0.584
DreamBooth	0.263	0.631
Custom Diffusion	0.289	0.669
Cones (Ours)	0.301	0.685
Four Subjects (V1, V2, V3, V4)
Textual Inversion	0.219	0.553
DreamBooth	0.238	0.597
Custom Diffusion	0.269	0.632
Cones (Ours)	0.285	0.653
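
To make the comparison in the table easier to read, the short sketch below computes the gap between Cones and the best baseline in each setting. The scores are copied from the table; the dictionary layout and the helper name `margin_over_best_baseline` are illustrative choices, not part of the paper.

```python
# (text-alignment, image-alignment) per method, copied from the table.
RESULTS = {
    "single": {"Textual Inversion": (0.312, 0.744), "DreamBooth": (0.344, 0.731),
               "Custom Diffusion": (0.352, 0.722), "Cones (Ours)": (0.361, 0.725)},
    "two": {"Textual Inversion": (0.264, 0.630), "DreamBooth": (0.283, 0.673),
            "Custom Diffusion": (0.314, 0.685), "Cones (Ours)": (0.337, 0.698)},
    "three": {"Textual Inversion": (0.223, 0.584), "DreamBooth": (0.263, 0.631),
              "Custom Diffusion": (0.289, 0.669), "Cones (Ours)": (0.301, 0.685)},
    "four": {"Textual Inversion": (0.219, 0.553), "DreamBooth": (0.238, 0.597),
             "Custom Diffusion": (0.269, 0.632), "Cones (Ours)": (0.285, 0.653)},
}


def margin_over_best_baseline(rows, ours="Cones (Ours)"):
    """Return (text margin, image margin) of `ours` over the best other method."""
    ours_text, ours_image = rows[ours]
    best_text = max(v[0] for k, v in rows.items() if k != ours)
    best_image = max(v[1] for k, v in rows.items() if k != ours)
    return round(ours_text - best_text, 3), round(ours_image - best_image, 3)


for setting, rows in RESULTS.items():
    print(setting, margin_over_best_baseline(rows))
```

A positive margin means Cones scores higher than every baseline on that metric; a negative image-alignment margin in the single-subject setting reflects that Textual Inversion and DreamBooth score higher there.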

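Concept neurons exist as small, sparse clusters in the K-V attention layers and govern the generation of a given subject.
Shutting off the identified concept neurons outlines the target subject in the attention maps and enables subject generation in different contexts.
Binary (shut-off) concept neurons perform comparably to the higher-precision variants (float32/float16) and to the quaternary representation, indicating robustness.
Concatenating the concept neurons of multiple subjects enables multi-subject generation, and joint fine-tuning improves the quality of four-subject results.
Storage cost drops significantly: concept neurons need roughly 10% of the storage of previous methods, and their sparsity makes integer-index storage possible.
The method performs well on text alignment and is competitive on image alignment; it outperforms competing methods in multi-subject settings, especially as the number of subjects grows.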
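
The storage argument can be sketched as follows: instead of saving a dense float32 copy of the parameters, only the integer positions of the concept neurons are saved, and the binary mask is rebuilt from them on load. The parameter count, the 5% density, and the helper names are assumptions chosen for the example; the sketch does not reproduce the paper's reported figure, which compares against previous subject-driven methods.

```python
import array
import random

BYTES_PER_FLOAT32 = 4


def mask_to_indices(mask):
    """Keep only the positions of the concept neurons, as unsigned integers."""
    return array.array("I", (i for i, m in enumerate(mask) if m))


def indices_to_mask(indices, num_params):
    """Rebuild the binary mask from the stored positions."""
    mask = [0] * num_params
    for i in indices:
        mask[i] = 1
    return mask


def storage_ratio(indices, num_params):
    """Bytes for the sparse index list divided by bytes for dense float32 values."""
    sparse_bytes = len(indices) * indices.itemsize
    dense_bytes = num_params * BYTES_PER_FLOAT32
    return sparse_bytes / dense_bytes


# Toy usage: 10,000 parameters, 500 of them (5%) marked as concept neurons.
rng = random.Random(0)
num_params = 10_000
concept_positions = set(rng.sample(range(num_params), 500))
mask = [1 if i in concept_positions else 0 for i in range(num_params)]
indices = mask_to_indices(mask)
print("stored indices:", len(indices))
print("sparse / dense storage ratio:", storage_ratio(indices, num_params))
```

The ratio scales with the fraction of parameters marked as concept neurons, so the saving depends directly on how sparse the identified clusters are.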

This review was generated by AI and reviewed by a human editor.