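[Paper Digest] Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
Mix-of-Show introduces ED-LoRA for single-client concept tuning and gradient fusion at the center node, enabling decentralized multi-concept customization of diffusion models, plus regionally controllable sampling for multi-concept generation.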
Public large-scale text-to-image diffusion models, such as Stable Diffusion, have gained significant attention from the community. These models can be easily customized for new concepts using low-rank adaptations (LoRAs). However, the utilization of multiple concept LoRAs to jointly support multiple customized concepts presents a challenge. We refer to this scenario as decentralized multi-concept customization, which involves single-client concept tuning and center-node concept fusion. In this paper, we propose a new framework called Mix-of-Show that addresses the challenges of decentralized multi-concept customization, including concept conflicts resulting from existing single-client LoRA tuning and identity loss during model fusion. Mix-of-Show adopts an embedding-decomposed LoRA (ED-LoRA) for single-client tuning and gradient fusion for the center node to preserve the in-domain essence of single concepts and support theoretically limitless concept fusion. Additionally, we introduce regionally controllable sampling, which extends spatially controllable sampling (e.g., ControlNet and T2I-Adaptor) to address attribute binding and missing object problems in multi-concept sampling. Extensive experiments demonstrate that Mix-of-Show is capable of composing multiple customized concepts with high fidelity, including characters, objects, and scenes.
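The abstract's premise that a concept can be added cheaply with a LoRA rests on the standard low-rank update W = W0 + (alpha / r) * B A, where only A and B are trained. This is the generic LoRA formulation, not something specific to Mix-of-Show. The minimal pure-Python sketch below assumes small list-of-lists matrices; `lora_merge` and `lora_param_count` are hypothetical helper names chosen for illustration.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_merge(w0, a, b, alpha):
    """Hypothetical helper: return W0 + (alpha / r) * B @ A.

    w0: frozen base weight, d_out x d_in
    a:  LoRA down-projection, r x d_in
    b:  LoRA up-projection, d_out x r
    """
    r = len(a)
    delta = matmul(b, a)  # low-rank update, d_out x d_in
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, delta)]


def lora_param_count(d_out, d_in, r):
    """Trainable parameters of a rank-r LoRA vs. a full update of the same layer."""
    return r * (d_in + d_out), d_out * d_in


# B starts at zero in standard LoRA, so the merged layer initially equals W0.
w0 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
a = [[0.1, 0.2, 0.3]]  # rank r = 1
b_init = [[0.0], [0.0]]
print(lora_merge(w0, a, b_init, alpha=1.0))  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(lora_param_count(768, 768, 4))         # (6144, 589824)
```

The parameter count shows why per-concept LoRAs are cheap to train and to share: a rank-4 adapter on a 768x768 layer trains about 1% of the parameters of a full update.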
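Motivation and Goals
- Enable decentralized multi-concept customization, so that multiple user-specific concepts can be combined without sharing data.
- Identify concept conflicts and identity loss as the key challenges in existing LoRA fusion.
- Propose ED-LoRA, which captures richer in-domain embeddings, and gradient fusion, which preserves concept identity during fusion.
- Introduce regionally controllable sampling to address attribute binding and missing objects in multi-concept generation.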
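Proposed Method
- Propose embedding-decomposed LoRA (ED-LoRA) for single-client concept tuning, preserving the in-domain essence of a concept through layer-wise and multi-word embeddings.
- Use gradient fusion at the center node to merge the LoRAs of multiple concepts, fusing gradients so that the merged model matches each single-concept model's inference behavior.
- Adopt regionally controllable sampling with region-aware cross-attention to support multi-concept generation with correct attribute binding.
- Analyze the embeddings and LoRA weights to disentangle concept identity and reduce conflicts during fusion.
- Compare against LoRA, Custom Diffusion, and P+ in single-concept and multi-concept settings.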
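Region-aware cross-attention, as summarized above, can be sketched as follows: every spatial position first attends to the global prompt, and positions inside a region mask then have that result replaced by attention to the region's own prompt. This is a simplified single-head sketch under stated assumptions: queries, keys, and values are given directly (no learned projections), masks are binary, and the regional result overwrites the global one. `attention` and `region_aware_cross_attention` are hypothetical names, not the authors' code.

```python
import math


def attention(queries, keys, values):
    """Single-head scaled dot-product attention on lists of vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, values)) / z
                    for c in range(len(values[0]))])
    return out


def region_aware_cross_attention(queries, global_kv, regions):
    """Hypothetical sketch of region-aware cross-attention.

    queries:   one query vector per spatial position (flattened latent).
    global_kv: (keys, values) computed from the global prompt.
    regions:   list of (mask, keys, values); mask[i] is True when position i
               belongs to that region, keys/values come from its regional prompt.
    Positions outside every mask keep the global result; inside a mask the
    regional result replaces it (a later region wins where masks overlap).
    """
    out = attention(queries, *global_kv)
    for mask, keys, values in regions:
        idx = [i for i, inside in enumerate(mask) if inside]
        if not idx:
            continue
        regional = attention([queries[i] for i in idx], keys, values)
        for i, row in zip(idx, regional):
            out[i] = row
    return out


queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
global_kv = ([[1.0, 0.0]], [[1.0, 1.0]])  # one global text token
region = ([True, False, False, True], [[0.0, 1.0]], [[5.0, 5.0]])
print(region_aware_cross_attention(queries, global_kv, [region]))
# [[5.0, 5.0], [1.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
```

Confining each regional prompt to its mask is what keeps one concept's attributes from leaking onto another, which is the attribute-binding problem the method targets.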
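Experimental Results
Research Questions
- RQ1: How can decentralized multi-concept customization be achieved without concept conflicts or identity loss?
- RQ2: Can embedding-focused tuning combined with gradient fusion at the center node theoretically support unlimited concept fusion?
- RQ3: Does regionally controllable sampling improve attribute binding and object presence in multi-concept generation?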
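Key Findings
- ED-LoRA keeps more of a concept's in-domain essence in the embedding, reducing concept conflicts.
- Compared with weight fusion, gradient fusion significantly reduces identity loss in multi-concept fusion.
- Regionally controllable sampling improves attribute binding and object presence in multi-concept generation.
- When concepts are fused at the center node, Mix-of-Show preserves each concept's identity better than the baselines.
- In multi-concept settings, Mix-of-Show achieves better image alignment than the other methods while remaining competitive on text alignment.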
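The contrast between weight fusion and gradient fusion can be illustrated per layer. One reading of gradient fusion is a least-squares objective: find a single W whose outputs on each concept's recorded layer inputs X_i match that concept's own weights W_i = W0 + dW_i, i.e. minimize the sum over i of ||W_i X_i - W X_i||_F^2, while naive weight fusion simply averages the W_i (one common variant). The pure-Python sketch below assumes that objective, starts from the weight average, and runs plain gradient descent. It uses random toy matrices, not the paper's models, and does not reproduce the paper's optimizer or how X_i are collected; all helper names are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def fusion_loss(w, concepts):
    """Sum over concepts of ||W_i X_i - W X_i||_F^2."""
    total = 0.0
    for w_i, x_i in concepts:
        for t_row, p_row in zip(matmul(w_i, x_i), matmul(w, x_i)):
            total += sum((t - p) ** 2 for t, p in zip(t_row, p_row))
    return total


def weight_average(concepts):
    """Naive weight fusion: element-wise mean of the per-concept weights W_i."""
    n = len(concepts)
    rows, cols = len(concepts[0][0]), len(concepts[0][0][0])
    return [[sum(w_i[r][c] for w_i, _ in concepts) / n for c in range(cols)]
            for r in range(rows)]


def gradient_fuse(concepts, lr=0.01, steps=500):
    """Hypothetical sketch: minimise fusion_loss by gradient descent.

    concepts: list of (W_i, X_i), W_i = W0 + dW_i (d_out x d_in),
              X_i = layer inputs recorded for concept i (d_in x n_i).
    """
    w = weight_average(concepts)  # start from naive weight fusion
    for _ in range(steps):
        grad = [[0.0] * len(w[0]) for _ in w]
        for w_i, x_i in concepts:
            residual = [[t - p for t, p in zip(t_row, p_row)]
                        for t_row, p_row in zip(matmul(w_i, x_i), matmul(w, x_i))]
            g = matmul(residual, transpose(x_i))  # d_out x d_in
            # d/dW ||W_i X_i - W X_i||^2 = -2 (W_i X_i - W X_i) X_i^T
            for r, g_row in enumerate(g):
                for c, value in enumerate(g_row):
                    grad[r][c] -= 2.0 * value
        w = [[wv - lr * gv for wv, gv in zip(w_row, g_row)]
             for w_row, g_row in zip(w, grad)]
    return w


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Two toy "concepts": 3x3 layer weights, each with 8 recorded input columns.
concepts = [(rand_matrix(3, 3), rand_matrix(3, 8)) for _ in range(2)]
averaged = weight_average(concepts)
fused = gradient_fuse(concepts)
print(fusion_loss(averaged, concepts), fusion_loss(fused, concepts))
```

Because the inputs are bounded in [-1, 1], lr=0.01 is small enough that no step can increase this quadratic objective, so the fused weights never match the concepts' layer outputs worse than the plain average does. With a single concept the average already equals that concept's weights, so the loss is exactly zero.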
This digest was generated by AI and reviewed by human editors.