QUICK REVIEW

[Paper Digest] Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation

Shin-Ying Yeh, Yu-Guan Hsieh|arXiv (Cornell University)|Sep 26, 2023

Generative Adversarial Networks and Image Synthesis | Cited by 10

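One-Sentence Summary

This paper introduces LyCORIS, an open-source library for fine-tuning Stable Diffusion that offers several rank-adaptation methods, together with a comprehensive evaluation framework for comparing those methods across prompts, concepts, and hyperparameters.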

ABSTRACT

Text-to-image generative models have garnered immense attention for their ability to produce high-fidelity images from text prompts. Among these, Stable Diffusion distinguishes itself as a leading open-source model in this fast-growing field. However, the intricacies of fine-tuning these models pose multiple challenges from new methodology integration to systematic evaluation. Addressing these issues, this paper introduces LyCORIS (Lora beYond Conventional methods, Other Rank adaptation Implementations for Stable diffusion) [https://github.com/KohakuBlueleaf/LyCORIS], an open-source library that offers a wide selection of fine-tuning methodologies for Stable Diffusion. Furthermore, we present a thorough framework for the systematic assessment of varied fine-tuning techniques. This framework employs a diverse suite of metrics and delves into multiple facets of fine-tuning, including hyperparameter adjustments and the evaluation with different prompt types across various concept categories. Through this comprehensive approach, our work provides essential insights into the nuanced effects of fine-tuning parameters, bridging the gap between state-of-the-art research and practical application.

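Research Motivation and Goals

Introduce LyCORIS, an open-source library that provides a range of fine-tuning methods for Stable Diffusion.
Propose a comprehensive evaluation framework that uses diverse prompts and metrics to assess fine-tuning performance.
Investigate how hyperparameters and training choices affect concept fidelity, text-image alignment, and preservation of the base model.
Offer practical guidance on when to use each LyCORIS method for different customization tasks.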

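Proposed Method

Introduces LyCORIS, a Stable Diffusion fine-tuning library that implements LoRA-based methods and alternative rank-adaptation methods.
Describes the core algorithms, LoRA (LoCon), LoHa, and LoKr, including their mathematical formulations (e.g., h′ = W_0 h + b + γ ΔW h, where ΔW is factorized as BA, plus variants built on the Hadamard and Kronecker products).
Defines a concept descriptor, consisting of a trigger word and an optional class word, that is used in captions and prompts.
Outlines a modular evaluation framework covering prompt types (train, generalization, concept-agnostic) and metrics (fidelity, controllability, diversity, base-model preservation, image quality).
Runs extensive experiments across 26 configurations, 3 seeds, and 3 checkpoints per configuration, evaluated with multiple prompt types on diverse concepts.
Applies SHAP analysis with CatBoost to explain how configuration choices affect the evaluation metrics.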
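
To make the three update rules above more concrete, here is a minimal NumPy sketch of how ΔW could be built for a single linear layer. It is an illustration under assumptions, not the LyCORIS API: the function names, toy sizes, and random initialization are hypothetical, and the library itself also handles convolution layers and alpha-based scaling, which are left out here.

```python
import numpy as np

def lora_delta(B, A):
    """LoRA / LoCon: delta_W = B @ A, so its rank is at most r."""
    return B @ A

def loha_delta(B1, A1, B2, A2):
    """LoHa: Hadamard (element-wise) product of two low-rank matrices."""
    return (B1 @ A1) * (B2 @ A2)

def lokr_delta(C, B, A):
    """LoKr: Kronecker product of a small matrix C with a low-rank block."""
    return np.kron(C, B @ A)

def adapted_forward(W0, b, delta_W, h, gamma=1.0):
    """h' = W0 h + b + gamma * delta_W h, as in the formula quoted above."""
    return W0 @ h + b + gamma * (delta_W @ h)

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2

def low_rank_pair(rows, cols):
    """Random factors B (rows x r) and A (r x cols); toy initialization only."""
    return rng.normal(size=(rows, r)), rng.normal(size=(r, cols))

W0 = rng.normal(size=(d_out, d_in))  # frozen base weight
b = rng.normal(size=d_out)           # frozen bias
h = rng.normal(size=d_in)            # layer input

dw_lora = lora_delta(*low_rank_pair(d_out, d_in))
dw_loha = loha_delta(*low_rank_pair(d_out, d_in), *low_rank_pair(d_out, d_in))
C = rng.normal(size=(2, 2))          # small Kronecker factor
dw_lokr = lokr_delta(C, *low_rank_pair(d_out // 2, d_in // 2))

for name, dw in [("LoRA", dw_lora), ("LoHa", dw_loha), ("LoKr", dw_lokr)]:
    out = adapted_forward(W0, b, dw, h)
    print(name, dw.shape, np.linalg.matrix_rank(dw), out.shape)
```

All three updates have the shape of W_0, but their rank bounds differ. With the same inner dimension r, the LoRA update has rank at most r, the Hadamard form can reach rank up to r², and the Kronecker form has rank equal to rank(C) times the rank of the low-rank block.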

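Experimental Results

Research Questions

RQ1: How do the different LyCORIS fine-tuning algorithms (LoRA, LoHa, LoKr) compare in fidelity, controllability, diversity, and base-model preservation?
RQ2: How do hyperparameters (learning rate, trained layers, dimension, alpha, factor) and data choices affect performance across concept categories?
RQ3: What guidance can be derived on when to use each algorithm for simple versus complex, and multi-concept versus single-concept, fine-tuning tasks?
RQ4: How should prompts be constructed to properly evaluate learned concepts beyond the training captions?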
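
As a rough illustration of RQ4, the three prompt types named in the method section (train, generalization, concept-agnostic) could be assembled as sketched below. Everything here is a hypothetical example rather than the paper's tooling: the trigger word `zxq`, the templates, and the choice to fill concept-agnostic prompts with a generic subject are all assumptions.

```python
def concept_descriptor(trigger, class_word=None):
    """Trigger word plus an optional class word, e.g. 'zxq cat'."""
    return f"{trigger} {class_word}" if class_word else trigger

def build_prompts(descriptor, generic_subject, train_captions, templates):
    """Group evaluation prompts by type.

    train            : captions seen during fine-tuning
    generalization   : the concept descriptor placed in unseen contexts
    concept-agnostic : the same contexts without the concept, used to
                       check how much of the base model is preserved
    """
    return {
        "train": list(train_captions),
        "generalization": [t.format(descriptor) for t in templates],
        "concept-agnostic": [t.format(generic_subject) for t in templates],
    }

# Hypothetical trigger word, captions, and templates.
descriptor = concept_descriptor("zxq", "cat")
prompts = build_prompts(
    descriptor,
    generic_subject="a cat",
    train_captions=[f"{descriptor} sitting on a sofa"],
    templates=["{} on the beach at sunset", "{} in a snowy forest"],
)

for prompt_type, items in prompts.items():
    print(prompt_type, items)
```

Keeping the templates shared between the generalization and concept-agnostic sets means the two differ only in whether the concept descriptor appears, which makes outputs easier to compare.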

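Key Findings

LoHa generally performs well on simple, multi-concept fine-tuning tasks, whereas full-dimension LoKr is better suited to complex, single-concept settings.
Increasing model capacity (raising the dimension or lowering the factor) shifts performance in ways similar to a higher learning rate or longer training, with subtle effects on the individual metrics.
Fine-tuning all layers, including the convolutional layers, usually improves image similarity, but when the hyperparameters are not well balanced it can degrade other aspects such as text-image alignment and base-model preservation.
Restricting fine-tuning to the attention layers lowers image similarity and can hinder learning of the target concept, which highlights the need to tune beyond the attention modules.
The comprehensive evaluation framework shows that no single method is superior; performance depends on the task and the prompt, which underscores the need for carefully designed metrics and multi-prompt evaluation.
The study advocates broader, more systematic evaluation in model customization, going beyond existing single-metric evaluations.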
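
The capacity remark above (dimension and factor) can be made concrete by counting trainable parameters for one linear layer. The sketch below is a simplification under stated assumptions: it treats a single d_out × d_in weight, assumes the LoKr factor divides both dimensions evenly, and takes "full dimension" to mean that the larger Kronecker block is not low-rank decomposed. The function names are hypothetical, and the library's actual factorization logic may differ.

```python
def lora_params(d_out, d_in, dim):
    """B is d_out x dim and A is dim x d_in."""
    return dim * (d_out + d_in)

def loha_params(d_out, d_in, dim):
    """Two low-rank pairs, combined by a Hadamard product."""
    return 2 * dim * (d_out + d_in)

def lokr_params(d_out, d_in, factor, dim=None):
    """Small factor x factor matrix plus a (d_out/factor) x (d_in/factor) block.

    dim=None keeps the block full ("full dimension" in this sketch);
    otherwise the block is decomposed with inner dimension dim.
    """
    if d_out % factor or d_in % factor:
        raise ValueError("this sketch assumes factor divides both dimensions")
    rows, cols = d_out // factor, d_in // factor
    block = rows * cols if dim is None else dim * (rows + cols)
    return factor * factor + block

# Example sizes chosen for illustration only.
d_out = d_in = 768
print("LoRA dim=8        :", lora_params(d_out, d_in, 8))
print("LoHa dim=8        :", loha_params(d_out, d_in, 8))
print("LoKr factor=8 full:", lokr_params(d_out, d_in, 8))
print("LoKr factor=4 full:", lokr_params(d_out, d_in, 4))
```

Under these assumptions, halving the factor from 8 to 4 roughly quadruples the full-dimension LoKr parameter count, which is one way to read the finding that lowering the factor increases capacity.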


This digest was generated by AI and reviewed by human editors.