QUICK REVIEW

[Paper Review] Re-imagine the Negative Prompt Algorithm: Transform 2D Diffusion into 3D, alleviate Janus problem and Beyond

Mohammadreza Armandpour, Ali Sadeghian|arXiv (Cornell University)|Apr 11, 2023
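Generative Adversarial Networks and Image Synthesis | Cited by 22
One-Sentence Summary

This paper proposes Perp-Neg, a training-free negative-prompt sampling method that uses perpendicular gradients to better separate the negative prompt from the main prompt, improving view conditioning in 2D and alleviating the Janus problem in 3D DreamFusion.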

ABSTRACT

Although text-to-image diffusion models have made significant strides in generating images from text, they are sometimes more inclined to generate images like the data on which the model was trained rather than the provided text. This limitation has hindered their usage in both 2D and 3D applications. To address this problem, we explored the use of negative prompts but found that the current implementation fails to produce desired results, particularly when there is an overlap between the main and negative prompts. To overcome this issue, we propose Perp-Neg, a new algorithm that leverages the geometrical properties of the score space to address the shortcomings of the current negative prompts algorithm. Perp-Neg does not require any training or fine-tuning of the model. Moreover, we experimentally demonstrate that Perp-Neg provides greater flexibility in generating images by enabling users to edit out unwanted concepts from the initially generated images in 2D cases. Furthermore, to extend the application of Perp-Neg to 3D, we conducted a thorough exploration of how Perp-Neg can be used in 2D to condition the diffusion model to generate desired views, rather than being biased toward the canonical views. Finally, we applied our 2D intuition to integrate Perp-Neg with the state-of-the-art text-to-3D (DreamFusion) method, effectively addressing its Janus (multi-head) problem. Our project page is available at https://Perp-Neg.github.io/

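Motivation and Objectives

  • Identify the limitations of current negative prompting when the positive and negative prompts overlap.
  • Develop a training-free method that uses negative prompts without damaging the main concept.
  • Demonstrate improved view-conditioned 2D generation and a reduced Janus problem in 3D through integration with DreamFusion.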

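Proposed Method

  • Define Perp-Neg as a sampling scheme that uses the perpendicular component of the denoising score to separate negative prompts from the main positive prompt.
  • Generalize it to a set of negative prompts by projecting each negative denoising component onto the space orthogonal to the main positive component (Equation 8).
  • During 2D and 3D generation, replace or augment the SDS-based loss with the Perp-Neg guidance term (Equation 11 and related definitions).
  • Apply Perp-Neg to 2D view conditioning by designing positive/negative prompt sets with view-aware weights to generate back, side, and front views of the target.
  • Integrate Perp-Neg into Stable DreamFusion, addressing the Janus problem by conditioning the 2D diffusion prior on the desired view during 3D reconstruction (via a Score Distillation Sampling variant).
  • Run quantitative 2D view-alignment experiments and 3D DreamFusion experiments to validate the fidelity improvements and Janus mitigation.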
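
The projection step described in the bullets above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`perpendicular_component`, `perp_neg_guidance`), the `guidance_scale` parameter, and the way the weights are applied are assumptions made for illustration, and the exact form of Equation 8 in the paper may differ. The sketch assumes floating-point noise predictions of identical shape for the unconditional, positive, and negative prompts.

```python
import numpy as np


def perpendicular_component(neg, pos, eps=1e-12):
    """Return the part of `neg` that is orthogonal to `pos`.

    Both arrays are treated as flat vectors for the inner product.
    `eps` (an assumption of this sketch) guards against division by zero.
    """
    pos_flat = pos.ravel()
    neg_flat = neg.ravel()
    scale = np.dot(neg_flat, pos_flat) / (np.dot(pos_flat, pos_flat) + eps)
    return neg - scale * pos


def perp_neg_guidance(eps_uncond, eps_pos, eps_negs, neg_weights, guidance_scale=7.5):
    """Combine noise predictions so negatives act only perpendicular to the main prompt.

    eps_uncond  : unconditional noise prediction
    eps_pos     : prediction for the main (positive) prompt
    eps_negs    : list of predictions for the negative prompts
    neg_weights : one non-negative weight per negative prompt (hypothetical values)
    """
    pos_dir = eps_pos - eps_uncond
    combined = pos_dir
    for eps_neg, weight in zip(eps_negs, neg_weights):
        neg_dir = eps_neg - eps_uncond
        # Subtract only the component of the negative direction that does not
        # overlap with the positive direction.
        combined = combined - weight * perpendicular_component(neg_dir, pos_dir)
    return eps_uncond + guidance_scale * combined
```

Under these assumptions, a negative prompt whose direction coincides with the positive direction contributes nothing, which matches the stated goal of not damaging the main concept when the prompts overlap.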

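Experimental Results

Research Questions

  • RQ1: Does overlap between the positive and negative prompts reduce the prompt fidelity of diffusion models, and can Perp-Neg mitigate it?
  • RQ2: Does a training-free, perpendicular-gradient sampling scheme improve view-conditioned generation in 2D and reduce the Janus problem in text-to-3D pipelines?
  • RQ3: How can Perp-Neg be integrated with DreamFusion to constrain view-conditioned 3D outputs?
  • RQ4: What empirical gains in 2D view fidelity and 3D view consistency does Perp-Neg deliver over vanilla sampling and other baselines?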

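Key Findings

  • For 2D prompts, Perp-Neg significantly raises the success rate of generating the requested view compared with the vanilla Stable Diffusion and CEBM baselines (side view: 73.1%, back view: 40.4%).
  • CEBM performs poorly when the positive and negative prompts overlap, whereas Perp-Neg handles the overlap and preserves the main semantic content.
  • In the 3D DreamFusion experiments, Perp-Neg reduces Janus artifacts and, compared with runs without Perp-Neg, more readily achieves correct view fidelity for prompts such as "a corgi".
  • Perp-Neg enables view interpolation and improved conditioning, yielding more accurate alignment with the viewpoint specified in the prompt.
  • The method is training-free and can be applied to pretrained diffusion models without fine-tuning.
  • The experiments indicate that gains in 2D prompt fidelity translate into better 3D view consistency and reduce the canonical-view bias behind the Janus problem.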

This review was generated by AI and checked by a human editor.