QUICK REVIEW

[Paper Digest] Text-to-3D with Classifier Score Distillation

Xin Yu, Yuan-Chen Guo|arXiv (Cornell University)|Oct 30, 2023

Generative Adversarial Networks and Image Synthesis | Cited by 8

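One-Sentence Summary

This paper proposes Classifier Score Distillation (CSD), shows that the classifier score component alone is sufficient for text-to-3D generation, and reports state-of-the-art results on 3D generation, texture synthesis, and editing.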

ABSTRACT

Text-to-3D generation has made remarkable progress recently, particularly with methods based on Score Distillation Sampling (SDS) that leverages pre-trained 2D diffusion models. While the usage of classifier-free guidance is well acknowledged to be crucial for successful optimization, it is considered an auxiliary trick rather than the most essential component. In this paper, we re-evaluate the role of classifier-free guidance in score distillation and discover a surprising finding: the guidance alone is enough for effective text-to-3D generation tasks. We name this method Classifier Score Distillation (CSD), which can be interpreted as using an implicit classification model for generation. This new perspective reveals new insights for understanding existing techniques. We validate the effectiveness of CSD across a variety of text-to-3D tasks including shape generation, texture synthesis, and shape editing, achieving results superior to those of state-of-the-art methods. Our project page is https://xinyu-andy.github.io/Classifier-Score-Distillation

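Research Motivation and Goals

Re-evaluate the role of classifier-free guidance (CFG) in score distillation for text-to-3D generation.
Show that the classifier component can drive 3D synthesis without relying on the generative prior.
Develop CSD into a viable alternative to SDS for NeRF/mesh generation and texture synthesis.
Explore enhancements within the CSD framework, such as annealed negative prompts and text-guided editing.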

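Proposed Method

Proposes Classifier Score Distillation (CSD), which optimizes a 3D scene using only the classifier score term obtained from the diffusion model's implicit classifier.
Decomposes the SDS gradient into a generative prior component and a classifier score component to show that the classifier term dominates under CFG.
Introduces annealed negative prompts into CSD to jointly optimize the positive and negative classifier scores, improving texture quality and prompt fidelity.
Extends text-guided 3D editing to CSD by replacing the prompt with the target prompt and editing attributes, while balancing alignment against fidelity.
Discusses the connection to Variational Score Distillation (VSD) and interprets negative prompts as guidance based on classifier scores.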
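
The decomposition described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the arrays are random stand-ins for a diffusion model's noise predictions, the function names are hypothetical, the guidance weight of 100 is only illustrative, and the guidance convention (conditional prediction plus a weighted difference between conditional and unconditional predictions) is an assumption. Timestep weighting and the backward pass through the renderer are omitted.

```python
import numpy as np


def split_sds_residual(eps_cond, eps_uncond, noise, guidance_weight):
    """Split a CFG-guided SDS residual into its two components.

    Assumed convention (not confirmed by the digest):
        eps_cfg = eps_cond + guidance_weight * (eps_cond - eps_uncond)
        sds_residual = eps_cfg - noise
    """
    generative_prior = eps_cond - noise        # conditional prediction vs. injected noise
    classifier_score = eps_cond - eps_uncond   # implicit-classifier direction
    sds_residual = generative_prior + guidance_weight * classifier_score
    return sds_residual, generative_prior, classifier_score


def csd_residual(eps_cond, eps_uncond):
    """CSD keeps only the classifier score term and drops the generative prior."""
    return eps_cond - eps_uncond


# Toy stand-ins for noise predictions; a real setup would query a diffusion model.
rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 4))
eps_uncond = noise + 0.1 * rng.standard_normal((4, 4))
eps_cond = eps_uncond + 0.05 * rng.standard_normal((4, 4))

guidance_weight = 100.0  # illustrative value only
sds, prior, cls = split_sds_residual(eps_cond, eps_uncond, noise, guidance_weight)

# Compare the magnitude of the two contributions to the SDS residual.
prior_norm = np.linalg.norm(prior)
scaled_cls_norm = np.linalg.norm(guidance_weight * cls)
```

In this toy setting the scaled classifier term is much larger than the generative prior term, which mirrors the digest's point that the classifier term dominates under CFG. The numbers say nothing about a real model.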

Figure 2: Qualitative comparisons to baselines for text-to-3D generation. Our method can generate 3D scenes that align well with input text prompts with realistic and detailed appearances.

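Experimental Results

Research Questions

RQ1: Can the classifier score alone (via the diffusion model's implicit classifier) drive high-quality text-to-3D generation without relying on the generative prior?
RQ2: How do negative prompts and their annealing affect the balance between prompt fidelity and texture quality in CSD?
RQ3: Can CSD be applied effectively to texture synthesis and 3D editing, and not only to pure generation?
RQ4: How does CSD relate to the existing SDS/VSD frameworks, both in practice and in theory?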

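Key Findings

Classifier score guidance alone can drive effective text-to-3D generation, with results competitive with or better than SDS-based methods.
When negative prompts are annealed appropriately, they improve texture quality and fidelity to the target prompt while preserving alignment.
CSD achieves competitive text-guided texture synthesis on 3D meshes, reducing artifacts and improving local and global consistency.
CSD supports efficient text-driven 3D editing by steering rendered outputs toward the target description and away from unwanted attributes.
The experiments show strong qualitative and quantitative performance, including a user study in which participants preferred CSD over the baselines.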
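
One way to picture annealed negative prompts is as a pair of classifier score terms with separate weights, where the negative weight shrinks over training. The sketch below is an assumption-laden illustration: the linear schedule, the start and end weights, and the function names are all hypothetical and are not taken from the paper, and the arrays stand in for noise predictions from a diffusion model.

```python
import numpy as np


def annealed_negative_weight(step, total_steps, w_start=0.5, w_end=0.0):
    """Linearly anneal the negative-prompt weight (hypothetical schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return w_start + (w_end - w_start) * frac


def csd_residual_with_negative(eps_pos, eps_neg, eps_uncond, w_pos, w_neg):
    """Combine positive and negative classifier scores.

    Moves toward the positive prompt and away from the negative prompt,
    each measured relative to the unconditional prediction.
    """
    positive_score = eps_pos - eps_uncond
    negative_score = eps_neg - eps_uncond
    return w_pos * positive_score - w_neg * negative_score


# Toy stand-ins for the three noise predictions at one optimization step.
rng = np.random.default_rng(1)
eps_uncond = rng.standard_normal((4, 4))
eps_pos = eps_uncond + 0.05 * rng.standard_normal((4, 4))
eps_neg = eps_uncond + 0.05 * rng.standard_normal((4, 4))

w_neg = annealed_negative_weight(step=2500, total_steps=10000)
residual = csd_residual_with_negative(eps_pos, eps_neg, eps_uncond, w_pos=1.0, w_neg=w_neg)
```

With equal weights the unconditional prediction cancels and the residual reduces to the difference between the positive and negative predictions. With the negative weight annealed to zero it reduces to the plain classifier score.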

Figure 3: Qualitative comparisons to baselines for text-guided texture synthesis on 3D meshes. Our method generates more detailed and photo-realistic textures.


This digest was generated by AI and reviewed by a human editor.