[Paper Review] Text-to-3D with Classifier Score Distillation
The paper proposes Classifier Score Distillation (CSD), showing that the classifier score component is sufficient for text-to-3D generation, and demonstrates state-of-the-art results across 3D generation, texture synthesis, and editing.
Text-to-3D generation has made remarkable progress recently, particularly with methods based on Score Distillation Sampling (SDS) that leverages pre-trained 2D diffusion models. While the usage of classifier-free guidance is well acknowledged to be crucial for successful optimization, it is considered an auxiliary trick rather than the most essential component. In this paper, we re-evaluate the role of classifier-free guidance in score distillation and discover a surprising finding: the guidance alone is enough for effective text-to-3D generation tasks. We name this method Classifier Score Distillation (CSD), which can be interpreted as using an implicit classification model for generation. This new perspective reveals new insights for understanding existing techniques. We validate the effectiveness of CSD across a variety of text-to-3D tasks including shape generation, texture synthesis, and shape editing, achieving results superior to those of state-of-the-art methods. Our project page is https://xinyu-andy.github.io/Classifier-Score-Distillation
Research Motivation and Objectives
- Reevaluate the role of classifier-free guidance (CFG) in score distillation for text-to-3D generation.
- Demonstrate that the classifier component can drive 3D synthesis without relying on the generative prior.
- Develop CSD as a practical alternative to SDS for NeRF/mesh generation and texture synthesis.
- Explore enhancements such as annealed negative prompts and text-guided editing within the CSD framework.
Proposed Method
- Formulate Classifier Score Distillation (CSD), which optimizes 3D scenes using only the classifier score term derived from the diffusion model's implicit classifier.
- Decompose the SDS gradient into generative-prior and classifier-score components to show that the classifier term dominates under CFG.
- Introduce annealed negative prompts within CSD to jointly optimize positive and negative classifier scores for improved texture quality and prompt fidelity.
- Extend CSD to text-guided 3D editing by steering renderings toward a target prompt and away from a prompt describing the attribute to be changed, while balancing alignment with the target and fidelity to the original content.
- Discuss connections to Variational Score Distillation (VSD) and interpret negative prompts as classifier-score-based guidance.
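The decomposition in the list above can be illustrated with a minimal NumPy sketch. It assumes the CFG convention eps_cfg = eps_cond + w * (eps_cond - eps_uncond), under which the SDS residual splits into a generative-prior term (eps_cond - noise) plus w times a classifier-score term (eps_cond - eps_uncond). All function names, the weights `w_pos` and `w_neg`, and the random arrays standing in for a diffusion model's noise predictions are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import numpy as np


def sds_delta(eps_cond, eps_uncond, noise, cfg_scale):
    """SDS residual under the assumed CFG convention."""
    eps_cfg = eps_cond + cfg_scale * (eps_cond - eps_uncond)
    return eps_cfg - noise


def generative_prior_delta(eps_cond, noise):
    """Generative-prior component: conditional prediction minus injected noise."""
    return eps_cond - noise


def classifier_score_delta(eps_cond, eps_uncond):
    """Classifier-score component: the only term a CSD-style update keeps."""
    return eps_cond - eps_uncond


def csd_delta_with_negative(eps_pos, eps_uncond, eps_neg, w_pos, w_neg):
    """Positive classifier score minus a weighted negative-prompt score.

    w_pos and w_neg are illustrative weights; annealing would vary w_neg
    over the course of optimization.
    """
    return w_pos * (eps_pos - eps_uncond) - w_neg * (eps_neg - eps_uncond)


# Random arrays stand in for the outputs of a real diffusion model.
rng = np.random.default_rng(0)
shape = (4, 8, 8)
eps_cond, eps_uncond, eps_neg, noise = (
    rng.standard_normal(shape) for _ in range(4)
)
cfg_scale = 100.0  # a large guidance scale, as is common in SDS pipelines

full = sds_delta(eps_cond, eps_uncond, noise, cfg_scale)
gen = generative_prior_delta(eps_cond, noise)
cls = classifier_score_delta(eps_cond, eps_uncond)

# The SDS residual is exactly the sum of the two components.
assert np.allclose(full, gen + cfg_scale * cls)

# Relative magnitude of the scaled classifier term versus the prior term.
ratio = np.linalg.norm(cfg_scale * cls) / np.linalg.norm(gen)
print(f"classifier / generative-prior norm ratio: {ratio:.1f}")
```

In a real pipeline the chosen residual would be multiplied by a timestep weight and back-propagated through the renderer to the 3D parameters; that step is omitted here because it depends on the specific renderer and diffusion model.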

Experimental Results
Research Questions
- RQ1: Can the classifier score alone (via implicit classifiers from diffusion models) drive high-quality text-to-3D generation without the generative prior?
- RQ2: How do negative prompts and their annealing affect the balance between prompt fidelity and texture quality in CSD?
- RQ3: Can CSD be effectively applied to texture synthesis and 3D editing beyond pure generation?
- RQ4: What is the relationship between CSD and existing SDS/VSD frameworks in practice and theory?
Key Findings
- Classifier score guidance alone can drive effective text-to-3D generation, achieving results competitive with or superior to SDS-based methods.
- Negative prompts, when annealed properly, improve texture quality and fidelity to the target prompt while maintaining alignment.
- CSD enables competitive text-guided texture synthesis on 3D meshes with reduced artifacts and better local/global consistency.
- CSD supports efficient text-driven 3D editing by steering rendered outputs toward target descriptions and away from undesired attributes.
- Experimental results show strong qualitative and quantitative performance, including user studies favoring CSD over baselines.

This review was generated by AI and checked by a human editor.