[Paper Review] BoxSplitGen: A Generative Model for 3D Part Bounding Boxes in Varying Granularity
TL;DR: The paper presents a two-model framework: (1) BoxSplitGen, a box-splitting generator that learns to split coarse 3D bounding boxes hierarchically into finer ones, and (2) a box-conditioned 3D shape generator that adapts an existing 3D diffusion model to produce shapes aligned with the given bounding boxes.
Human creativity follows a perceptual process, moving from abstract ideas to finer details during creation. While 3D generative models have advanced dramatically, models specifically designed to assist human imagination in 3D creation -- particularly for detailing abstractions from coarse to fine -- have not been explored. We propose a framework that enables intuitive and interactive 3D shape generation by iteratively splitting bounding boxes to refine the set of bounding boxes. The main technical components of our framework are two generative models: the box-splitting generative model and the box-to-shape generative model. The first model, named BoxSplitGen, generates a collection of 3D part bounding boxes with varying granularity by iteratively splitting coarse bounding boxes. It utilizes part bounding boxes created through agglomerative merging and learns the reverse of the merging process -- the splitting sequences. The model consists of two main components: the first learns the categorical distribution of the box to be split, and the second learns the distribution of the two new boxes, given the set of boxes and the indication of which box to split. The second model, the box-to-shape generative model, is trained by leveraging the 3D shape priors learned by an existing 3D diffusion model while adapting the model to incorporate bounding box conditioning. In our experiments, we demonstrate that the box-splitting generative model outperforms token prediction models and the inpainting approach with an unconditional diffusion model. Also, we show that our box-to-shape model, based on a state-of-the-art 3D diffusion model, provides superior results compared to a previous model.
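The agglomerative merging mentioned above, and the way its reversal yields splitting sequences, can be sketched as follows. This is a minimal illustration and not the authors' procedure: the merge criterion (`merge_cost`, the extra volume the union adds), the axis-aligned `(2, 3)` min/max box layout, and all function names are assumptions made for this example.

```python
import itertools
import numpy as np

def union_box(a, b):
    """Smallest axis-aligned box enclosing a and b (each a (2, 3) min/max array)."""
    return np.stack([np.minimum(a[0], b[0]), np.maximum(a[1], b[1])])

def volume(box):
    return float(np.prod(box[1] - box[0]))

def merge_cost(a, b):
    # ASSUMPTION: greedy criterion chosen for illustration only;
    # the summary does not state which criterion the paper uses.
    return volume(union_box(a, b)) - volume(a) - volume(b)

def build_split_sequence(leaf_boxes):
    """Merge leaves pairwise until one box remains, then reverse the record.

    Returns a list of (parent, child_1, child_2) tuples ordered coarse to fine.
    """
    boxes = [np.asarray(b, dtype=float) for b in leaf_boxes]
    merges = []
    while len(boxes) > 1:
        # Pick the cheapest pair to merge next.
        i, j = min(
            itertools.combinations(range(len(boxes)), 2),
            key=lambda ij: merge_cost(boxes[ij[0]], boxes[ij[1]]),
        )
        parent = union_box(boxes[i], boxes[j])
        merges.append((parent, boxes[i], boxes[j]))
        boxes = [b for k, b in enumerate(boxes) if k not in (i, j)] + [parent]
    # Reading the merges backwards gives the splitting sequence.
    return merges[::-1]

# Three unit cubes along the x-axis: two adjacent, one far away.
leaves = [
    [[0, 0, 0], [1, 1, 1]],
    [[1, 0, 0], [2, 1, 1]],
    [[5, 0, 0], [6, 1, 1]],
]
splits = build_split_sequence(leaves)
```

In this toy case the first split separates the distant cube from the adjacent pair, and the second split separates the pair, which is the coarse-to-fine order a splitting model would be trained on.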
Research Motivation and Objectives
- Motivate intuitive 3D generation guided by hierarchical abstractions from coarse to fine, mirroring human creativity.
- Learn to generate sets of 3D part bounding boxes with varying granularity via iterative box splitting.
- Develop a box-to-shape generator that conditions a diffusion model on bounding boxes to produce aligned 3D shapes.
- Leverage SMART-based hierarchical bounding boxes to train split and shape generation models.
- Demonstrate improved fidelity and alignment over baselines on ShapeNet data.
Proposed Method
- BoxSplitGen decomposes $p(B_{s+1} \mid B_s)$ into $p(b_v \mid B_s)$ for pivot box selection and $p(C(b_v) \mid b_v, B_s)$ for the two new boxes, using a Transformer pivot classifier and a conditional diffusion model.
- Training data comes from SMART hierarchical shape abstractions, producing leaf bounding boxes and recursive merges to form the binary split tree.
- A pivot classifier (Transformer) models which box to split, and a Child-Boxes Diffusion model generates the two new boxes given the split pivot.
- Box-to-Shape: Finetune 3DShape2VecSet with a learnable input-encoding layer to condition on bounding boxes via ControlNet-style conditioning, enabling box-aligned 3D shape generation.
- Alternative baselines include a Conditional Token Prediction model and an Unconditional Diffusion Inpainting approach for the splitting task, as well as Spice-E and a Gated 3DS2V variant for the shape generation task.
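The two-stage factorization in the list above (choose a pivot box, then generate its two children) can be sketched as a sampling loop. This is a minimal sketch under stated assumptions: `pivot_probs` and `sample_children` are hypothetical stand-ins for the Transformer pivot classifier and the child-boxes diffusion model, and the axis-aligned `(2, 3)` min/max box layout is also an assumption. The stand-ins use simple heuristics so the loop runs without trained weights.

```python
import numpy as np

def pivot_probs(boxes):
    # Stand-in for p(b_v | B_s). ASSUMPTION: probability proportional to volume;
    # the actual model is a learned Transformer classifier.
    vols = np.prod(boxes[:, 1] - boxes[:, 0], axis=1)
    return vols / vols.sum()

def sample_children(pivot, boxes, rng):
    # Stand-in for p(C(b_v) | b_v, B_s). ASSUMPTION: cut the pivot along its
    # longest axis; the actual model is a conditional diffusion model and its
    # children need not partition the pivot exactly.
    lo, hi = pivot
    axis = int(np.argmax(hi - lo))
    margin = 0.25 * (hi[axis] - lo[axis])
    cut = rng.uniform(lo[axis] + margin, hi[axis] - margin)
    first_hi = hi.copy()
    first_hi[axis] = cut
    second_lo = lo.copy()
    second_lo[axis] = cut
    return np.stack([lo, first_hi]), np.stack([second_lo, hi])

def split_step(boxes, rng):
    """One step B_s -> B_{s+1}: replace the pivot box with its two children."""
    v = rng.choice(len(boxes), p=pivot_probs(boxes))
    c1, c2 = sample_children(boxes[v], boxes, rng)
    rest = np.delete(boxes, v, axis=0)
    return np.concatenate([rest, c1[None], c2[None]], axis=0)

def generate(root, num_splits, seed=0):
    """Return the box sets at every granularity, from 1 box to num_splits + 1."""
    rng = np.random.default_rng(seed)
    boxes = np.asarray(root, dtype=float)[None]  # shape (1, 2, 3)
    levels = [boxes]
    for _ in range(num_splits):
        boxes = split_step(boxes, rng)
        levels.append(boxes)
    return levels

levels = generate([[0, 0, 0], [2, 1, 1]], num_splits=4)
```

Each call to `split_step` increases the box count by one, so `levels` holds one box set per granularity and any of them could be passed to a box-conditioned shape generator.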
Experimental Results
Research Questions
- RQ1: Can a generative model learn to iteratively split coarse 3D bounding boxes into finer parts while maintaining plausible hierarchical structure?
- RQ2: Does a pivot-based classifier plus diffusion-based splitting outperform token-prediction and inpainting baselines for generating bounding-box abstractions?
- RQ3: Can a bounding-box-conditioned diffusion model produce high-fidelity and well-aligned 3D shapes when grounded to varying granularity bounding boxes?
- RQ4: Does finetuning a state-of-the-art 3D diffusion model with a learnable bounding-box encoder and ControlNet-style conditioning outperform previous box-conditioned shape methods?
- RQ5: What are the quantitative gains in diversity, fidelity, and alignment when using BoxSplitGen and Box2Shape on ShapeNet data?
Key Findings
- The pivot classifier plus conditional diffusion model (BoxSplitGen) yields superior diversity and quality of bounding-box abstractions over random pivot and baseline token-prediction or unconditional inpainting methods.
- The conditional diffusion approach for splitting outperforms baselines across COV, MMD, and 1-NNA metrics at coarser and finer split levels on ShapeNet shapes.
- Box2Shape, built on 3DShape2VecSet with a learnable bounding-box encoder and ControlNet-style conditioning, achieves better box alignment and shape fidelity than Spice-E and the Gated 3DS2V variant.
- Box-conditioned shapes generated by Box2Shape show improved alignment with input bounding boxes and higher overall quality and diversity than competing methods.
- Qualitative results indicate that conditioning on bounding boxes yields more plausible, detailed shapes compared to methods that rely on multi-view encodings or weaker conditioning.
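For readers unfamiliar with the COV, MMD, and 1-NNA metrics cited in the findings, the sketch below computes them from precomputed pairwise distance matrices using their commonly used definitions. The distance function between two box sets or shapes is not specified in this summary, so the matrices are taken as given; the function names and the 1-D toy data are assumptions for illustration.

```python
import numpy as np

def cov_mmd(d_gen_ref):
    """d_gen_ref[i, j] is the distance from generated sample i to reference j.

    COV: fraction of reference samples that are the nearest neighbour of at
         least one generated sample (higher is better).
    MMD: mean distance from each reference sample to its closest generated
         sample (lower is better).
    """
    nearest_ref = d_gen_ref.argmin(axis=1)
    cov = len(np.unique(nearest_ref)) / d_gen_ref.shape[1]
    mmd = float(d_gen_ref.min(axis=0).mean())
    return cov, mmd

def one_nna(d_gen_gen, d_gen_ref, d_ref_ref):
    """Leave-one-out 1-nearest-neighbour accuracy over both sets combined.

    Values near 0.5 mean the two sets are hard to tell apart.
    """
    n_gen, n_ref = d_gen_ref.shape
    full = np.block([[d_gen_gen, d_gen_ref], [d_gen_ref.T, d_ref_ref]]).astype(float)
    np.fill_diagonal(full, np.inf)  # a sample may not be its own neighbour
    labels = np.array([0] * n_gen + [1] * n_ref)
    predicted = labels[full.argmin(axis=1)]
    return float((predicted == labels).mean())

def pairwise(a, b):
    # Toy distance for 1-D points, used only for this example.
    return np.abs(np.subtract.outer(np.asarray(a, float), np.asarray(b, float)))

# Toy case: generated samples cluster far from the reference samples.
gen, ref = [0.0, 1.0], [10.0, 11.0]
cov, mmd = cov_mmd(pairwise(gen, ref))
acc = one_nna(pairwise(gen, gen), pairwise(gen, ref), pairwise(ref, ref))
```

In the toy case both generated points are nearest to the same reference point, so coverage is low, and the two sets are perfectly separable, so 1-NNA is at its worst value of 1.0.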
This review was generated by AI and checked by a human editor.