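[Paper Review] UCSG-Net -- Unsupervised Discovering of Constructive Solid Geometry Tree
UCSG-Net learns, without supervision, to predict constructive solid geometry (CSG) parse trees for 3D shapes. It reconstructs each shape from predicted primitive parameters through CSG layers that operate on occupancy values. The approach yields interpretable CSG trees and competitive reconstruction quality without any ground-truth parse trees.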
Signed distance fields (SDFs) are a prominent implicit representation of 3D meshes, and methods built on them have achieved state-of-the-art 3D shape reconstruction quality. However, these methods struggle to reconstruct non-convex shapes. One remedy is to incorporate a constructive solid geometry (CSG) framework, which represents a shape as a decomposition into primitives. This allows a 3D shape of high complexity and non-convexity to be expressed as a simple tree of Boolean operations. Nevertheless, existing approaches are supervised and require the entire CSG parse tree to be given upfront during training. In contrast, we propose UCSG-Net, a model that extracts a CSG parse tree without any supervision. Our model predicts the parameters of primitives and binarizes their SDF representation through a differentiable indicator function. This is done jointly with discovering the structure of a tree of Boolean operators. The model dynamically selects which combination of operators over primitives leads to a high-fidelity reconstruction. We evaluate our method on 2D and 3D autoencoding tasks and show that the predicted parse tree representation is interpretable and can be used in CAD software.
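The conversion step in the abstract can be illustrated with a minimal 2D sketch: signed distance functions for two primitives (a circle and an axis-aligned box; rotation and 3D are omitted for brevity) and a clipping map from signed distance to occupancy. The form O = clamp(-α·d, 0, 1) is an assumption consistent with the description of a learnable clipping parameter α; the function names are illustrative, not taken from the authors' code.

```python
import math


def circle_sdf(px, py, cx, cy, r):
    """Signed distance to a circle: negative inside, zero on the boundary, positive outside."""
    return math.hypot(px - cx, py - cy) - r


def box_sdf(px, py, cx, cy, hx, hy):
    """Signed distance to an axis-aligned box with center (cx, cy) and half-sizes (hx, hy)."""
    qx = abs(px - cx) - hx
    qy = abs(py - cy) - hy
    outside = math.hypot(max(qx, 0.0), max(qy, 0.0))  # distance when the point lies outside
    inside = min(max(qx, qy), 0.0)  # negative depth when the point lies inside
    return outside + inside


def sdf_to_occupancy(d, alpha):
    """Assumed clipping form O = clamp(-alpha * d, 0, 1).

    Returns 0 on or outside the surface and 1 once the point is at least 1/alpha
    inside it; a larger alpha makes the ramp steeper, approaching a hard indicator.
    """
    return min(max(-alpha * d, 0.0), 1.0)


# A point 0.05 inside a circle of radius 0.5: occupancy sharpens as alpha grows.
d = circle_sdf(0.45, 0.0, 0.0, 0.0, 0.5)
for alpha in (1.0, 10.0, 100.0):
    print(alpha, sdf_to_occupancy(d, alpha))
```

Because the clipped ramp is differentiable almost everywhere, gradients reach both the primitive parameters and α, which is what makes a learnable sharpness plausible here.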
Motivation and Objectives
- Motivate interpretable 3D shape reconstruction using a CSG framework.
- Develop an end-to-end neural model that predicts primitive parameters and an unsupervised CSG parse tree.
- Introduce differentiable CSG layers operating on occupancy values to enable stable training.
- Demonstrate unsupervised CSG parsing on 2D CAD-like data and 3D ShapeNet-like data.
Proposed Method
- Encode input into a latent vector using a 2D/3D CNN encoder.
- Predict multiple primitive parameters (shape type, size, translation, rotation) in SDF form.
- Convert signed distance values to occupancy values using a learnable clipping parameter α.
- Compose shapes with a stack of CSG layers performing union, intersection, and difference using learnable operand selection (K_left, K_right) and Gumbel-Softmax reparameterization.
- Propagate layer-wise information with a GRU-based refinement of the latent code to stabilize multi-layer synthesis.
- Train in two stages: (i) end-to-end optimization of the reconstruction loss and parameter penalties; (ii) fine-tuning toward interpretable, one-hot CSG operand selections by lowering the per-layer temperatures τ.
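A minimal sketch of one CSG layer follows, under several assumptions: the occupancy-valued Boolean operations take the clamp forms union = clamp(a + b, 0, 1), intersection = clamp(a + b - 1, 0, 1), and difference = clamp(a - b, 0, 1); the operand-selection logits are passed in directly instead of being computed from the latent code with the learnable K_left and K_right; and shapes are plain lists of occupancy values on a shared set of sample points. All function names are hypothetical. The temperature tau plays the role described in the second training stage: as it shrinks, the Gumbel-Softmax weights approach one-hot selections, so each soft operand collapses onto a single input shape.

```python
import math
import random


def clamp01(x):
    return min(max(x, 0.0), 1.0)


# Assumed occupancy-valued Boolean operations (exact when inputs are 0 or 1).
def union(a, b):
    return clamp01(a + b)


def intersection(a, b):
    return clamp01(a + b - 1.0)


def difference(a, b):
    return clamp01(a - b)


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample; a smaller tau gives a sharper, more one-hot vector."""
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
        noisy.append((logit - math.log(-math.log(u))) / tau)  # logit + Gumbel noise
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def mix(weights, shapes):
    """Soft operand: per-point weighted sum of the candidate shapes."""
    n_points = len(shapes[0])
    return [sum(w * shape[i] for w, shape in zip(weights, shapes)) for i in range(n_points)]


def csg_layer(shapes, left_logits, right_logits, tau, rng):
    """Select a soft left and right operand, then apply every Boolean operation to them."""
    left = mix(gumbel_softmax(left_logits, tau, rng), shapes)
    right = mix(gumbel_softmax(right_logits, tau, rng), shapes)
    return {
        "union": [union(a, b) for a, b in zip(left, right)],
        "intersection": [intersection(a, b) for a, b in zip(left, right)],
        "left_minus_right": [difference(a, b) for a, b in zip(left, right)],
        "right_minus_left": [difference(b, a) for a, b in zip(left, right)],
    }


# Two toy shapes sampled at four points; the logits strongly prefer A as left and B as right.
A = [1.0, 1.0, 0.0, 0.0]
B = [0.0, 1.0, 1.0, 0.0]
rng = random.Random(0)
out = csg_layer([A, B], [50.0, 0.0], [0.0, 50.0], tau=0.01, rng=rng)
for name, occupancy in out.items():
    print(name, occupancy)
```

Emitting every operation at once lets a later layer, or the final selection, pick whichever result reconstructs best, which matches the review's description of the model choosing operator combinations dynamically.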
Experimental Results
Research Questions
- RQ1: Can a neural network discover a usable CSG parse tree for reconstructing objects without supervision?
- RQ2: How well can occupancy-valued CSG operations approximate standard Boolean operations across 2D and 3D data?
- RQ3: Does unsupervised CSG parsing yield interpretable representations executable in CAD pipelines?
- RQ4: What is the trade-off between interpretability and reconstruction fidelity in an unsupervised CSG framework?
Key Findings
Table 1. 2D CAD reconstruction: Chamfer Distance (CD, lower is better) across methods and training modes

| Method | Mode | k | CD (i=0) | CD (i=∞) |
|---|---|---|---|---|
| CSG-NetStack | Supervised | 1 | 3.98 | - |
| CSG-NetStack | Supervised | 10 | 1.38 | - |
| CSG-NetStack | RL | 1 | 1.27 | - |
| CSG-NetStack | RL | 10 | 1.02 | - |
| UCSG-Net (ours) | Unsupervised | 1 | 0.32 | - |
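Table 1 and the findings below compare methods by Chamfer Distance. Below is a minimal sketch of one common symmetric form (mean squared nearest-neighbour distance, summed over both directions) for 2D or 3D point sets. The exact sampling, normalization, and scaling behind the reported numbers may differ, so this sketch will not reproduce the table's values. Brute-force nearest-neighbour search is used for clarity; a spatial index such as a KD-tree would be preferable for large point sets.

```python
def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two point sets (tuples of equal dimension).

    Each direction averages, over source points, the squared Euclidean distance to
    the nearest destination point; the two directional terms are then summed.
    """

    def one_way(src, dst):
        total = 0.0
        for p in src:
            total += min(sum((u - v) ** 2 for u, v in zip(p, q)) for q in dst)
        return total / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)


square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
shifted = [(x + 0.1, y) for x, y in square]
print(chamfer_distance(square, square))  # identical sets
print(chamfer_distance(square, shifted))
```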
- On 2D CAD data, unsupervised UCSG-Net reaches a lower Chamfer Distance than every CSG-NetStack variant, whether trained with supervision or with RL: 0.32 versus 1.02 to 3.98.
- The method discovers meaningful CSG parse trees and primitive selections, providing interpretable reconstructions that can be rendered in CAD software.
- On 3D ShapeNet-like data, UCSG-Net attains a Chamfer Distance of 2.085 in its high-interpretability setup, whereas the baselines (VP, SQ, BAE, BSP-Net) range from 0.446 to 2.259. UCSG-Net therefore does not achieve the lowest reconstruction error; it trades some fidelity for interpretability and an explicit parse tree.
- The approach demonstrates the ability to reuse primitives across layers to form complex shapes and to reveal semantic parts (e.g., wings, hull) within reconstructed objects.
- The model supports recovering a full CSG tree that can be pruned to binary form, enabling direct mesh generation without extra post-processing.
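One way to read the pruning claim: after fine-tuning, each operand selection is (approximately) one-hot, so every operation node reads exactly one left and one right input, and only nodes reachable from the chosen output matter. The sketch below assumes a hypothetical dictionary-based graph; GRAPH, LEAVES, prune, and evaluate are illustrative names, the primitives are hand-written membership tests, and this is not the authors' export code.

```python
# Hypothetical hard CSG graph after fine-tuning (one-hot selections):
# node id -> (operation, left input id, right input id). Leaf ids name primitives.
GRAPH = {
    "n1": ("union", "box", "circle"),
    "n2": ("difference", "n1", "hole"),
    "unused": ("intersection", "box", "hole"),  # computed, but not feeding the output
}

# Primitive membership tests standing in for thresholded occupancy (hypothetical shapes).
LEAVES = {
    "box": lambda p: abs(p[0]) <= 1.0 and abs(p[1]) <= 1.0,
    "circle": lambda p: (p[0] - 1.0) ** 2 + p[1] ** 2 <= 0.25,
    "hole": lambda p: p[0] ** 2 + p[1] ** 2 <= 0.09,
}


def prune(graph, root):
    """Keep only the operation nodes that the chosen output actually depends on."""
    keep, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node in keep or node not in graph:
            continue  # already visited, or a primitive leaf
        keep.add(node)
        _, left, right = graph[node]
        stack.extend([left, right])
    return {k: v for k, v in graph.items() if k in keep}


def evaluate(graph, leaves, node, point):
    """Exact Boolean evaluation of the (pruned) tree at a single point."""
    if node in leaves:
        return leaves[node](point)
    op, left, right = graph[node]
    a = evaluate(graph, leaves, left, point)
    b = evaluate(graph, leaves, right, point)
    if op == "union":
        return a or b
    if op == "intersection":
        return a and b
    if op == "difference":
        return a and not b
    raise ValueError(f"unknown operation: {op}")


tree = prune(GRAPH, "n2")
print(sorted(tree))
print(evaluate(tree, LEAVES, "n2", (1.4, 0.0)))
```

The pruned dictionary maps directly onto nested Boolean operations, which is the kind of structure a CAD tool or a mesh-level CSG library can consume without further post-processing.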
This review was generated by AI and checked by a human editor.