QUICK REVIEW

[Paper Digest] GenCAD: Image-Conditioned Computer-Aided Design Generation with Transformer-Based Contrastive Representation and Diffusion Priors

Md Ferdous Alam, Faez Ahmed | arXiv (Cornell University) | Sep 8, 2024

Manufacturing Process and Optimization | Cited by 5

One-Sentence Summary

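GenCAD learns to generate editable CAD command sequences conditioned on an input image by combining an autoregressive transformer, contrastive multimodal learning, and a diffusion prior, which enables both image-based CAD creation and image-based CAD retrieval.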

ABSTRACT

The creation of manufacturable and editable 3D shapes through Computer-Aided Design (CAD) remains a highly manual and time-consuming task, hampered by the complex topology of boundary representations of 3D solids and unintuitive design tools. While most work in the 3D shape generation literature focuses on representations like meshes, voxels, or point clouds, practical engineering applications demand the modifiability and manufacturability of CAD models and the ability for multi-modal conditional CAD model generation. This paper introduces GenCAD, a generative model that employs autoregressive transformers with a contrastive learning framework and latent diffusion models to transform image inputs into parametric CAD command sequences, resulting in editable 3D shape representations. Extensive evaluations demonstrate that GenCAD significantly outperforms existing state-of-the-art methods in terms of the unconditional and conditional generations of CAD models. Additionally, the contrastive learning framework of GenCAD facilitates the retrieval of CAD models using image queries from large CAD databases, which is a critical challenge within the CAD community. Our results provide a significant step forward in highlighting the potential of generative models to expedite the entire design-to-production pipeline and seamlessly integrate different design modalities.

Motivation and Objectives

Automate CAD modeling to speed up the design-to-production pipeline.
Propose a scalable, image-conditioned generative model that outputs CAD command sequences rather than only a final boundary representation (B-rep).
Leverage multimodal representation learning to align CAD programs with images and enable retrieval.
Demonstrate improved accuracy and modifiability over prior unconditional CAD generation approaches.

Proposed Method

Develop an autoregressive transformer encoder-decoder (CSR) to learn latent representations of CAD command sequences.
Train a Contrastive CAD-Image Pre-training (CCIP) model, with a ResNet-based image encoder, to learn a joint latent space for CAD command sequences and input images.
Introduce a CAD Diffusion Prior (CDP) that generates CAD latents conditioned on image latents, with an option for a deterministic prior.
Represent each CAD command as a fixed-dimensional vector (t_i, p_i), a command type t_i with 8-bit quantized parameters p_i, so that a CAD program reads like a sentence in a language.
Decode latents sampled from the CDP into a CAD command sequence using the frozen, pre-trained CSR decoder.
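
The (t_i, p_i) command representation with 8-bit quantized parameters can be sketched as follows. This is a minimal illustration, not the authors' implementation: the command vocabulary, the parameter-vector length `N_PARAMS`, the `UNUSED` filler value, and the assumption that parameters are normalized to [-1, 1] are all hypothetical choices made for the example.

```python
from typing import List, Tuple

NUM_LEVELS = 256   # 8-bit quantization gives integer codes 0..255
N_PARAMS = 16      # hypothetical fixed length of the parameter vector p_i
UNUSED = -1        # hypothetical filler for slots a command does not use
# Illustrative command vocabulary; the digest does not list the real one.
COMMAND_TYPES = ("SOL", "Line", "Arc", "Circle", "Extrude", "EOS")


def quantize(value: float, lo: float = -1.0, hi: float = 1.0) -> int:
    """Map a continuous value in [lo, hi] to an integer code in 0..255."""
    clipped = min(max(value, lo), hi)  # out-of-range values saturate
    return round((clipped - lo) / (hi - lo) * (NUM_LEVELS - 1))


def dequantize(code: int, lo: float = -1.0, hi: float = 1.0) -> float:
    """Map an integer code in 0..255 back to a value in [lo, hi]."""
    return lo + code / (NUM_LEVELS - 1) * (hi - lo)


def encode_command(name: str, params: List[float]) -> Tuple[int, List[int]]:
    """Encode one command as (t_i, p_i) with a fixed-length p_i."""
    if len(params) > N_PARAMS:
        raise ValueError("too many parameters for one command")
    t = COMMAND_TYPES.index(name)  # raises ValueError for unknown names
    p = [quantize(v) for v in params]
    p += [UNUSED] * (N_PARAMS - len(p))  # pad so every command has equal size
    return t, p


# A toy "CAD program": start a loop, draw a circle, extrude, end the sequence.
program = [
    encode_command("SOL", []),
    encode_command("Circle", [0.0, 0.0, 0.5]),
    encode_command("Extrude", [0.25]),
    encode_command("EOS", []),
]
```

Because every command has the same shape, a program becomes a sequence of discrete tokens that a transformer can model in the same way as text. The cost is a bounded rounding error of at most half a quantization step per parameter.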

Experimental Results

Research Questions

RQ1: Can an autoregressive transformer effectively reconstruct CAD command sequences from learned latent representations?
RQ2: Does contrastive learning improve the alignment between CAD command latents and input CAD images?
RQ3: Can a diffusion prior conditioned on image latents generate high-quality CAD command sequences that yield valid 3D solids?
RQ4: Does the image-conditioned GenCAD framework enable reliable retrieval of CAD programs using image queries?

Key Findings

GenCAD significantly outperforms state-of-the-art unconditional CAD generation methods in terms of accuracy and modifiability of the generated CAD programs.
The framework yields better accuracy for long CAD command sequences, supporting complex design tasks.
The CCIP component enables image-based retrieval of CAD models with more than 15 times the accuracy of image-to-image retrieval baselines.
The approach demonstrates that CAD programs can be effectively generated from images and then converted to B-rep or other representations via standard geometry kernels.
Using a frozen pre-trained CAD encoder and decoder helps scale training to large datasets and improves efficiency.
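
The image-based retrieval result rests on the joint latent space that CCIP learns for images and CAD programs. As a rough sketch of how such a space can be trained and queried, the code below uses a CLIP-style symmetric contrastive loss and cosine-similarity ranking. Both are assumptions: the digest says only that the model is contrastive, and the function names and the temperature value 0.07 are illustrative, not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_matrix(image_latents: Sequence[Vector],
                      cad_latents: Sequence[Vector],
                      temperature: float = 0.07) -> List[List[float]]:
    """Entry [i][j] is cos(image_i, cad_j) divided by the temperature."""
    imgs = [normalize(v) for v in image_latents]
    cads = [normalize(v) for v in cad_latents]
    return [[sum(a * b for a, b in zip(i, c)) / temperature for c in cads]
            for i in imgs]


def cross_entropy_diagonal(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class of row k is column k."""
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[k]
    return total / len(logits)


def symmetric_contrastive_loss(image_latents: Sequence[Vector],
                               cad_latents: Sequence[Vector],
                               temperature: float = 0.07) -> float:
    """Average of the image-to-CAD and CAD-to-image losses over a batch of pairs."""
    logits = similarity_matrix(image_latents, cad_latents, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits)
                  + cross_entropy_diagonal(transposed))


def retrieve(query_image_latent: Vector,
             cad_latents: Sequence[Vector],
             k: int = 1) -> List[int]:
    """Return indices of the k CAD latents most similar to the image query."""
    scores = similarity_matrix([query_image_latent], cad_latents,
                               temperature=1.0)[0]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]
```

In this sketch the loss is low when each image latent is closest to its own CAD latent and high when pairs are mismatched. Retrieval is then a nearest-neighbor lookup in the shared space, with no image-to-image comparison involved.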


This digest was generated by AI and reviewed by human editors.