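[Paper Explained] CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM
The paper proposes CAD-MLLM, a CAD generation framework that uses a multimodal large language model (MLLM) to condition generation on multimodal inputs, and shows that it is more robust than DeepCAD under a range of data defects.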
This paper aims to design a unified Computer-Aided Design (CAD) generation system that can easily generate CAD models based on the user's inputs in the form of textual description, images, point clouds, or even a combination of them. Towards this goal, we introduce the CAD-MLLM, the first system capable of generating parametric CAD models conditioned on the multimodal input. Specifically, within the CAD-MLLM framework, we leverage the command sequences of CAD models and then employ advanced large language models (LLMs) to align the feature space across these diverse multi-modalities data and CAD models' vectorized representations. To facilitate the model training, we design a comprehensive data construction and annotation pipeline that equips each CAD model with corresponding multimodal data. Our resulting dataset, named Omni-CAD, is the first multimodal CAD dataset that contains textual description, multi-view images, points, and command sequence for each CAD model. It contains approximately 450K instances and their CAD construction sequences. To thoroughly evaluate the quality of our generated CAD models, we go beyond current evaluation metrics that focus on reconstruction quality by introducing additional metrics that assess topology quality and surface enclosure extent. Extensive experimental results demonstrate that CAD-MLLM significantly outperforms existing conditional generative methods and remains highly robust to noises and missing points. The project page and more visualizations can be found at: https://cad-mllm.github.io/
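To make the idea of feeding CAD command sequences to a language model more concrete, here is a minimal sketch of how a command sequence might be quantized and flattened into text. It is an illustration only: the `Command` class, the command names, the 256-level quantization, and the `serialize` format are all assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Command:
    """One CAD construction step (hypothetical structure for illustration)."""
    name: str                  # e.g. "LINE" or "EXTRUDE"; names are assumptions
    params: Tuple[float, ...]  # continuous parameters, assumed normalized to [-1, 1]


def quantize(value: float, levels: int = 256) -> int:
    """Map a value in [-1, 1] to an integer bin in [0, levels - 1]."""
    clipped = max(-1.0, min(1.0, value))
    return min(levels - 1, int((clipped + 1.0) / 2.0 * levels))


def serialize(commands: List[Command]) -> str:
    """Flatten a command sequence into a text string a language model could read."""
    parts = []
    for cmd in commands:
        bins = " ".join(str(quantize(p)) for p in cmd.params)
        parts.append(f"{cmd.name} {bins}".strip())
    return " ; ".join(parts)


# A square profile (each LINE stores its end point) followed by an extrusion.
square = [
    Command("LINE", (0.5, -0.5)),
    Command("LINE", (0.5, 0.5)),
    Command("LINE", (-0.5, 0.5)),
    Command("LINE", (-0.5, -0.5)),
    Command("EXTRUDE", (0.25,)),
]

print(serialize(square))
```

Quantizing parameters into a small integer vocabulary keeps the sequence short and discrete, which is the general reason such representations pair well with token-based models.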
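Research Motivation and Goals
- Advance robust CAD generation under multimodal conditions (text, images, and point clouds).
- Use an MLLM to unify multimodal conditions for CAD model generation.
- Evaluate robustness against baselines under noisy and partially missing point clouds.
Proposed Method
- Guide CAD generation with an MLLM conditioned on multimodal inputs (text, multi-view images, and point clouds).
- Quantitatively compare robustness against DeepCAD on clean and perturbed data.
- Report reconstruction metrics (Chamfer distance, F-score, normal consistency as Normal C) together with topology and enclosure metrics: Segment Error (SegE), Dangling Edge Length (DangEL), Self Intersection Ratio (SIR), and Flux Enclosure Error (FluxEE).
- Evaluate on noisy and partial point-cloud tests to show the robustness gains.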
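The robustness protocol described above can be sketched as: perturb a point cloud, then score a prediction with Chamfer distance and F-score. The following is a minimal pure-Python sketch under stated assumptions. The function names, the uniform random point dropping, the squared-distance form of Chamfer, and the F-score threshold are illustrative choices, not the paper's exact settings, and how the table's noise label maps onto the `std` parameter is not specified in this summary.

```python
import math
import random


def add_gaussian_noise(points, std, rng):
    """Jitter every coordinate with zero-mean Gaussian noise (assumed protocol)."""
    return [tuple(c + rng.gauss(0.0, std) for c in p) for p in points]


def drop_points(points, ratio, rng):
    """Randomly eliminate a fraction `ratio` of the points, keeping at least one."""
    keep = max(1, round(len(points) * (1.0 - ratio)))
    return rng.sample(points, keep)


def nearest_dists(src, dst):
    """Distance from each point in `src` to its nearest neighbor in `dst` (brute force)."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean squared nearest-neighbor distance, both ways."""
    d_ab = nearest_dists(a, b)
    d_ba = nearest_dists(b, a)
    return sum(d * d for d in d_ab) / len(a) + sum(d * d for d in d_ba) / len(b)


def f_score(pred, gt, threshold):
    """Harmonic mean of precision and recall at a distance threshold."""
    precision = sum(d <= threshold for d in nearest_dists(pred, gt)) / len(pred)
    recall = sum(d <= threshold for d in nearest_dists(gt, pred)) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


rng = random.Random(0)
cloud = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(200)]
noisy = add_gaussian_noise(cloud, 0.05, rng)   # noisy-input test
partial = drop_points(cloud, 0.8, rng)         # 80% of points eliminated

print("Chamfer(clean, noisy):", chamfer_distance(cloud, noisy))
print("F-score(partial vs clean):", f_score(partial, cloud, 0.05))
```

The brute-force nearest-neighbor search is quadratic and only suitable for small clouds; a real evaluation would typically use a KD-tree or a vectorized library.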

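Experimental Results
Research Questions
- RQ1: Can a multimodal, MLLM-based approach improve the robustness of CAD generation under noisy and partial data?
- RQ2: How does CAD-MLLM perform relative to DeepCAD across multiple robustness metrics and data defects?
- RQ3: Does incorporating diverse modalities such as text, images, and point clouds improve CAD model reconstruction quality?
Main Findings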
| Setting | Chamfer ↓ | F-score ↑ | Normal C ↑ | SegE ↓ | DangEL ↓ | SIR ↓ | FluxEE ↓ |
|---|---|---|---|---|---|---|---|
| DeepCAD [1] (Clean) – Point | 4.63 | 71.47 | 64.47 | 9.47 | 1.32 | 6.35 | 0.375 |
| Ours(Point) (Clean) | 2.88 | 83.10 | 72.66 | 2.22 | 0.64 | 2.02 | 0.066 |
| DeepCAD [1] (Noisy σ2=0.01) – Point | 6.71 | 55.97 | 53.34 | 9.27 | 1.38 | 8.97 | 0.227 |
| Ours(Point) (Noisy σ2=0.01) | 3.12 | 82.05 | 71.11 | 2.21 | 0.70 | 1.85 | 0.025 |
| DeepCAD [1] (Noisy σ2=0.02) – Point | 8.15 | 46.67 | 49.64 | 16.99 | 1.94 | 7.63 | 0.511 |
| Ours(Point) (Noisy σ2=0.02) | 4.14 | 74.39 | 65.66 | 2.31 | 0.51 | 1.82 | 0.049 |
| DeepCAD [1] (Noisy σ2=0.03) – Point | 9.17 | 40.84 | 45.83 | 16.75 | 2.01 | 10.10 | 0.363 |
| Ours(Point) (Noisy σ2=0.03) | 4.91 | 68.99 | 61.51 | 3.96 | 0.81 | 2.86 | 0.283 |
| DeepCAD [1] (Noisy σ2=0.05) – Point | 10.82 | 32.69 | 44.02 | 14.70 | 2.44 | 13.54 | 1.230 |
| Ours(Point) (Noisy σ2=0.05) | 5.50 | 63.76 | 57.03 | 3.88 | 0.99 | 3.51 | 0.199 |
| DeepCAD [1] (Partial Elim 20%) – Point | 4.71 | 71.47 | 64.63 | 7.64 | 1.34 | 6.03 | 0.281 |
| Ours(Point) (Partial Elim 20%) | 2.75 | 84.79 | 73.44 | 2.17 | 0.36 | 1.89 | 0.138 |
| DeepCAD [1] (Partial Elim 50%) – Point | 4.70 | 71.40 | 64.19 | 8.88 | 1.41 | 5.33 | 0.138 |
| Ours(Point) (Partial Elim 50%) | 2.82 | 83.37 | 72.69 | 2.14 | 0.45 | 1.67 | 0.025 |
| DeepCAD [1] (Partial Elim 80%) – Point | 5.96 | 62.32 | 58.40 | 12.51 | 1.44 | 7.54 | 0.462 |
| Ours(Point) (Partial Elim 80%) | 2.99 | 82.82 | 71.90 | 2.43 | 0.66 | 1.74 | 0.086 |
| DeepCAD [1] (Partial Elim 95%) – Point | 8.39 | 44.86 | 47.70 | 18.28 | 1.73 | 7.75 | 0.560 |
| Ours(Point) (Partial Elim 95%) | 3.68 | 76.73 | 65.43 | 2.44 | 0.71 | 1.92 | 0.040 |
| DeepCAD [1] (Partial Elim 99%) – Point | 10.62 | 34.02 | 44.14 | 7.71 | 1.32 | 7.54 | 0.323 |
| Ours(Point) (Partial Elim 99%) | 5.86 | 60.08 | 54.07 | 2.83 | 0.26 | 1.60 | 0.005 |
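- Ours (Point) achieves a lower Chamfer distance and a higher F-score than DeepCAD on both the clean and the noisy tests.
- At every noise level from σ2=0.01 to σ2=0.05, CAD-MLLM outperforms DeepCAD on all reported metrics: Chamfer, F-score, Normal C, SegE, DangEL, SIR, and FluxEE.
- The method is more robust than DeepCAD in every reported scenario, including partial elimination of up to 99% of the points, which indicates stronger resilience to data defects.
- As Figure 1 shows, CAD generation is conditioned on multimodal data (text, 8-view images, and point clouds), which the figure visualizes.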
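The robustness gap can be read directly off the table by measuring how far each method degrades from its own clean baseline. The short sketch below does this for Chamfer and F-score at each noise level. The numbers are copied from the table; the helper names and the choice of "degradation from clean" as the summary statistic are assumptions of this example, not an analysis from the paper.

```python
# (Chamfer, F-score) per condition, copied from the table; keys are the noise labels.
deepcad = {
    "clean": (4.63, 71.47),
    "0.01": (6.71, 55.97),
    "0.02": (8.15, 46.67),
    "0.03": (9.17, 40.84),
    "0.05": (10.82, 32.69),
}
ours = {
    "clean": (2.88, 83.10),
    "0.01": (3.12, 82.05),
    "0.02": (4.14, 74.39),
    "0.03": (4.91, 68.99),
    "0.05": (5.50, 63.76),
}


def f_score_drop(results, condition):
    """F-score points lost relative to the clean setting."""
    return results["clean"][1] - results[condition][1]


def chamfer_increase_pct(results, condition):
    """Relative Chamfer increase (in percent) over the clean setting."""
    clean = results["clean"][0]
    return 100.0 * (results[condition][0] - clean) / clean


for cond in ("0.01", "0.02", "0.03", "0.05"):
    print(
        f"noise {cond}: "
        f"F-score drop DeepCAD {f_score_drop(deepcad, cond):.2f} vs ours {f_score_drop(ours, cond):.2f}, "
        f"Chamfer increase DeepCAD {chamfer_increase_pct(deepcad, cond):.1f}% "
        f"vs ours {chamfer_increase_pct(ours, cond):.1f}%"
    )
```

At the highest noise level, DeepCAD loses 38.78 F-score points relative to its clean result, while CAD-MLLM loses 19.34, so the method degrades more slowly in addition to starting from a better baseline.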
This walkthrough was generated by AI and reviewed by human editors.