QUICK REVIEW

[Paper Review] CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM

Jiasheng Xu, Chenyu Wang | arXiv (Cornell University) | Nov 7, 2024

Manufacturing Process and Optimization | Cited by 6

One-Sentence Summary

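The paper proposes CAD-MLLM, a framework that uses a multimodal large language model (MLLM) to generate parametric CAD models conditioned on text, images, point clouds, or a combination of them; with point-cloud input, it is markedly more robust than DeepCAD to noisy and partially missing data.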

ABSTRACT

This paper aims to design a unified Computer-Aided Design (CAD) generation system that can easily generate CAD models based on the user's inputs in the form of textual description, images, point clouds, or even a combination of them. Towards this goal, we introduce the CAD-MLLM, the first system capable of generating parametric CAD models conditioned on the multimodal input. Specifically, within the CAD-MLLM framework, we leverage the command sequences of CAD models and then employ advanced large language models (LLMs) to align the feature space across these diverse multi-modalities data and CAD models' vectorized representations. To facilitate the model training, we design a comprehensive data construction and annotation pipeline that equips each CAD model with corresponding multimodal data. Our resulting dataset, named Omni-CAD, is the first multimodal CAD dataset that contains textual description, multi-view images, points, and command sequence for each CAD model. It contains approximately 450K instances and their CAD construction sequences. To thoroughly evaluate the quality of our generated CAD models, we go beyond current evaluation metrics that focus on reconstruction quality by introducing additional metrics that assess topology quality and surface enclosure extent. Extensive experimental results demonstrate that CAD-MLLM significantly outperforms existing conditional generative methods and remains highly robust to noises and missing points. The project page and more visualizations can be found at: https://cad-mllm.github.io/
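
The abstract describes CAD models as command sequences that the LLM learns to align with multimodal inputs and to generate. The Python sketch below makes that idea concrete by serializing a tiny sketch-and-extrude sequence into a token string. The command set (`Line`, `Circle`, `Extrude`), the 256-level quantization, the `[-1, 1]` coordinate range, and the token format are all hypothetical, loosely modeled on DeepCAD-style representations; this is not CAD-MLLM's actual encoding.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical: every continuous parameter is quantized to 8 bits (256 levels).
N_LEVELS = 256


def quantize(value: float, lo: float = -1.0, hi: float = 1.0) -> int:
    """Uniformly map a value in [lo, hi] to an integer bin in [0, N_LEVELS - 1]."""
    value = min(max(value, lo), hi)  # clamp out-of-range inputs
    t = (value - lo) / (hi - lo)     # normalize to [0, 1]
    return round(t * (N_LEVELS - 1))


@dataclass
class Line:
    """Straight segment from the previous end point to (x, y)."""
    x: float
    y: float


@dataclass
class Circle:
    """Full circle with center (x, y) and radius r in [0, 1]."""
    x: float
    y: float
    r: float


@dataclass
class Extrude:
    """Extrude the current closed profile by a signed distance."""
    distance: float


Command = Union[Line, Circle, Extrude]


def serialize(commands: List[Command]) -> str:
    """Flatten a command sequence into space-separated tokens ending in <EOS>."""
    tokens: List[str] = []
    for cmd in commands:
        if isinstance(cmd, Line):
            tokens += ["L", str(quantize(cmd.x)), str(quantize(cmd.y))]
        elif isinstance(cmd, Circle):
            tokens += ["C", str(quantize(cmd.x)), str(quantize(cmd.y)),
                       str(quantize(cmd.r, 0.0, 1.0))]
        elif isinstance(cmd, Extrude):
            tokens += ["E", str(quantize(cmd.distance))]
    tokens.append("<EOS>")
    return " ".join(tokens)


# A cylinder: one circle profile extruded along the sketch normal.
print(serialize([Circle(0.0, 0.0, 0.5), Extrude(0.25)]))
```

Running it prints `C 128 128 128 E 159 <EOS>`. Quantizing parameters into a fixed set of integer bins is what lets a language model treat geometry as discrete tokens; more bins give finer geometric precision at the cost of a larger vocabulary.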

Motivation and Goals

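Enable robust CAD generation under multimodal conditions (text, images, and point clouds).
Use an MLLM to unify these multimodal conditions for CAD model generation.
Evaluate robustness against baselines on noisy and partially missing point clouds.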
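
The robustness settings in the results table are labeled by a noise variance σ² and an elimination percentage. A minimal Python sketch of how such perturbed inputs could be produced follows. It assumes σ² is the per-coordinate variance of isotropic Gaussian noise and that partial elimination drops a uniformly random fraction of points; the review does not state the paper's exact protocol (elimination might instead remove a spatial region), and the function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float, float]


def add_gaussian_noise(points: List[Point], variance: float, seed: int = 0) -> List[Point]:
    """Add zero-mean Gaussian noise with the given variance to every coordinate."""
    rng = random.Random(seed)
    sigma = math.sqrt(variance)  # random.gauss expects a standard deviation, not a variance
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in points]


def eliminate_points(points: List[Point], fraction: float, seed: int = 0) -> List[Point]:
    """Keep a uniformly random subset, dropping `fraction` of the points (at least one survives)."""
    rng = random.Random(seed)
    n_keep = max(1, round(len(points) * (1.0 - fraction)))
    return rng.sample(points, n_keep)


cloud = [(i / 100, 0.0, 0.0) for i in range(1000)]
noisy = add_gaussian_noise(cloud, variance=0.02)   # sigma is about 0.141
partial = eliminate_points(cloud, fraction=0.95)   # 50 of 1000 points remain
print(len(noisy), len(partial))
```

Converting σ² to σ before sampling matters: passing 0.02 directly as the standard deviation would understate the noise by a factor of about 7.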

Proposed Method

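Condition CAD generation on multimodal inputs (text, multi-view images, and point clouds) through an MLLM, which aligns the input features with the vectorized command sequences of CAD models.
Quantitatively compare robustness against DeepCAD on clean and perturbed data.
Report reconstruction metrics (Chamfer distance, F-score, Normal C) together with SegE, DangEL, SIR, and FluxEE, which the paper introduces to assess topology quality and surface enclosure.
Evaluate on noisy and partial point clouds to demonstrate the robustness gains.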
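
Chamfer distance and F-score compare a reconstructed shape with the ground truth through point samples. The Python sketch below uses one common convention: the symmetric Chamfer distance as the sum of the two mean nearest-neighbor Euclidean distances, and the F-score at a distance threshold `tau`. The paper's exact variant (squared vs. unsquared distances, sampling density, scaling factor, and threshold) is not given in this review, so values from this sketch are not directly comparable to the table. Brute-force nearest neighbors keep it dependency-free; a spatial index such as a k-d tree would be used at realistic point counts.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _nn_dists(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """For each point in src, the Euclidean distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def chamfer_distance(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: mean NN distance pred->gt plus mean NN distance gt->pred."""
    d_pg = _nn_dists(pred, gt)
    d_gp = _nn_dists(gt, pred)
    return sum(d_pg) / len(d_pg) + sum(d_gp) / len(d_gp)


def f_score(pred: Sequence[Point], gt: Sequence[Point], tau: float) -> float:
    """F-score at threshold tau: harmonic mean of precision and recall."""
    precision = sum(d < tau for d in _nn_dists(pred, gt)) / len(pred)
    recall = sum(d < tau for d in _nn_dists(gt, pred)) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]  # one spurious point
print(chamfer_distance(pred, gt))  # 4/3: only the pred->gt term sees the outlier
print(f_score(pred, gt, tau=0.5))  # precision 2/3, recall 1, so about 0.8
```

The two metrics are complementary: Chamfer distance grows with how far stray points lie, while F-score only counts whether each point falls within `tau`, so a single distant outlier moves Chamfer much more than F-score.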

Figure 1 : Dataset sample visualization. We sample five cases from our proposed Omni-CAD dataset to illustrate the multimodal conditioned data and the corresponding ground truth CAD models. In the real dataset, each CAD model includes images of eight views; here, we randomly select four views for de

Experimental Results

Research Questions

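RQ1: Can a multimodal, MLLM-based approach make CAD generation more robust to noisy and partial data?
RQ2: How does CAD-MLLM compare with DeepCAD across multiple robustness metrics and types of data defects?
RQ3: Does incorporating diverse modalities (text, images, point clouds) improve the reconstruction quality of CAD models?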

Key Findings

Setting	Method	Chamfer ↓	F-score ↑	Normal C ↑	SegE ↓	DangEL ↓	SIR ↓	FluxEE ↓
Clean	DeepCAD [1] (Point)	4.63	71.47	64.47	9.47	1.32	6.35	0.375
Clean	CAD-MLLM (Point)	2.88	83.10	72.66	2.22	0.64	2.02	0.066
Noisy, σ² = 0.01	DeepCAD [1] (Point)	6.71	55.97	53.34	9.27	1.38	8.97	0.227
Noisy, σ² = 0.01	CAD-MLLM (Point)	3.12	82.05	71.11	2.21	0.70	1.85	0.025
Noisy, σ² = 0.02	DeepCAD [1] (Point)	8.15	46.67	49.64	16.99	1.94	7.63	0.511
Noisy, σ² = 0.02	CAD-MLLM (Point)	4.14	74.39	65.66	2.31	0.51	1.82	0.049
Noisy, σ² = 0.03	DeepCAD [1] (Point)	9.17	40.84	45.83	16.75	2.01	10.10	0.363
Noisy, σ² = 0.03	CAD-MLLM (Point)	4.91	68.99	61.51	3.96	0.81	2.86	0.283
Noisy, σ² = 0.05	DeepCAD [1] (Point)	10.82	32.69	44.02	14.70	2.44	13.54	1.230
Noisy, σ² = 0.05	CAD-MLLM (Point)	5.50	63.76	57.03	3.88	0.99	3.51	0.199
Partial, 20% eliminated	DeepCAD [1] (Point)	4.71	71.47	64.63	7.64	1.34	6.03	0.281
Partial, 20% eliminated	CAD-MLLM (Point)	2.75	84.79	73.44	2.17	0.36	1.89	0.138
Partial, 50% eliminated	DeepCAD [1] (Point)	4.70	71.40	64.19	8.88	1.41	5.33	0.138
Partial, 50% eliminated	CAD-MLLM (Point)	2.82	83.37	72.69	2.14	0.45	1.67	0.025
Partial, 80% eliminated	DeepCAD [1] (Point)	5.96	62.32	58.40	12.51	1.44	7.54	0.462
Partial, 80% eliminated	CAD-MLLM (Point)	2.99	82.82	71.90	2.43	0.66	1.74	0.086
Partial, 95% eliminated	DeepCAD [1] (Point)	8.39	44.86	47.70	18.28	1.73	7.75	0.560
Partial, 95% eliminated	CAD-MLLM (Point)	3.68	76.73	65.43	2.44	0.71	1.92	0.040
Partial, 99% eliminated	DeepCAD [1] (Point)	10.62	34.02	44.14	7.71	1.32	7.54	0.323
Partial, 99% eliminated	CAD-MLLM (Point)	5.86	60.08	54.07	2.83	0.26	1.60	0.005

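One way to quantify the robustness gap in the table is the share of its clean-data F-score that each method keeps under a perturbation. This retention ratio is a summary statistic introduced here for illustration, not one the paper reports; the values below are copied from the table rows for the clean, σ² = 0.05 noise, and 99% elimination settings.

```python
# F-scores (higher is better) copied from the table: clean, noisy (sigma^2 = 0.05),
# and partial point clouds with 99% of points eliminated.
F_SCORE = {
    "DeepCAD": {"clean": 71.47, "noisy_0.05": 32.69, "elim_99": 34.02},
    "CAD-MLLM": {"clean": 83.10, "noisy_0.05": 63.76, "elim_99": 60.08},
}


def retention(method: str, setting: str) -> float:
    """Percentage of a method's clean-data F-score that it keeps under `setting`."""
    scores = F_SCORE[method]
    return round(100.0 * scores[setting] / scores["clean"], 1)


for method in F_SCORE:
    print(method, retention(method, "noisy_0.05"), retention(method, "elim_99"))
```

This prints about 46% and 48% retention for DeepCAD versus about 77% and 72% for CAD-MLLM, so CAD-MLLM keeps a considerably larger share of its clean-data F-score under both perturbations.
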
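With point-cloud input, CAD-MLLM achieves lower Chamfer distance and higher F-score than DeepCAD on both clean and noisy tests (clean: Chamfer 2.88 vs. 4.63, F-score 83.10 vs. 71.47).
Across noise levels from σ² = 0.01 to 0.05, CAD-MLLM outperforms DeepCAD on every reported metric: Chamfer, F-score, Normal C, SegE, DangEL, SIR, and FluxEE.
CAD-MLLM also leads on every metric in every partial-elimination setting, and it degrades more gracefully: from clean data to σ² = 0.05, its F-score falls from 83.10 to 63.76, whereas DeepCAD's falls from 71.47 to 32.69. This suggests greater resilience to noisy and incomplete input.
Figure 1 visualizes the multimodal data (text, multi-view images, and point clouds) used to condition CAD generation; in the full dataset, each CAD model has images from eight views.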

This review was generated by AI and reviewed by human editors.