QUICK REVIEW

[Paper Review] CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM

Jiasheng Xu, Chenyu Wang|arXiv (Cornell University)|Nov 7, 2024

Manufacturing Process and Optimization | Cited by 6

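One-Sentence Summary

This paper proposes CAD-MLLM, a framework that uses a multimodal large language model (MLLM) to generate CAD models conditioned on multiple modalities, and shows that it is more robust than DeepCAD across a range of data defects.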

ABSTRACT

This paper aims to design a unified Computer-Aided Design (CAD) generation system that can easily generate CAD models based on the user's inputs in the form of textual description, images, point clouds, or even a combination of them. Towards this goal, we introduce the CAD-MLLM, the first system capable of generating parametric CAD models conditioned on the multimodal input. Specifically, within the CAD-MLLM framework, we leverage the command sequences of CAD models and then employ advanced large language models (LLMs) to align the feature space across these diverse multi-modalities data and CAD models' vectorized representations. To facilitate the model training, we design a comprehensive data construction and annotation pipeline that equips each CAD model with corresponding multimodal data. Our resulting dataset, named Omni-CAD, is the first multimodal CAD dataset that contains textual description, multi-view images, points, and command sequence for each CAD model. It contains approximately 450K instances and their CAD construction sequences. To thoroughly evaluate the quality of our generated CAD models, we go beyond current evaluation metrics that focus on reconstruction quality by introducing additional metrics that assess topology quality and surface enclosure extent. Extensive experimental results demonstrate that CAD-MLLM significantly outperforms existing conditional generative methods and remains highly robust to noises and missing points. The project page and more visualizations can be found at: https://cad-mllm.github.io/

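Motivation and Objectives

Motivate robust CAD generation under multimodal conditions such as text, images, and point clouds.
Unify multimodal conditioning for CAD model generation using an MLLM.
Evaluate robustness to noisy and partially missing point clouds against a baseline.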

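Proposed Method

Guide MLLM-based CAD generation with multimodal conditioning inputs: text, multi-view images, and point clouds.
Compare robustness quantitatively against DeepCAD on clean and perturbed data.
Report Chamfer distance, F-score, Normal Consistency, Segment Error (SegE), DangEL, SIR, and FluxEE.
Present evaluations on noisy and partially eliminated point clouds that show improved robustness.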
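
The two reconstruction metrics named above, Chamfer distance and F-score, can be sketched for a pair of point sets as follows. This is a minimal illustration, not the paper's evaluation code. The squared-distance form of Chamfer, the threshold `tau`, and the function names are all assumptions, and the scaling used for the numbers in the results table is not stated in this review.

```python
import numpy as np


def pairwise_sq_dists(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return the (N, M) matrix of squared Euclidean distances."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff * diff, axis=-1)


def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric Chamfer distance (assumed variant: mean squared
    nearest-neighbour distance, summed over both directions)."""
    d = pairwise_sq_dists(pred, gt)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())


def f_score(pred: np.ndarray, gt: np.ndarray, tau: float = 0.05) -> float:
    """F-score at distance threshold tau (tau is a hypothetical default)."""
    d = np.sqrt(pairwise_sq_dists(pred, gt))
    precision = float((d.min(axis=1) < tau).mean())  # pred points near gt
    recall = float((d.min(axis=0) < tau).mean())     # gt points near pred
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.uniform(-1.0, 1.0, size=(256, 3))
    pred = gt + rng.normal(0.0, 0.01, size=gt.shape)
    print(chamfer_distance(pred, gt), f_score(pred, gt))
```

The dense distance matrix keeps the sketch short. For point clouds with many thousands of points, a spatial index such as a k-d tree would be the more usual choice.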

Figure 1 : Dataset sample visualization. We sample five cases from our proposed Omni-CAD dataset to illustrate the multimodal conditioned data and the corresponding ground truth CAD models. In the real dataset, each CAD model includes images of eight views; here, we randomly select four views for de

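Experimental Results

Research Questions

RQ1: Can a multimodal, MLLM-based approach improve the robustness of CAD generation under noisy and partially missing data?
RQ2: How does CAD-MLLM perform relative to DeepCAD across multiple robustness metrics and data defects?
RQ3: Does incorporating diverse modalities (text, images, point clouds) improve the reconstruction quality of CAD models?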
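
The two kinds of data defect behind these questions, noisy points and partially eliminated points, can be simulated on a point cloud roughly as below. This is a sketch under stated assumptions: the review does not say how the paper generates its perturbations, so the isotropic Gaussian noise with variance `variance`, the uniform random removal of points, and both function names are hypothetical.

```python
import numpy as np


def add_gaussian_noise(points: np.ndarray, variance: float,
                       rng: np.random.Generator) -> np.ndarray:
    """Add zero-mean Gaussian noise with the given variance to each
    coordinate (assumed noise model)."""
    sigma = np.sqrt(variance)
    return points + rng.normal(0.0, sigma, size=points.shape)


def eliminate_points(points: np.ndarray, fraction: float,
                     rng: np.random.Generator) -> np.ndarray:
    """Drop the given fraction of points uniformly at random (assumed
    removal scheme), always keeping at least one point."""
    n = points.shape[0]
    keep = max(1, int(round(n * (1.0 - fraction))))
    idx = rng.choice(n, size=keep, replace=False)
    return points[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    cloud = rng.uniform(-1.0, 1.0, size=(1000, 3))
    noisy = add_gaussian_noise(cloud, variance=0.01, rng=rng)
    partial = eliminate_points(cloud, fraction=0.8, rng=rng)
    print(noisy.shape, partial.shape)
```

If the paper removes a contiguous region rather than random points, the second function would need a different selection rule, for example dropping the points nearest to a randomly chosen seed point.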

Key Findings

Method (input condition)	Chamfer ↓	F-score ↑	Normal C ↑	SegE ↓	DangEL ↓	SIR ↓	FluxEE ↓
DeepCAD [1] (Clean) – Point	4.63	71.47	64.47	9.47	1.32	6.35	0.375
Ours(Point) (Clean)	2.88	83.10	72.66	2.22	0.64	2.02	0.066
DeepCAD [1] (Noisy σ²=0.01) – Point	6.71	55.97	53.34	9.27	1.38	8.97	0.227
Ours(Point) (Noisy σ²=0.01)	3.12	82.05	71.11	2.21	0.70	1.85	0.025
DeepCAD [1] (Noisy σ²=0.02) – Point	8.15	46.67	49.64	16.99	1.94	7.63	0.511
Ours(Point) (Noisy σ²=0.02)	4.14	74.39	65.66	2.31	0.51	1.82	0.049
DeepCAD [1] (Noisy σ²=0.03) – Point	9.17	40.84	45.83	16.75	2.01	10.10	0.363
Ours(Point) (Noisy σ²=0.03)	4.91	68.99	61.51	3.96	0.81	2.86	0.283
DeepCAD [1] (Noisy σ²=0.05) – Point	10.82	32.69	44.02	14.70	2.44	13.54	1.230
Ours(Point) (Noisy σ²=0.05)	5.50	63.76	57.03	3.88	0.99	3.51	0.199
DeepCAD [1] (Partial Elim 20%) – Point	4.71	71.47	64.63	7.64	1.34	6.03	0.281
Ours(Point) (Partial Elim 20%)	2.75	84.79	73.44	2.17	0.36	1.89	0.138
DeepCAD [1] (Partial Elim 50%) – Point	4.70	71.40	64.19	8.88	1.41	5.33	0.138
Ours(Point) (Partial Elim 50%)	2.82	83.37	72.69	2.14	0.45	1.67	0.025
DeepCAD [1] (Partial Elim 80%) – Point	5.96	62.32	58.40	12.51	1.44	7.54	0.462
Ours(Point) (Partial Elim 80%)	2.99	82.82	71.90	2.43	0.66	1.74	0.086
DeepCAD [1] (Partial Elim 95%) – Point	8.39	44.86	47.70	18.28	1.73	7.75	0.560
Ours(Point) (Partial Elim 95%)	3.68	76.73	65.43	2.44	0.71	1.92	0.040
DeepCAD [1] (Partial Elim 99%) – Point	10.62	34.02	44.14	7.71	1.32	7.54	0.323
Ours(Point) (Partial Elim 99%)	5.86	60.08	54.07	2.83	0.26	1.60	0.005

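Ours(Point) achieves a lower Chamfer distance and a higher F-score than DeepCAD on both clean and noisy tests (clean: Chamfer 2.88 vs. 4.63, F-score 83.10 vs. 71.47).
Under noise levels σ²=0.01 to 0.05, CAD-MLLM outperforms DeepCAD on all seven metrics (Chamfer, F-score, Normal C, SegE, DangEL, SIR, FluxEE); the margin is narrowest for FluxEE at σ²=0.03 (0.283 vs. 0.363).
The method is more robust in every reported scenario, including partial elimination: with 95% of points removed, its F-score is 76.73 against 44.86 for DeepCAD.
The multimodal data used to condition CAD generation (text, eight view images, and point clouds) is visualized in Fig. 1.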
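
One way to read the robustness comparison is to ask how far each method's F-score falls from its own clean baseline under each defect. The sketch below does that arithmetic on the F-score column of the table above. The dictionary keys and the helper name `f_score_drop` are illustrative choices, and the drop-from-clean measure is a convenience of this note rather than a metric defined in the paper.

```python
# F-score values copied from the results table: (DeepCAD, Ours(Point)).
F_SCORE = {
    "clean": (71.47, 83.10),
    "noisy_0.01": (55.97, 82.05),
    "noisy_0.02": (46.67, 74.39),
    "noisy_0.03": (40.84, 68.99),
    "noisy_0.05": (32.69, 63.76),
    "elim_20": (71.47, 84.79),
    "elim_50": (71.40, 83.37),
    "elim_80": (62.32, 82.82),
    "elim_95": (44.86, 76.73),
    "elim_99": (34.02, 60.08),
}


def f_score_drop(condition: str) -> tuple[float, float]:
    """Return the F-score lost relative to the clean setting, as
    (DeepCAD drop, Ours drop). Negative values mean an increase."""
    clean_deepcad, clean_ours = F_SCORE["clean"]
    deepcad, ours = F_SCORE[condition]
    return clean_deepcad - deepcad, clean_ours - ours


if __name__ == "__main__":
    for name in F_SCORE:
        if name == "clean":
            continue
        d_drop, o_drop = f_score_drop(name)
        print(f"{name:>10}: DeepCAD -{d_drop:.2f}, Ours -{o_drop:.2f}")
```

At the strongest noise level, for example, this gives a drop of 38.78 points for DeepCAD and 19.34 points for Ours(Point), which follows directly from the table entries.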

This review was generated by AI and checked by a human editor.