QUICK REVIEW

[Paper Review] Let Images Give You More: Point Cloud Cross-Modal Training for Shape Analysis

Yan Xu, Heshen Zhan|arXiv (Cornell University)|Oct 9, 2022

3D Surveying and Cultural Heritage · Cited by 22

One-Sentence Summary

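PointCMT introduces a teacher-student cross-modal training framework. During training, it distills image priors into a point cloud model, which improves point-only shape analysis without any change to the model architecture. It achieves state-of-the-art results on ModelNet40 and ScanObjectNN.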

ABSTRACT

Although recent point cloud analysis achieves impressive progress, the paradigm of representation learning from a single modality gradually meets its bottleneck. In this work, we take a step towards more discriminative 3D point cloud representation by fully taking advantage of images, which inherently contain richer appearance information, e.g., texture, color, and shade. Specifically, this paper introduces a simple but effective point cloud cross-modality training (PointCMT) strategy, which utilizes view-images, i.e., rendered or projected 2D images of the 3D object, to boost point cloud analysis. In practice, to effectively acquire auxiliary knowledge from view images, we develop a teacher-student framework and formulate the cross-modal learning as a knowledge distillation problem. PointCMT eliminates the distribution discrepancy between different modalities through novel feature and classifier enhancement criteria and avoids potential negative transfer effectively. Note that PointCMT effectively improves the point-only representation without architecture modification. Extensive experiments verify significant gains on various datasets using appealing backbones, i.e., equipped with PointCMT, PointNet++ and PointMLP achieve state-of-the-art performance on two benchmarks, i.e., 94.4% and 86.7% accuracy on ModelNet40 and ScanObjectNN, respectively. Code will be made available at https://github.com/ZhanHeshen/PointCMT.

Research Motivation and Goals

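Use the rich appearance information in images to overcome the bottleneck of single-modality point cloud learning.
Develop a cross-modal knowledge distillation framework that transfers image priors to the point cloud model during training.
Achieve the improvement without changing the point cloud model's architecture at inference time.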

Proposed Method

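Adopt a teacher-student setup in which the image encoder and classifier act as the teacher and the point cloud encoder acts as the student.
Generate multi-view images (rendered or projected) for each 3D object to obtain image-based global features.
Introduce a Cross-Modal Point Generator (CMPG) that maps image features to a point-cloud-style representation; the CMPG is pre-trained with the Earth Mover's Distance (EMD) to reconstruct point clouds.
Apply three training objectives: an image-based classification loss; a feature enhancement loss, computed as the EMD between the reconstructions derived from image features and from point features; and a classifier enhancement loss that aligns logits via KL divergence.
Define the final loss as a weighted sum of the cross-entropy, feature enhancement, and classifier enhancement losses (weights α=30, β=0.3).
Integrate PointCMT with any point cloud model; the improvement requires no architecture change at inference.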
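The view-image step can be illustrated with a minimal orthographic depth projection. This is a generic sketch, not the paper's renderer or projection code: the helper names (`rotate_y`, `depth_image`, `multi_view_depth`), the 8×8 resolution, the assumption that coordinates are normalized to [-1, 1], and the choice of rotating the object about the y axis to get each view are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def rotate_y(points: Sequence[Point3], angle_rad: float) -> List[Point3]:
    """Rotate points about the vertical (y) axis to emulate a camera at a new azimuth."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]


def depth_image(points: Sequence[Point3], size: int = 8) -> List[List[float]]:
    """Orthographic projection onto the x-y plane, viewed from z = -inf.

    Assumes coordinates in [-1, 1]. Each pixel keeps its nearest point (smallest z),
    encoded as brightness in [0.5, 1.0]; empty pixels stay 0.0.
    """
    img = [[0.0] * size for _ in range(size)]
    nearest = [[math.inf] * size for _ in range(size)]
    for x, y, z in points:
        col = min(size - 1, max(0, int((x + 1.0) / 2.0 * size)))
        row = min(size - 1, max(0, int((1.0 - y) / 2.0 * size)))  # +y maps to the top row
        if z < nearest[row][col]:
            nearest[row][col] = z
            img[row][col] = 1.0 - (z + 1.0) / 4.0  # nearer points are brighter
    return img


def multi_view_depth(
    points: Sequence[Point3], num_views: int = 4, size: int = 8
) -> List[List[List[float]]]:
    """Produce num_views depth images at evenly spaced azimuths around the object."""
    return [
        depth_image(rotate_y(points, 2.0 * math.pi * k / num_views), size)
        for k in range(num_views)
    ]


cloud = [(0.0, 0.0, 0.5), (0.0, 0.0, -0.5), (0.9, 0.9, 0.0)]
views = multi_view_depth(cloud)
print(len(views), views[0][4][4])  # prints: 4 0.875 (the nearer of the two centre points wins)
```

Each view image would then go through the image encoder to produce the image-based global feature; real pipelines render at much higher resolution and may use shading or colour rather than depth alone.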
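The training objective can be sketched in plain Python. This is a sketch under stated assumptions, not the authors' code: all function names are hypothetical; α multiplies the feature enhancement (EMD) term and β the classifier enhancement (KL) term, following the order in the text; the KL term is assumed to compare the teacher's (image) softmax against the student's (point) softmax with no temperature; and the two reconstructions are taken as given, standing in for the CMPG outputs. The exact EMD here uses brute-force matching, so it only works for a handful of points; practical point cloud EMD losses typically rely on approximate solvers.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def emd(a: Sequence[Point3], b: Sequence[Point3]) -> float:
    """Exact EMD between equal-size point sets: min over bijections of the mean L2 distance.

    Brute force over all permutations (O(n!)), so only usable for a handful of points.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty point sets of equal size")
    best = math.inf
    for perm in itertools.permutations(range(len(b))):
        best = min(best, sum(math.dist(a[i], b[j]) for i, j in enumerate(perm)))
    return best / len(a)


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); p is the teacher distribution, q the student's."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    return -math.log(softmax(logits)[label])


def pointcmt_style_loss(
    student_logits: Sequence[float],
    label: int,
    recon_from_image: Sequence[Point3],
    recon_from_points: Sequence[Point3],
    teacher_logits: Sequence[float],
    alpha: float = 30.0,
    beta: float = 0.3,
) -> float:
    l_xent = cross_entropy(student_logits, label)                     # task loss
    l_fe = emd(recon_from_image, recon_from_points)                   # feature enhancement
    l_cls = kl_div(softmax(teacher_logits), softmax(student_logits))  # classifier enhancement
    return l_xent + alpha * l_fe + beta * l_cls


same = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
swapped = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(emd(same, swapped))  # 0.0: the optimal matching pairs each point with its twin
print(pointcmt_style_loss([0.0, 0.0], 0, same, swapped, [0.0, 0.0]))  # ln 2, about 0.693
```

When the two reconstructions coincide and the teacher and student logits agree, both enhancement terms vanish and the total reduces to the ordinary cross-entropy; the large α reflects that EMD values on normalized point clouds are small compared with the classification losses.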

Experimental Results

Research Questions

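RQ1: Can priors from images improve discriminative point cloud representations during training without modifying the inference architecture?
RQ2: How should cross-modal knowledge distillation be formulated to avoid negative transfer between heterogeneous modalities (images vs. point clouds)?
RQ3: How do different view-image generation strategies affect cross-modal transfer?
RQ4: How do data efficiency and ablation choices affect PointCMT's gains on standard 3D benchmarks?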

Key Findings

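PointCMT yields clear gains over the baselines; for example, PointNet++ with PointCMT reaches 94.4% overall accuracy (OA) on ModelNet40, versus 93.4% for the baseline (+1.0 percentage point).
On ScanObjectNN, PointNet++ with PointCMT reaches 83.3% OA on PB_T50_RS (+3.9) and 91.8% mean class accuracy (mAcc) on OBJ_ONLY (+4.3).
In some settings, PointMLP with PointCMT raises OA to 86.4% (+1.0 on PB_T50_RS) and mAcc to 92.0% (+2.6 on OBJ_ONLY).
PointCMT gives larger gains when training data is limited: with only 2% and 10% of the training data, it raises PointNet++ OA by roughly 1.9 to 2.8 points.
Ablations show that combining feature enhancement (FE) and classifier enhancement (CE) gives the best results (ModelNet40 OA 94.4%, ScanObjectNN OBJ_ONLY 83.3%).
Compared with standard knowledge distillation (KD) methods, PointCMT's cross-modal approach avoids negative transfer and outperforms these KD baselines on the tested benchmarks.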
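Because the gains listed above are absolute percentage-point differences, the implied baseline scores follow by subtraction. The short script below (names such as `REPORTED`, `implied_baseline`, and `gap_closed` are hypothetical) also expresses each gain as the share of the baseline's remaining gap to 100% that PointCMT closes. All inputs are the numbers quoted in this review; nothing else is assumed.

```python
# (score with PointCMT, reported gain in points), copied from the findings above.
REPORTED = {
    "PointNet++ / ModelNet40 OA": (94.4, 1.0),
    "PointNet++ / ScanObjectNN PB_T50_RS OA": (83.3, 3.9),
    "PointNet++ / ScanObjectNN OBJ_ONLY mAcc": (91.8, 4.3),
    "PointMLP / ScanObjectNN PB_T50_RS OA": (86.4, 1.0),
    "PointMLP / ScanObjectNN OBJ_ONLY mAcc": (92.0, 2.6),
}


def implied_baseline(score: float, gain: float) -> float:
    """Gains are absolute percentage points, so the baseline is a plain subtraction."""
    return round(score - gain, 1)


def gap_closed(score: float, gain: float) -> float:
    """Share of the baseline's remaining gap to 100% that the gain closes."""
    return gain / (100.0 - (score - gain))


for name, (score, gain) in REPORTED.items():
    print(f"{name}: {implied_baseline(score, gain):.1f} -> {score:.1f} "
          f"(closes {gap_closed(score, gain):.0%} of the gap to 100%)")
```

For instance, the ModelNet40 row implies the PointNet++ error rate falls from 6.6% to 5.6%, which closes roughly 15% of the remaining gap; a +1.0 gain therefore means more on a saturated benchmark than the raw number suggests.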


This review was generated by AI and reviewed by human editors.