QUICK REVIEW

[Paper Review] 3D-A-Nets: 3D Deep Dense Descriptor for Volumetric Shapes with Adversarial Networks

Mengwei Ren, Liang Niu|arXiv (Cornell University)|Nov 28, 2017

3D Shape Modeling and Analysis | References: 33 | Cited by: 25

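One-Sentence Summary

The paper proposes 3D-A-Nets, a 3D adversarial network that learns a deep dense shape descriptor (3D-DDSD) for volumetric shapes from a multilayer dense representation (MDR) of 3D voxels. By jointly training a CNN-RNN generator with an adversarial discriminator, the model reports state-of-the-art results on ModelNet40 for 3D shape classification (90.5% accuracy) and retrieval (mAP 0.801), substantially outperforming earlier voxel-based methods.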

ABSTRACT

Recently researchers have been shifting their focus towards learned 3D shape descriptors from hand-crafted ones to better address challenging issues of the deformation and structural variation inherently present in 3D objects. 3D geometric data are often transformed to 3D voxel grids with regular format in order to be better fed to a deep neural net architecture. However, the computational intractability of direct application of 3D convolutional nets to 3D volumetric data severely limits the efficiency (i.e. slow processing) and effectiveness (i.e. unsatisfactory accuracy) in processing 3D geometric data. In this paper, powered with a novel design of adversarial networks (3D-A-Nets), we have developed a novel 3D deep dense shape descriptor (3D-DDSD) to address the challenging issues of efficient and effective 3D volumetric data processing. We developed a new definition of 2D multilayer dense representation (MDR) of 3D volumetric data to extract concise but geometrically informative shape description and a novel design of adversarial networks that jointly train a set of convolutional neural network (CNN), recurrent neural network (RNN) and an adversarial discriminator. More specifically, the generator network produces 3D shape features that encourage the clustering of samples from the same category with correct class label, whereas the discriminator network discourages the clustering by assigning them misleading adversarial class labels. By addressing the challenges posed by the computational inefficiency of direct application of CNN to 3D volumetric data, 3D-A-Nets can learn high-quality 3D-DDSD which demonstrates superior performance on 3D shape classification and retrieval over other state-of-the-art techniques by a great margin.
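
The two label targets described in the abstract can be illustrated with a toy sketch. It makes no claim about the paper's actual loss: `cross_entropy` and `misleading_label` are hypothetical helpers, the probability vector is invented, and drawing the adversarial label uniformly from the wrong classes is an assumption made only for illustration.

```python
import math
import random

def cross_entropy(probs, target):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target])

def misleading_label(true_label, num_classes, rng):
    """Pick a class label that is guaranteed to differ from the true one (assumed scheme)."""
    offset = rng.randrange(1, num_classes)  # 1 .. num_classes - 1
    return (true_label + offset) % num_classes

rng = random.Random(0)
num_classes = 4
true_label = 2
probs = [0.1, 0.1, 0.7, 0.1]  # hypothetical class probabilities for one shape feature

adv_label = misleading_label(true_label, num_classes, rng)
loss_correct = cross_entropy(probs, true_label)     # target used on the generator side
loss_adversarial = cross_entropy(probs, adv_label)  # target used on the discriminator side
```

For a feature that is already classified well, the loss against the correct label is small while the loss against the misleading label is large, which is the tension the adversarial setup exploits.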

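Research Motivation and Objectives

Address the low computational efficiency and limited accuracy of applying 3D CNNs directly to volumetric data.
Learn deformation-invariant, geometrically informative 3D shape descriptors that generalize well across structural variations in 3D objects.
Improve 3D shape classification and retrieval through adversarial training and spatiotemporal feature modeling.
Develop a compact yet informative 2D multilayer dense representation (MDR) of 3D voxel grids for efficient feature extraction.
Integrate a CNN, an RNN, and adversarial training into a unified framework for robust 3D shape descriptor learning.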

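Proposed Method

The method introduces a 2D multilayer dense representation (MDR) that projects a 3D voxel grid into a sequence of 2D slices so that it can be processed efficiently by a 2D CNN.
A CNN-RNN generator network extracts hierarchical features from the MDR slices, with a ConvLSTM modeling the spatiotemporal dependencies between adjacent slices.
An adversarial discriminator is trained to assign features from the same category to misleading (incorrect) class labels, which forces the generator to learn more discriminative features.
The generator and discriminator are trained jointly in an adversarial fashion; class labels drive tighter feature clustering and better generalization.
The model uses a 3-slice MDR configuration, which was empirically found to balance model complexity against performance.
The final 3D-DDSD is extracted from the generator and used for downstream tasks such as classification and retrieval.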
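
To make the idea of turning a voxel grid into a few dense 2D layers concrete, here is a minimal NumPy sketch. It is not the paper's MDR definition: the function name `multilayer_dense_representation`, the choice of the first axis as depth, and the use of mean occupancy per slab are all assumptions made for illustration.

```python
import numpy as np

def multilayer_dense_representation(voxels, num_layers=3):
    """Collapse a (D, H, W) occupancy grid into num_layers dense 2D maps.

    Assumption: each layer is the mean occupancy of one slab along the
    depth axis; the paper may define its layers differently.
    """
    voxels = np.asarray(voxels, dtype=np.float32)
    if voxels.ndim != 3:
        raise ValueError("expected a 3D voxel grid")
    if num_layers > voxels.shape[0]:
        raise ValueError("more layers requested than depth slices available")
    # Split the depth axis into nearly equal slabs, then flatten each slab.
    slabs = np.array_split(voxels, num_layers, axis=0)
    return np.stack([slab.mean(axis=0) for slab in slabs])

# Toy 6x4x4 grid with a solid 2x2 column running through the full depth.
grid = np.zeros((6, 4, 4), dtype=np.float32)
grid[:, 1:3, 1:3] = 1.0

mdr = multilayer_dense_representation(grid, num_layers=3)
print(mdr.shape)  # (3, 4, 4)
```

Under this sketch, an N×N×N grid with three layers shrinks to 3N² values, and each layer can be passed through a 2D CNN before a recurrent unit consumes the resulting sequence of per-layer features.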

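Experimental Results

Research Questions

RQ1: Can adversarial training improve the discriminative power of 3D shape descriptors learned from volumetric data?
RQ2: How effective is the RNN at modeling spatial relationships between MDR slices for 3D shape representation?
RQ3: Compared with a 3D CNN, can the 2D MDR representation deliver high performance at lower computational cost?
RQ4: To what extent does the proposed 3D-A-Nets framework outperform existing voxel-based methods for 3D shape classification and retrieval?
RQ5: How many MDR slices give the best balance between model efficiency and performance?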

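Key Findings

The proposed 3D-A-Nets reaches 90.5% classification accuracy on the ModelNet40 benchmark, substantially outperforming the earlier voxel-based method VoxNet (83%).
On 3D shape retrieval the model achieves an mAP of 0.801, far above 3D ShapeNets (mAP 0.492); no retrieval mAP is reported for 3D-GAN.
Ablation experiments show that adversarial learning alone raises accuracy from 85.6% (CNN only) to 88.1%, indicating that it plays a key role in the performance gain.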
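Adding the RNN component brings accuracy to 87.5%, compared with 85.6% for the CNN alone, which supports its value in modeling spatiotemporal feature correlations.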
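Precision-recall curves show 3D-A-Nets clearly outperforming 3D ShapeNets at every recall level.
The model retrieves the correct objects in most cases, though some confusion occurs between visually similar categories such as desk and nightstand.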
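
For readers unfamiliar with the retrieval metric, the following sketch shows one common way to compute mean average precision (mAP) from ranked results. The review does not state the paper's exact evaluation protocol, so this is an assumption: each ranking is taken to cover the whole gallery, and the function names and toy rankings are hypothetical.

```python
def average_precision(ranked_relevance):
    """AP for one query; ranked_relevance[i] is True when the i-th result shares the query's class."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / hits if hits else 0.0

def mean_average_precision(all_rankings):
    """Mean of per-query AP values over a non-empty list of rankings."""
    return sum(average_precision(r) for r in all_rankings) / len(all_rankings)

# Two hypothetical queries with four ranked results each.
queries = [
    [True, False, True, False],  # AP = (1/1 + 2/3) / 2
    [False, True, True, True],   # AP = (1/2 + 2/3 + 3/4) / 3
]
score = mean_average_precision(queries)
```

A ranking that places every same-class shape first scores an AP of 1.0, so the metric rewards descriptors that keep same-class shapes close together in feature space.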

This review was generated by AI and reviewed by human editors.