QUICK REVIEW

[Paper Review] Point-Voxel CNN for Efficient 3D Deep Learning

Zhijian Liu, Haotian Tang | arXiv (Cornell University) | 2019-07-08

3D Shape Modeling and Analysis | References: 49 | Citations: 335

One-Line Summary

PVCNN combines a low-resolution voxel-based branch with a high-resolution point-based branch to achieve fast, memory-efficient 3D deep learning while maintaining accuracy.

ABSTRACT

We present Point-Voxel CNN (PVCNN) for efficient, fast 3D deep learning. Previous work processes 3D data using either voxel-based or point-based NN models. However, both approaches are computationally inefficient. The computation cost and memory footprints of the voxel-based models grow cubically with the input resolution, making it memory-prohibitive to scale up the resolution. As for point-based networks, up to 80% of the time is wasted on structuring the sparse data which have rather poor memory locality, not on the actual feature extraction. In this paper, we propose PVCNN that represents the 3D input data in points to reduce the memory consumption, while performing the convolutions in voxels to reduce the irregular, sparse data access and improve the locality. Our PVCNN model is both memory and computation efficient. Evaluated on semantic and part segmentation datasets, it achieves much higher accuracy than the voxel-based baseline with 10x GPU memory reduction; it also outperforms the state-of-the-art point-based models with 7x measured speedup on average. Remarkably, the narrower version of PVCNN achieves 2x speedup over PointNet (an extremely efficient model) on part and scene segmentation benchmarks with much higher accuracy. We validate the general effectiveness of PVCNN on 3D object detection: by replacing the primitives in Frustum PointNet with PVConv, it outperforms Frustum PointNet++ by 2.4% mAP on average with 1.5x measured speedup and GPU memory reduction.

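Motivation and Objectives

Motivates the need for efficient 3D deep learning on edge devices, which face tight memory and latency constraints.
Proposes PVConv, a hybrid primitive that fuses voxel-based and point-based processing to reduce the memory footprint and improve data locality.
Demonstrates that PVCNN achieves higher accuracy with lower memory usage and latency than purely voxel-based or purely point-based models across several 3D tasks.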
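
To make the cubic growth mentioned in the abstract concrete, the short calculation below compares the size of one dense voxel feature grid with the size of per-point features. It is only a back-of-the-envelope sketch: the channel count (64), the resolutions, and the float32 storage are illustrative assumptions, not values taken from the paper, and the helper names are hypothetical.

```python
def dense_voxel_bytes(resolution, channels, batch=1, bytes_per_value=4):
    """Bytes for one dense feature grid of shape (batch, channels, r, r, r)."""
    return batch * channels * resolution ** 3 * bytes_per_value


def point_feature_bytes(num_points, channels, batch=1, bytes_per_value=4):
    """Bytes for per-point features of shape (batch, channels, num_points)."""
    return batch * channels * num_points * bytes_per_value


channels = 64  # illustrative assumption, not from the paper
for r in (32, 64, 128):
    mib = dense_voxel_bytes(r, channels) / 2 ** 20
    print(f"dense voxel grid {r}^3: {mib:.1f} MiB")

mib = point_feature_bytes(2048, channels) / 2 ** 20
print(f"2048 points: {mib:.1f} MiB")
```

Doubling the resolution multiplies the grid size by eight, whereas the point representation grows only linearly with the number of points. This is the reason for keeping the voxel branch at low resolution.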

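Proposed Method

Introduces Point-Voxel Convolution (PVConv), which consists of two branches: a voxel-based branch for coarse-grained neighborhood aggregation and a high-resolution point-based branch for fine-grained features.
The voxel-based branch voxelizes the normalized points into a low-resolution grid, applies 3D convolutions, and then devoxelizes the result with trilinear interpolation so that it can be fused with the point features.
The point-based branch applies an MLP to the original points, preserving high-resolution per-point information.
The features from the two branches are fused by simple addition to obtain the final point features.
Coordinates are normalized, and voxelization and devoxelization are differentiable, which enables end-to-end training.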
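
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the normalization recipe (center on the centroid, scale by the largest point norm, shift into the unit cube) is one plausible choice, a fixed 3x3x3 mean filter stands in for the learned 3D convolution, the MLP is a single linear layer with ReLU, and input and output channel counts are equal so the two branches can be added. All function names are hypothetical.

```python
import numpy as np


def normalize_coords(points):
    """Map (N, 3) coordinates into the unit cube [0, 1]^3 (assumed recipe)."""
    centered = points - points.mean(axis=0)
    scale = np.linalg.norm(centered, axis=1).max()
    return centered / (2.0 * scale + 1e-12) + 0.5


def voxelize(coords, feats, r):
    """Average the features of all points that fall into each of the r^3 voxels."""
    idx = np.clip((coords * r).astype(int), 0, r - 1)      # (N, 3) voxel indices
    flat = (idx[:, 0] * r + idx[:, 1]) * r + idx[:, 2]     # (N,) flattened index
    grid = np.zeros((r ** 3, feats.shape[1]))
    count = np.zeros(r ** 3)
    np.add.at(grid, flat, feats)                           # unbuffered scatter-add
    np.add.at(count, flat, 1.0)
    grid /= np.maximum(count, 1.0)[:, None]                # empty voxels stay zero
    return grid.reshape(r, r, r, -1)


def box_filter3d(grid):
    """Stand-in for a learned 3D convolution: zero-padded 3x3x3 mean filter."""
    r = grid.shape[0]
    padded = np.pad(grid, ((1, 1), (1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(grid)
    for dx in range(3):
        for dy in range(3):
            for dz in range(3):
                out += padded[dx:dx + r, dy:dy + r, dz:dz + r]
    return out / 27.0


def devoxelize(grid, coords):
    """Trilinearly interpolate voxel features at the (N, 3) point coordinates."""
    r = grid.shape[0]
    # Voxel centers sit at (i + 0.5) / r, so shift by 0.5 before interpolating.
    pos = np.clip(coords * r - 0.5, 0.0, r - 1.0)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, r - 1)
    w = pos - lo                                           # weight toward `hi`
    out = np.zeros((coords.shape[0], grid.shape[-1]))
    for cx, ix in ((1 - w[:, 0], lo[:, 0]), (w[:, 0], hi[:, 0])):
        for cy, iy in ((1 - w[:, 1], lo[:, 1]), (w[:, 1], hi[:, 1])):
            for cz, iz in ((1 - w[:, 2], lo[:, 2]), (w[:, 2], hi[:, 2])):
                out += (cx * cy * cz)[:, None] * grid[ix, iy, iz]
    return out


def point_mlp(feats, weight, bias):
    """Point-based branch: per-point linear layer followed by ReLU."""
    return np.maximum(feats @ weight + bias, 0.0)


def pvconv(points, feats, weight, bias, r=8):
    """Fuse the voxel branch and the point branch by addition."""
    coords = normalize_coords(points)
    voxel_out = devoxelize(box_filter3d(voxelize(coords, feats, r)), coords)
    point_out = point_mlp(feats, weight, bias)             # weight must be (C, C)
    return voxel_out + point_out
```

A call such as `pvconv(points, feats, weight, bias, r=8)` with `points` of shape `(N, 3)`, `feats` of shape `(N, C)`, and `weight` of shape `(C, C)` returns fused features of shape `(N, C)`. In a trainable version, the mean filter and the linear layer would be replaced by learned layers in an autodiff framework.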

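Experimental Results

Research Questions

RQ1: How can 3D data be processed efficiently on common 3D tasks (segmentation, detection) without sacrificing accuracy?
RQ2: Does a hybrid voxel-point approach improve memory footprint and data locality compared with purely voxel-based or purely point-based methods?
RQ3: How does PVCNN perform (accuracy, latency, memory) on the ShapeNet Part, S3DIS, and KITTI benchmarks?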

Key Results

Input data	Convolution	Mean IoU / mAcc / mIoU (varies by table)	Latency	GPU memory
Points (8 × 2048)	volumetric	86.2 IoU	50.7 ms	1.59 GB
Points (8 × 2048)	volumetric	85.7 IoU	36.8 ms	1.56 GB
Points (8 × 2048)	volumetric	85.5 IoU	28.9 ms	1.55 GB
Points (8 × 2048)	volumetric	85.2 IoU	11.6 ms	0.80 GB
Points (8 × 2048)	volumetric	85.5 IoU	21.7 ms	1.00 GB
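
For readers unfamiliar with the IoU column, the sketch below shows one common way to compute a mean intersection-over-union from predicted and ground-truth labels. It is a generic illustration, not the paper's evaluation script; benchmarks such as ShapeNet Part apply their own averaging conventions, and the function name and the choice to skip classes absent from both arrays are assumptions.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean of per-class IoU; classes absent from both arrays are skipped."""
    pred = np.asarray(pred)
    target = np.asarray(target)
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class appears in neither prediction nor ground truth
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")


# Class 0: intersection 1, union 2. Class 1: intersection 2, union 3.
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```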

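PVCNN achieves higher accuracy than the voxel-based baseline while substantially reducing GPU memory (roughly a 10x reduction on ShapeNet Part).
PVCNN achieves roughly a 7x speedup on average over state-of-the-art point-based models across the tested tasks.
Narrower PVCNN variants achieve 2x to 15x speedups over strong baselines (e.g., PointNet, SpiderCNN) with competitive or higher accuracy.
On ShapeNet Part, the PVCNN variants show a favorable accuracy-latency-memory trade-off; for example, the 1×C variant achieves 86.2 mIoU at 50.7 ms latency and 1.59 GB of memory.
On S3DIS indoor scene segmentation, PVCNN and PVCNN++ outperform purely point-based models, with up to an 8x speedup and a 3x memory reduction; PVCNN++ outperforms PointCNN at lower latency.
For 3D object detection (KITTI), the PVCNN variants are 1.5x faster in measured speed than F-PointNet++ and use less memory, and the full PVCNN delivers a notable mAP improvement (the abstract reports 2.4% mAP on average over Frustum PointNet++).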

This review was generated by AI and reviewed by a human editor.