QUICK REVIEW

[Paper Review] PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction

Sicheng Zuo, Wenzhao Zheng|arXiv (Cornell University)|2023. 08. 31.

3D Shape Modeling and Analysis | Citations: 8

One-Line Summary

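PointOcc introduces a Cylindrical Tri-Perspective View (Cylindrical TPV) and processes LiDAR point clouds with 2D image backbones to predict dense 3D semantic occupancy, achieving state-of-the-art performance with LiDAR alone while also running faster.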

ABSTRACT

Semantic segmentation in autonomous driving has been undergoing an evolution from sparse point segmentation to dense voxel segmentation, where the objective is to predict the semantic occupancy of each voxel in the concerned 3D space. The dense nature of the prediction space has rendered existing efficient 2D-projection-based methods (e.g., bird's eye view, range view, etc.) ineffective, as they can only describe a subspace of the 3D scene. To address this, we propose a cylindrical tri-perspective view to represent point clouds effectively and comprehensively and a PointOcc model to process them efficiently. Considering the distance distribution of LiDAR point clouds, we construct the tri-perspective view in the cylindrical coordinate system for more fine-grained modeling of nearer areas. We employ spatial group pooling to maintain structural details during projection and adopt 2D backbones to efficiently process each TPV plane. Finally, we obtain the features of each point by aggregating its projected features on each of the processed TPV planes without the need for any post-processing. Extensive experiments on both 3D occupancy prediction and LiDAR segmentation benchmarks demonstrate that the proposed PointOcc achieves state-of-the-art performance with much faster speed. Specifically, despite only using LiDAR, PointOcc significantly outperforms all other methods, including multi-modal methods, with a large margin on the OpenOccupancy benchmark. Code: https://github.com/wzzheng/PointOcc.
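
The abstract's claim that a cylindrical partition models nearer areas more finely can be checked with a small sketch. Assuming a uniform grid over (rho, phi, z), one azimuth bin covers an arc of length rho * delta_phi, so the same bin is narrower close to the sensor than far away. The ranges and bin counts below (RHO_RANGE, PHI_RANGE, Z_RANGE, GRID) and the helper names are hypothetical values chosen for illustration, not PointOcc's actual configuration or code.

```python
import math

# Hypothetical grid settings for illustration only (not the paper's configuration).
RHO_RANGE = (0.0, 50.0)          # radial range in metres
PHI_RANGE = (-math.pi, math.pi)  # full azimuth circle
Z_RANGE = (-5.0, 3.0)            # height range in metres
GRID = (100, 360, 16)            # number of bins along (rho, phi, z)


def to_cylindrical(x, y, z):
    """Convert a Cartesian LiDAR point (x, y, z) to cylindrical (rho, phi, z)."""
    return math.hypot(x, y), math.atan2(y, x), z


def cell_index(x, y, z):
    """Uniformly partition cylindrical space and return the point's (i_rho, i_phi, i_z) cell."""
    coords = to_cylindrical(x, y, z)
    idx = []
    for value, (lo, hi), n in zip(coords, (RHO_RANGE, PHI_RANGE, Z_RANGE), GRID):
        step = (hi - lo) / n
        i = int((value - lo) // step)
        # Clamp so out-of-range points fall into the first or last bin.
        idx.append(min(max(i, 0), n - 1))
    return tuple(idx)


def azimuth_cell_width(rho):
    """Arc length (m) covered by one azimuth bin at radius rho: rho * delta_phi."""
    delta_phi = (PHI_RANGE[1] - PHI_RANGE[0]) / GRID[1]
    return rho * delta_phi


for r in (5.0, 20.0, 50.0):
    print(f"rho={r:>4.0f} m -> azimuth bin width {azimuth_cell_width(r):.3f} m")
```

With 360 azimuth bins, a bin is about 0.09 m wide at 5 m but about 0.87 m wide at 50 m, which matches the intuition that cylindrical cells spend resolution where LiDAR points are densest.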

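Motivation and Goals

To enable dense 3D semantic occupancy prediction without heavy 3D convolutions.
To propose a Cylindrical TPV that better matches the density distribution of LiDAR points.
To enable efficient processing by using 2D backbones with shared TPV encoding and decoding.
To provide a framework that delivers high-resolution 3D occupancy and LiDAR segmentation results without post-processing.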

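Proposed Method

LiDAR points are transformed into Cylindrical TPV planes using cylindrical partitioning and spatial group pooling, which preserves 3D structure during projection.
Each TPV plane is encoded with a shared 2D backbone and an FPN to extract multi-scale features.
For each point or voxel, features are queried from the three processed TPV planes, and the interpolated features are summed.
A simple two-layer MLP head produces semantic occupancy and segmentation predictions without post-processing.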
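
The projection and aggregation steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not PointOcc's implementation: points are already assigned to integer cylindrical cell indices; spatial group pooling is read as splitting the collapsed axis into K groups, max-pooling within each group, and concatenating the group results as channels; the shared 2D backbone and FPN are replaced by an identity; and the feature interpolation is replaced by a nearest-cell lookup. The names `spatial_group_pool` and `query_feature` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, int, int]
Plane = Dict[Tuple[int, int], List[float]]


def spatial_group_pool(cells: Sequence[Cell], feats: Sequence[List[float]],
                       grid: Cell, collapse_axis: int, num_groups: int) -> Plane:
    """Project points onto the plane orthogonal to `collapse_axis` (assumed reading).

    The collapsed axis is split into `num_groups` contiguous groups; features are
    max-pooled inside each group and the group results are concatenated, so points
    at different positions along the collapsed axis are not merged into one max.
    """
    group_len = -(-grid[collapse_axis] // num_groups)  # ceiling division
    channels = len(feats[0])
    keep = [ax for ax in range(3) if ax != collapse_axis]
    plane: Plane = {}
    for cell, feat in zip(cells, feats):
        key = (cell[keep[0]], cell[keep[1]])
        group = cell[collapse_axis] // group_len
        vec = plane.setdefault(key, [float("-inf")] * (channels * num_groups))
        for c, value in enumerate(feat):
            slot = group * channels + c
            vec[slot] = max(vec[slot], value)
    # Groups that received no points get zeros instead of -inf.
    return {k: [0.0 if v == float("-inf") else v for v in vec] for k, vec in plane.items()}


def query_feature(cell: Cell, planes: Dict[int, Plane], dim: int) -> List[float]:
    """Sum the features a point/voxel projects to on each TPV plane.

    Nearest-cell lookup stands in for feature interpolation; plane cells with
    no projected points contribute zeros.
    """
    total = [0.0] * dim
    for axis, plane in planes.items():
        keep = [ax for ax in range(3) if ax != axis]
        vec = plane.get((cell[keep[0]], cell[keep[1]]), [0.0] * dim)
        total = [t + v for t, v in zip(total, vec)]
    return total


# Toy example: a 4x4x4 cylindrical grid, 1 feature channel, K = 2 groups.
grid = (4, 4, 4)
cells = [(0, 1, 0), (0, 1, 3), (2, 2, 2)]
feats = [[1.0], [2.0], [5.0]]
K = 2
# One plane per collapsed axis; a shared 2D backbone + FPN would process each plane here (omitted).
planes = {axis: spatial_group_pool(cells, feats, grid, axis, K) for axis in range(3)}
print(planes[2][(0, 1)])
print(query_feature((0, 1, 0), planes, dim=K))
```

For the toy input, the first print shows [1.0, 2.0]: the two points in the same plane cell land in different groups, whereas plain max pooling along the collapsed axis would keep only 2.0. The query for cell (0, 1, 0) sums its three plane features to [3.0, 2.0].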

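Experimental Results

Research Questions

RQ1: How can LiDAR point clouds be represented effectively for dense 3D semantic occupancy prediction without heavy 3D convolutions?
RQ2: Does a Cylindrical TPV capture near-range detail and overall 3D structure better than a Cartesian TPV or single-view projections?
RQ3: Given TPV features as input, can 2D backbones pretrained on images efficiently deliver high-quality 3D semantic predictions?
RQ4: What are the trade-offs among TPV resolution, group size, computational cost, and accuracy?
RQ5: How does PointOcc perform on the OpenOccupancy and LiDAR segmentation benchmarks compared with voxel-based and other 2D-projection-based methods?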

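Key Results

Using only LiDAR, PointOcc achieves state-of-the-art performance on OpenOccupancy and outperforms multi-modal methods by a significant margin (mIoU 23.9, IoU 34.1 on the OpenOccupancy validation set).
On nuScenes LiDAR segmentation, PointOcc outperforms all 2D-projection-based methods and is competitive with voxel-based methods (e.g., mIoU 77.9 with an ImageNet-1K-pretrained ViT backbone).
Combining all three TPV planes (HW, WD, DH) yields the best results, suggesting that the planes carry complementary information.
Higher TPV resolution leads to better performance, and spatial group pooling (K=16) preserves structural detail at a manageable cost.
ViT backbones pretrained on ImageNet-1K/21K improve performance, and LiDAR segmentation accuracy stays high even when some ViT weights are frozen.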
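
A back-of-the-envelope cell count helps explain why TPV resolution can be raised at a manageable cost. The grid size (H, W, D) = (480, 360, 32) is a hypothetical example, and the count ignores feature channels, the group dimension K, and backbone cost, so it indicates scaling trends only.

```python
def cell_counts(h, w, d):
    """Return (#cells in a dense 3D grid, #cells across the three TPV planes HW, WD, DH)."""
    dense = h * w * d
    tpv = h * w + w * d + d * h
    return dense, tpv


# Hypothetical base resolution, then every axis doubled.
for scale in (1, 2):
    h, w, d = 480 * scale, 360 * scale, 32 * scale
    dense, tpv = cell_counts(h, w, d)
    print(f"{h}x{w}x{d}: dense={dense:,} TPV={tpv:,} ratio={dense / tpv:.1f}x")
```

Doubling every axis multiplies the dense grid by 8 but the three TPV planes by only 4, so under these assumptions raising resolution is much cheaper in the TPV representation than in a dense voxel grid.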

This review was generated by AI and reviewed by a human editor.