QUICK REVIEW

[Paper Review] Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling

Xumin Yu, Lulu Tang|arXiv (Cornell University)|2021. 11. 29.

3D Shape Modeling and Analysis | References: 52 | Citations: 49

One-Line Summary

Point-BERT pre-trains 3D point cloud Transformers with Masked Point Modeling and a vocabulary of discrete point tokens learned by a dVAE. It achieves strong accuracy on ModelNet40 and ScanObjectNN and transfers well to new tasks.

ABSTRACT

We present Point-BERT, a new paradigm for learning Transformers to generalize the concept of BERT to 3D point cloud. Inspired by BERT, we devise a Masked Point Modeling (MPM) task to pre-train point cloud Transformers. Specifically, we first divide a point cloud into several local point patches, and a point cloud Tokenizer with a discrete Variational AutoEncoder (dVAE) is designed to generate discrete point tokens containing meaningful local information. Then, we randomly mask out some patches of input point clouds and feed them into the backbone Transformers. The pre-training objective is to recover the original point tokens at the masked locations under the supervision of point tokens obtained by the Tokenizer. Extensive experiments demonstrate that the proposed BERT-style pre-training strategy significantly improves the performance of standard point cloud Transformers. Equipped with our pre-training strategy, we show that a pure Transformer architecture attains 93.8% accuracy on ModelNet40 and 83.1% accuracy on the hardest setting of ScanObjectNN, surpassing carefully designed point cloud models with much fewer hand-made designs. We also demonstrate that the representations learned by Point-BERT transfer well to new tasks and domains, where our models largely advance the state-of-the-art of few-shot point cloud classification task. The code and pre-trained models are available at https://github.com/lulutang0608/Point-BERT
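
The pre-training objective described in the abstract can be written compactly. The following is a sketch under assumed notation, since none of these symbols appear in the abstract itself: $\mathcal{M}$ is the set of masked patch positions, $X^{\mathcal{M}}$ is the point cloud with those patches masked, and $z_i$ is the discrete token that the frozen Tokenizer assigns to patch $i$. The backbone is trained to assign high probability to the correct token at each masked position:

```latex
\mathcal{L}_{\mathrm{MPM}}
  = -\,\mathbb{E}_{X}\Bigl[\,\sum_{i \in \mathcal{M}} \log P\bigl(z_i \mid X^{\mathcal{M}}\bigr)\Bigr]
```

Read this way, Masked Point Modeling is a classification problem over the token vocabulary at each masked position, which mirrors masked language modeling in BERT.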

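Motivation and Objectives

Extend BERT-style pre-training to 3D point clouds with minimal inductive bias.
Develop a tokenization mechanism that converts local point patches into discrete tokens.
Propose a Masked Point Modeling pre-training objective that recovers the tokens at masked positions.
Strengthen the representation with an auxiliary contrastive objective that captures high-level semantics.
Demonstrate strong transfer, few-shot, and real-world performance gains on point cloud tasks.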

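Proposed Method

Partition the 3D point cloud into local patches (sub-clouds) using farthest point sampling (FPS) and kNN grouping.
Project each sub-cloud to an embedding with a mini-PointNet, forming a sequence of patch embeddings.
Train a Tokenizer that converts the embeddings into discrete point tokens using a discrete VAE (dVAE).
Pre-train the Transformer backbone with Masked Point Modeling: mask patches and predict their tokens under the supervision of the dVAE Tokenizer.
Apply a block-wise masking strategy and use a learnable mask token during pre-training.
Introduce a MoCo-based contrastive loss together with Point Patch Mixing to encourage high-level semantic representations.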
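
Two of the steps above, patch grouping (FPS plus kNN) and block-wise masking, can be sketched in NumPy. This is a minimal illustration and not the authors' implementation. The function names, the default values (`num_patches=64`, `patch_size=32`, `mask_ratio=0.4`), and the specific masking rule (mask a randomly chosen patch center together with its nearest centers) are assumptions made for this example.

```python
import numpy as np


def farthest_point_sampling(points, num_centers, seed=0):
    """Greedy FPS: repeatedly pick the point farthest from those already chosen."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    chosen = np.empty(num_centers, dtype=np.int64)
    chosen[0] = rng.integers(n)  # random starting point
    min_dist = np.full(n, np.inf)  # squared distance to the nearest chosen point
    for i in range(1, num_centers):
        diff = points - points[chosen[i - 1]]
        min_dist = np.minimum(min_dist, np.einsum("ij,ij->i", diff, diff))
        chosen[i] = np.argmax(min_dist)
    return chosen


def group_patches(points, num_patches=64, patch_size=32, seed=0):
    """Split an (N, 3) cloud into patches of the k nearest points around FPS centers."""
    center_idx = farthest_point_sampling(points, num_patches, seed)
    centers = points[center_idx]
    # Squared distances from every center to every point: (num_patches, N).
    d2 = ((centers[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    knn_idx = np.argsort(d2, axis=1)[:, :patch_size]
    # Express each patch in coordinates relative to its own center.
    patches = points[knn_idx] - centers[:, None, :]
    return centers, patches


def block_mask(centers, mask_ratio=0.4, seed=0):
    """Assumed block-wise rule: mask a random center and its nearest centers."""
    rng = np.random.default_rng(seed)
    g = centers.shape[0]
    num_masked = int(round(mask_ratio * g))
    anchor = rng.integers(g)
    d2 = ((centers - centers[anchor]) ** 2).sum(axis=-1)
    mask = np.zeros(g, dtype=bool)
    mask[np.argsort(d2)[:num_masked]] = True
    return mask


if __name__ == "__main__":
    cloud = np.random.default_rng(1).normal(size=(1024, 3))  # synthetic stand-in data
    centers, patches = group_patches(cloud)
    mask = block_mask(centers)
    print(centers.shape, patches.shape, int(mask.sum()))
```

In a full pipeline, each row of `patches` would be embedded by the mini-PointNet, and the embeddings at positions where `mask` is true would be replaced by the learnable mask token before entering the Transformer. Masking a spatially contiguous block rather than scattered patches makes the prediction task harder, because the model cannot rely on immediately adjacent visible patches.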

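Experimental Results

Research Questions

RQ1: Can a BERT-style pre-training objective be applied effectively to 3D point clouds using discrete tokens?
RQ2: Do the discrete point tokens learned by the dVAE capture local geometric patterns that are meaningful for representation learning?
RQ3: Does Masked Point Modeling, aided by contrastive learning and patch mixing, improve downstream performance over training from scratch?
RQ4: How well do Point-BERT representations transfer to real-world datasets and few-shot scenarios?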

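Key Findings

With more input points, Point-BERT reaches 93.8% accuracy on ModelNet40, outperforming several hand-designed and Transformer-based baselines.
On the hardest ScanObjectNN setting, Point-BERT reaches 83.1% accuracy, outperforming earlier models while relying on fewer hand-crafted designs.
Point-BERT pre-training consistently improves the Transformer over training from scratch, and accuracy scales with input density (e.g., 93.4% at 4096 points and 93.8% at 8192 points).
Point-BERT representations transfer well to new tasks and domains, advancing the state of the art in few-shot point cloud classification.
In the ablation study, the combination of MPM, Point Patch Mixing, and MoCo yields the largest performance gain.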


This review was generated by AI and reviewed by a human editor.