[Paper Review] Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection
The paper introduces class-balanced sampling and grouping with a multi-group head to tackle long-tailed class distributions in nuScenes, achieving state-of-the-art lidar-based 3D object detection results. It combines DS Sampling, GT-AUG, and a balanced multi-group head to boost tail-class performance.
This report presents our method which wins the nuScenes 3D Detection Challenge [17] held in Workshop on Autonomous Driving (WAD, CVPR 2019). Generally, we utilize sparse 3D convolution to extract rich semantic features, which are then fed into a class-balanced multi-head network to perform 3D object detection. To handle the severe class imbalance problem inherent in the autonomous driving scenarios, we design a class-balanced sampling and augmentation strategy to generate a more balanced data distribution. Furthermore, we propose a balanced grouping head to boost the performance for the categories with similar shapes. Based on the Challenge results, our method outperforms the PointPillars [14] baseline by a large margin across all metrics, achieving state-of-the-art detection performance on the nuScenes dataset. Code will be released at CBGS.
Motivation & Objective
- Address severe class imbalance in nuScenes 3D object detection.
- Improve tail-class performance while maintaining overall accuracy.
- Leverage multi-group head design to share information among similar-shaped categories.
- Enhance data augmentation and training procedures to boost joint multi-class detection.
Proposed method
- Use sparse 3D convolutions for feature extraction from voxelized point clouds.
- Introduce DS Sampling to balance the training distribution by duplicating samples from rare classes.
- Apply GT-AUG to augment data by pasting ground-truth boxes sampled from an annotation database.
- Design a multi-group head where each group of similar-shape classes shares a dedicated head to reduce inter-class interference.
- Group classes into six groups based on shape/size similarity and instance balance to guide the multi-group head learning.
- Incorporate loss components including weighted focal loss for classification, smooth-L1 for regression, and orientation classification with offset to reduce angular ambiguity.
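The duplication idea behind DS Sampling can be illustrated with a short Python sketch. It is not the authors' implementation: the function name `class_balanced_indices`, the input format (one set of class names per sample), and the equal per-class quota are assumptions made for illustration. The sketch draws sample indices with replacement so that every class contributes roughly the same number of entries to the resampled training list.

```python
import random
from collections import defaultdict


def class_balanced_indices(sample_classes, class_names, seed=0):
    """Return a resampled list of sample indices with a flatter class mix.

    sample_classes: list where entry i is the set of class names that
                    appear in sample i (hypothetical input format).
    class_names:    classes to balance over.
    """
    rng = random.Random(seed)

    # Collect, for each class, the indices of samples that contain it.
    per_class = defaultdict(list)
    for idx, present in enumerate(sample_classes):
        for name in present:
            if name in class_names:
                per_class[name].append(idx)

    # Total number of (class, sample) pairs before balancing.
    total = sum(len(idxs) for idxs in per_class.values())
    # Assumption: every class receives the same quota of draws.
    quota = max(1, total // len(class_names))

    balanced = []
    for name in class_names:
        idxs = per_class[name]
        if not idxs:
            continue  # class never appears; nothing to duplicate
        # Rare classes have few indices, so drawing `quota` times with
        # replacement duplicates their samples.
        balanced.extend(rng.choices(idxs, k=quota))
    return balanced


# Toy data: 10 samples, all contain a car, only the last two a bicycle.
toy = [{"car"}] * 8 + [{"car", "bicycle"}] * 2
resampled = class_balanced_indices(toy, ["car", "bicycle"])
```

In the toy data, samples 8 and 9 are the only ones containing a bicycle, so they make up at least half of the resampled list even though they are only a fifth of the original samples.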
Experimental results
Research questions
- RQ1: How does class imbalance affect 3D object detection performance on nuScenes, especially for tail classes?
- RQ2: Can a class-balanced sampling strategy improve tail-class accuracy without sacrificing head-class performance?
- RQ3: Does grouping similar-shaped categories and using group-specific heads improve multi-class detection in point clouds?
- RQ4: What combination of data augmentation, loss design, and network architecture yields state-of-the-art lidar-based 3D detection on nuScenes?
Key findings
- DS Sampling expands the training set from 28,130 to 128,100 samples, smoothing the class distribution.
- The proposed 6-group arrangement (Car), (Truck, Construction Vehicle), (Bus, Trailer), (Barrier), (Motorcycle, Bicycle), (Pedestrian, Traffic Cone) improves tail-class performance.
- The method reports state-of-the-art results on the nuScenes lidar track, outperforming the PointPillars baseline by a large margin across all metrics, including mAP and NDS.
- GT-AUG and Res-Encoder contribute notably to mAP, as shown in ablation studies.
- Final submission reported mAP of 53.2% and NDS of 63.78% on the validation split, surpassing the baselines.
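The six-group arrangement listed in the key findings can be written as a small lookup table that routes each ground-truth box to its group head. The following Python sketch is an illustration only: the snake_case class identifiers, the names `HEAD_GROUPS`, `build_class_to_head`, and `split_targets_by_head`, and the (head index, local label) encoding are assumptions, not the paper's code.

```python
# Six groups as listed in this review. The snake_case identifiers are an
# assumption chosen for illustration.
HEAD_GROUPS = [
    ["car"],
    ["truck", "construction_vehicle"],
    ["bus", "trailer"],
    ["barrier"],
    ["motorcycle", "bicycle"],
    ["pedestrian", "traffic_cone"],
]


def build_class_to_head(groups):
    """Map each class name to (head index, label index within that head)."""
    lookup = {}
    for head_idx, names in enumerate(groups):
        for local_label, name in enumerate(names):
            lookup[name] = (head_idx, local_label)
    return lookup


CLASS_TO_HEAD = build_class_to_head(HEAD_GROUPS)


def split_targets_by_head(labels, groups=HEAD_GROUPS):
    """Assign each ground-truth box to the head responsible for its class.

    labels: list of class names, one per ground-truth box.
    Returns one list per head containing (box index, local label) pairs.
    """
    lookup = build_class_to_head(groups)
    per_head = [[] for _ in groups]
    for box_idx, name in enumerate(labels):
        head_idx, local_label = lookup[name]
        per_head[head_idx].append((box_idx, local_label))
    return per_head


targets = split_targets_by_head(["car", "bicycle", "car"])
```

With this layout each head only classifies among the classes in its own group, so in this sketch a bicycle box is never a classification target for the car head.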
This review was created by AI and reviewed by human editors.