[Paper Review] Boosting Randomized Smoothing with Variance Reduced Classifiers
This paper proposes a variance-reduced ensemble method for Randomized Smoothing (RS) that significantly boosts certified robustness by leveraging model ensembles as base classifiers. By reducing prediction variance under noise, the approach increases certifiable radii by 5–21% on CIFAR10 and ImageNet, achieving state-of-the-art ACRs of 0.86 and 1.11, respectively, while introducing adaptive sampling that reduces sample complexity by up to 55-fold.
Randomized Smoothing (RS) is a promising method for obtaining robustness certificates by evaluating a base model under noise. In this work, we: (i) theoretically motivate why ensembles are a particularly suitable choice as base models for RS, and (ii) empirically confirm this choice, obtaining state-of-the-art results in multiple settings. The key insight of our work is that the reduced variance of ensembles over the perturbations introduced in RS leads to significantly more consistent classifications for a given input. This, in turn, leads to substantially increased certifiable radii for samples close to the decision boundary. Additionally, we introduce key optimizations which enable an up to 55-fold decrease in sample complexity of RS for predetermined radii, thus drastically reducing its computational overhead. Experimentally, we show that ensembles of only 3 to 10 classifiers consistently improve on their strongest constituting model with respect to their average certified radius (ACR) by 5% to 21% on both CIFAR10 and ImageNet, achieving a new state-of-the-art ACR of 0.86 and 1.11, respectively. We release all code and models required to reproduce our results at https://github.com/eth-sri/smoothing-ensembles.
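The link between lower variance and larger radii in the abstract above can be made concrete with two standard facts. The notation here is an assumption for illustration and may differ from the paper's: $\sigma_\epsilon$ is the smoothing noise level, $\underline{p_A}$ is a lower confidence bound on the probability that the base model predicts the top class under noise, $f_l$ are the $k$ ensemble members, and $s^2$ and $\rho$ are an assumed common logit variance and pairwise correlation.

```latex
% Certified L2 radius of Gaussian randomized smoothing (Cohen et al., 2019),
% using the simplification p_B <= 1 - p_A:
R = \sigma_\epsilon \, \Phi^{-1}\!\left(\underline{p_A}\right)

% Variance of the averaged logit of k models with common variance s^2
% and pairwise correlation \rho:
\operatorname{Var}\!\left[\frac{1}{k}\sum_{l=1}^{k} f_l(x+\epsilon)\right]
  = \frac{s^2}{k}\bigl(1 + (k-1)\,\rho\bigr)
```

Under these assumptions the variance shrinks from $s^2$ toward $\rho\, s^2$ as $k$ grows, so predictions under noise flip less often, $\underline{p_A}$ rises, and $R$ grows with it because $\Phi^{-1}$ is increasing.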
Motivation & Objective
- To theoretically and empirically demonstrate that ensembles reduce variance in randomized smoothing, leading to higher certified robustness.
- To address the high computational cost of RS certification by introducing an adaptive sampling scheme that reduces sample complexity.
- To develop a K-consensus aggregation mechanism that defers full ensemble evaluation to only the most uncertain samples.
- To achieve state-of-the-art certified accuracy across multiple benchmarks, including ImageNet and CIFAR10, under diverse settings.
- To provide a statistically sound, data-dependent framework for efficient and scalable certification of deep neural networks.
Proposed method
- Proposes a soft-ensemble scheme for RS that leverages the variance-reduction property of model ensembles to improve prediction consistency under input noise.
- Introduces an adaptive sampling strategy that certifies samples in stages, using progressively larger sample counts based on early prediction confidence.
- Employs a K-consensus aggregation mechanism that only evaluates the full ensemble when a subset of base models disagree, reducing computational load.
- Uses a statistical stopping rule based on beta-binomial modeling to determine when to halt sampling with high confidence.
- Applies the method to both standard and denoised smoothing, demonstrating robustness across diverse training regimes.
- Employs a two-phase certification process: first, a small initial sample set is used to estimate class probabilities; second, additional samples are drawn only if confidence is insufficient.
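The soft-ensemble and K-consensus ideas listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Model` interface, the toy models, and all function names are hypothetical, and the consensus rule shown (return early if the first `k` models agree, otherwise average the logits of all models) is one plausible reading of the description.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface: maps an input vector to a list of class logits.
Model = Callable[[Sequence[float]], List[float]]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def mean_logits(outputs: Sequence[Sequence[float]]) -> List[float]:
    """Average logits class by class across models (soft ensembling)."""
    return [sum(col) / len(outputs) for col in zip(*outputs)]


def k_consensus_predict(models: Sequence[Model], x: Sequence[float],
                        k: int) -> Tuple[int, int]:
    """Return (predicted class, number of models evaluated).

    Assumed rule: if the first k models agree, skip the rest; otherwise
    evaluate the remaining models and take the soft-ensemble prediction.
    """
    first = [m(x) for m in models[:k]]
    votes = {argmax(out) for out in first}
    if len(votes) == 1:
        return votes.pop(), k
    outputs = first + [m(x) for m in models[k:]]
    return argmax(mean_logits(outputs)), len(models)


def noisy_class_counts(models: Sequence[Model], x: Sequence[float],
                       sigma: float, n: int, k: int,
                       num_classes: int, seed: int = 0) -> List[int]:
    """Count predicted classes over n Gaussian perturbations of x."""
    rng = random.Random(seed)
    counts = [0] * num_classes
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        label, _ = k_consensus_predict(models, noisy, k)
        counts[label] += 1
    return counts


def make_toy_model(bias: float) -> Model:
    """Hypothetical 2-class model: class 1 wins when sum(x) + bias > 0."""
    def model(x: Sequence[float]) -> List[float]:
        s = sum(x) + bias
        return [-s, s]
    return model


toy_models = [make_toy_model(b) for b in (0.1, -0.05, 0.2)]
```

With these toy models, `k_consensus_predict(toy_models, [1.0, 1.0], k=2)` stops after two models because they agree, while an input near the decision boundary such as `[0.0, 0.0]` triggers the full ensemble. The class counts returned by `noisy_class_counts` are what a certification routine would turn into a confidence bound.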
Experimental results
Research questions
- RQ1: Can model ensembles significantly reduce variance in randomized smoothing, leading to higher certifiable robustness?
- RQ2: How can adaptive sampling reduce the sample complexity of RS certification without compromising confidence?
- RQ3: What is the impact of ensemble size and training method on certified radius and accuracy?
- RQ4: How does K-consensus aggregation improve efficiency while maintaining high certified accuracy?
- RQ5: Can the proposed method achieve state-of-the-art certified accuracy on ImageNet and CIFAR10 under both standard and denoised smoothing settings?
Key findings
- Ensembles of 3 to 10 ResNet110 models improved the average certified radius (ACR) by 5% to 21% over the strongest individual model on CIFAR10, achieving a state-of-the-art ACR of 0.86.
- On ImageNet, ensembles of 3 to 10 models achieved a new state-of-the-art ACR of 1.11, significantly outperforming individual models.
- The adaptive sampling scheme reduced the sample complexity of certification at predetermined radii by up to 55-fold relative to non-adaptive sampling, with minimal loss in certified accuracy.
- K-consensus aggregation reduced full ensemble evaluation to only 1.00% of samples on ResNet20 and 0.00% on ResNet110, drastically cutting computation.
- The consistency-based sampling strategy achieved higher certified accuracy than Gaussian sampling, especially at larger radii, due to better early stopping decisions.
- The method maintained high performance across different training methods and perturbation levels, demonstrating robustness and generalization.
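To illustrate where the savings from adaptive sampling come from, the sketch below certifies a predetermined radius in stages and stops as soon as a confidence bound suffices. It rests on assumptions that this review does not confirm: a Clopper-Pearson lower bound on the top-class probability, a Bonferroni split of the error budget `alpha` across stages, and a candidate class chosen beforehand. The function names, the stage sizes, and the `sample_hit` callback (returns True when the base model predicts the candidate class on one fresh noisy sample) are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Callable, Sequence, Tuple


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    total = 0.0
    for j in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1)
                   - math.lgamma(n - j + 1)
                   + j * math.log(p) + (n - j) * math.log(1.0 - p))
        total += math.exp(log_pmf)
    return total


def clopper_pearson_lower(k: int, n: int, alpha: float) -> float:
    """One-sided lower confidence bound on p after k successes in n trials."""
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection on p; the tail is increasing in p
        mid = (lo + hi) / 2.0
        if binom_upper_tail(k, n, mid) > alpha:
            hi = mid
        else:
            lo = mid
    return lo


def certify_at_radius(sample_hit: Callable[[], bool], radius: float,
                      sigma: float, stages: Sequence[int] = (100, 1000),
                      alpha: float = 0.001) -> Tuple[bool, int]:
    """Return (certified, samples_used) for a predetermined radius.

    The radius is certified once the lower bound on the top-class
    probability reaches Phi(radius / sigma).
    """
    threshold = NormalDist().cdf(radius / sigma)
    alpha_stage = alpha / len(stages)  # Bonferroni split across stages
    used = 0
    for n in stages:
        hits = sum(1 for _ in range(n) if sample_hit())  # fresh samples
        used += n
        if clopper_pearson_lower(hits, n, alpha_stage) >= threshold:
            return True, used
    return False, used
```

In this sketch an easy input, where every noisy sample hits the candidate class, is certified at `radius=0.25`, `sigma=0.25` after the first 100 samples, while `radius=0.5` needs the second stage. Most of the sample budget is therefore spent only on inputs that need it, which is the intuition behind the reported reduction.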
This review was created by AI and reviewed by human editors.