QUICK REVIEW

[Paper Review] RobustBench: a standardized adversarial robustness benchmark

Francesco Croce, Maksym Andriushchenko|arXiv (Cornell University)|Oct 19, 2020
Adversarial Robustness in Machine Learning|References: 135|Cited by: 116
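One-Sentence Summary

TL;DR: RobustBench establishes a standardized, reproducible evaluation of adversarial robustness, built on AutoAttack, a public leaderboard, and a Model Zoo, for comparing defenses and analyzing how robustness relates to distribution shifts and related properties.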

ABSTRACT

As a research community, we are still lacking a systematic understanding of the progress on adversarial robustness which often makes it hard to identify the most promising ideas in training robust models. A key challenge in benchmarking robustness is that its evaluation is often error-prone leading to robustness overestimation. Our goal is to establish a standardized benchmark of adversarial robustness, which as accurately as possible reflects the robustness of the considered models within a reasonable computational budget. To this end, we start by considering the image classification task and introduce restrictions (possibly loosened in the future) on the allowed models. We evaluate adversarial robustness with AutoAttack, an ensemble of white- and black-box attacks, which was recently shown in a large-scale study to improve almost all robustness evaluations compared to the original publications. To prevent overadaptation of new defenses to AutoAttack, we welcome external evaluations based on adaptive attacks, especially where AutoAttack flags a potential overestimation of robustness. Our leaderboard, hosted at https://robustbench.github.io/, contains evaluations of 120+ models and aims at reflecting the current state of the art in image classification on a set of well-defined tasks in $\ell_\infty$- and $\ell_2$-threat models and on common corruptions, with possible extensions in the future. Additionally, we open-source the library https://github.com/RobustBench/robustbench that provides unified access to 80+ robust models to facilitate their downstream applications. Finally, based on the collected models, we analyze the impact of robustness on the performance on distribution shifts, calibration, out-of-distribution detection, fairness, privacy leakage, smoothness, and transferability.

Motivation and Goals

  • Define a standardized, reliable evaluation protocol for adversarial robustness under common threat models.
  • Provide an up-to-date, public leaderboard to track progress in robust image classification.
  • Open-source a Model Zoo of robust models to facilitate downstream use and fair comparisons.
  • Assess how robustness interacts with distribution shifts, calibration, OOD detection, fairness, and privacy leakage.

Proposed Method

  • Use AutoAttack as the current standard evaluation for the $\ell_\infty$- and $\ell_2$-threat models on CIFAR-10, CIFAR-100, and ImageNet.
  • Impose restrictions on submitted models to ensure reliable evaluation (non-zero input gradients, deterministic forward pass, no forward-time optimization loops).
  • Welcome external evaluations based on adaptive attacks, especially where AutoAttack flags a potential overestimation of robustness, to keep new defenses from overadapting to AutoAttack.
  • Maintain a public leaderboard (robustbench.github.io) with 120+ model evaluations and a Model Zoo with 80+ robust models.
  • Open-source a unified library to benchmark models and enable easy downstream usage of robust models.
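
The idea of evaluating with an ensemble of attacks can be illustrated with a minimal sketch. It assumes that a point counts as robust only if it is classified correctly and no attack in the ensemble breaks it. The function name, the attack names, and the toy data are hypothetical and are not taken from the RobustBench or AutoAttack code.

```python
def robust_accuracy(clean_correct, attack_success):
    """Fraction of points classified correctly AND not broken by any attack.

    clean_correct:  list of bools, True if the clean input is classified correctly.
    attack_success: dict mapping a (hypothetical) attack name to a list of bools,
                    True if that attack found an adversarial example for the point.
    """
    n = len(clean_correct)
    robust = 0
    for i in range(n):
        # Worst case over the ensemble: one successful attack is enough.
        broken = any(flags[i] for flags in attack_success.values())
        if clean_correct[i] and not broken:
            robust += 1
    return robust / n


# Toy data (made up for illustration): five test points, two attacks.
clean_correct = [True, True, True, False, True]
attacks = {
    "attack_a": [False, True, False, False, False],
    "attack_b": [False, False, True, False, False],
}

single_a = robust_accuracy(clean_correct, {"attack_a": attacks["attack_a"]})
single_b = robust_accuracy(clean_correct, {"attack_b": attacks["attack_b"]})
ensemble = robust_accuracy(clean_correct, attacks)
print(single_a, single_b, ensemble)
```

By construction, the ensemble estimate can never exceed the estimate from any single attack in it, which is why a weak or single attack tends to overestimate robustness.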

Experimental Results

Research Questions

  • RQ1: What constitutes a reliable, standardized evaluation of adversarial robustness across common threat models?
  • RQ2: How does robustness under $\ell_\infty$ and $\ell_2$ perturbations relate to calibration, distribution shifts, and other properties such as OOD detection and privacy leakage?
  • RQ3: Can a publicly maintained leaderboard and model zoo accelerate progress and enable fair comparisons in adversarial robustness research?

Key Findings

  • Many previously reported robust accuracies are overestimated because they were measured with suboptimal attacks; a standardized AutoAttack evaluation gives tighter upper bounds on the true robust accuracy.
  • Robust models tend to be underconfident and require calibration (temperature scaling improves ECE substantially, but gaps remain).
  • Robust training can degrade OOD detection quality and fairness across classes, though effects vary by method and threat model.
  • Extra training data helps mitigate the robustness-accuracy trade-off, but robust training still costs some clean accuracy.
  • Adversarial examples transfer preferentially between robust models and less so from robust to non-robust models; model smoothness correlates with robustness.
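
The calibration finding can be made concrete with a small sketch of expected calibration error (ECE) and temperature scaling. It assumes equal-width confidence bins and a grid search that picks the temperature minimizing negative log-likelihood; the paper's exact binning and fitting procedure may differ. All function names, the candidate grid, and the toy logits are hypothetical, and the temperature is fitted and evaluated on the same toy data purely for brevity (in practice a held-out set would be used).

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (T < 1 sharpens, T > 1 flattens)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def expected_calibration_error(logits_batch, labels, temperature=1.0, n_bins=10):
    """ECE with equal-width bins: sum over bins of (n_b / n) * |accuracy_b - confidence_b|."""
    n = len(labels)
    conf_sum = [0.0] * n_bins
    correct_sum = [0.0] * n_bins
    for logits, label in zip(logits_batch, labels):
        probs = softmax(logits, temperature)
        conf = max(probs)
        pred = probs.index(conf)
        b = min(int(conf * n_bins), n_bins - 1)
        conf_sum[b] += conf
        correct_sum[b] += 1.0 if pred == label else 0.0
    # (n_b / n) * |correct_sum/n_b - conf_sum/n_b| simplifies to |correct_sum - conf_sum| / n
    return sum(abs(c - p) for c, p in zip(correct_sum, conf_sum)) / n


def negative_log_likelihood(logits_batch, labels, temperature):
    """Average negative log-probability assigned to the true label."""
    total = 0.0
    for logits, label in zip(logits_batch, labels):
        total -= math.log(softmax(logits, temperature)[label])
    return total / len(labels)


def fit_temperature(logits_batch, labels, candidates):
    """Pick the candidate temperature with the lowest NLL (assumed fitting criterion)."""
    return min(candidates, key=lambda t: negative_log_likelihood(logits_batch, labels, t))


# Toy underconfident model: 80% accurate, but only about 73% confident.
logits_batch = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
labels = [0, 1, 0, 1, 1]

best_t = fit_temperature(logits_batch, labels, [0.25, 0.5, 0.75, 1.0, 1.5, 2.0])
ece_before = expected_calibration_error(logits_batch, labels, temperature=1.0)
ece_after = expected_calibration_error(logits_batch, labels, temperature=best_t)
print(best_t, ece_before, ece_after)
```

For an underconfident model the fitted temperature falls below 1, which sharpens the predicted probabilities and moves average confidence closer to accuracy.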

This review was generated by AI and reviewed by a human editor.