QUICK REVIEW

[Paper Review] Robust Principles: Architectural Design Principles for Adversarially Robust CNNs

Shengyun Peng, Weilin Xu|arXiv (Cornell University)|Aug 30, 2023

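Adversarial Robustness in Machine Learning | Cited by 13

One-Sentence Summary

This paper identifies and validates a set of generalizable architectural design principles that improve the adversarial robustness of convolutional neural networks (CNNs) across multiple datasets (CIFAR-10/100 and ImageNet) and adversarial training methods, gaining 1–3 percentage points on CIFAR and 4–9 percentage points on ImageNet.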

ABSTRACT

Our research aims to unify existing works' diverging opinions on how architectural components affect the adversarial robustness of CNNs. To accomplish our goal, we synthesize a suite of three generalizable robust architectural design principles: (a) optimal range for depth and width configurations, (b) preferring convolutional over patchify stem stage, and (c) robust residual block design through adopting squeeze and excitation blocks and non-parametric smooth activation functions. Through extensive experiments across a wide spectrum of dataset scales, adversarial training methods, model parameters, and network design spaces, our principles consistently and markedly improve AutoAttack accuracy: 1-3 percentage points (pp) on CIFAR-10 and CIFAR-100, and 4-9 pp on ImageNet. The code is publicly available at https://github.com/poloclub/robust-principles.

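Motivation and Goals

Synthesize generalizable architectural principles that govern the adversarial robustness of CNNs.
Resolve conflicting findings in the literature on depth/width scaling, stem design, SE blocks, and the choice of activation function.
Demonstrate robustness gains across a range of datasets, model sizes, and adversarial training methods.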

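Proposed Method

Investigate four architectural components (depth/width, stem stage, squeeze-and-excitation blocks, activation functions) across CNNs and Transformers.
Propose and test a flexible depth-width scaling rule based on the width-depth (WD) ratio, and identify an optimal WD interval.
Compare the convolutional stem with the patchify stem (including postponed downsampling) to assess the effect on robustness.
Study how SE blocks and non-parametric smooth activation functions (SiLU/GELU) affect robustness relative to ReLU, using hyperparameter sweeps and large-scale experiments on ImageNet.
Evaluate robustness under several adversarial training (AT) schemes (SAT, TRADES, Fast-AT, MART, diffusion-augmented AT) and attacks (PGD and AutoAttack).
Combine the three design principles into a single robust CNN (Ra), and test generalization across the ResNet/WRN design spaces on CIFAR and ImageNet.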
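
The two macro-level ideas in this list, the WD ratio and the stem comparison, can be made concrete with a small sketch. Several things here are assumptions rather than the paper's definitions: this review does not state the WD formula, so `wd_ratio` takes the mean of per-stage width divided by per-stage depth as one plausible reading; the stage widths and depths, the stem kernel and stride values, and all function names are hypothetical. Only the interval [7.5, 13.5] comes from the findings reported in this review, and the output-size formula is the standard one for convolutions.

```python
from statistics import mean


def wd_ratio(widths, depths):
    """Mean of per-stage width / depth.

    ASSUMPTION: this averaging rule is an illustrative reading of the
    "WD ratio"; which stages are included is left to the caller.
    """
    if not widths or len(widths) != len(depths):
        raise ValueError("widths and depths must be non-empty and equal in length")
    return mean(w / d for w, d in zip(widths, depths))


def in_reported_range(ratio, low=7.5, high=13.5):
    """True if the ratio lies in the interval this review reports as optimal."""
    return low <= ratio <= high


def conv_out_size(size, kernel, stride, padding=0):
    """Standard output size of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def windows_overlap(kernel, stride):
    """Neighbouring windows share input pixels only when kernel > stride."""
    return kernel > stride


# Hypothetical stage configuration (not taken from the paper).
widths = [40, 80, 160]
depths = [4, 8, 16]
ratio = wd_ratio(widths, depths)
print("WD ratio:", ratio, "in reported range:", in_reported_range(ratio))

# Illustrative stems on a 224-pixel input (kernel/stride values are assumptions).
patchify = conv_out_size(224, kernel=4, stride=4)             # non-overlapping patches
conv_stem = conv_out_size(224, kernel=3, stride=2, padding=1)  # overlapping windows
print("patchify output:", patchify, "overlap:", windows_overlap(4, 4))
print("conv stem output:", conv_stem, "overlap:", windows_overlap(3, 2))
```

In this sketch the patchify stem reduces resolution by a factor of 4 in one non-overlapping step, while the convolutional stem halves it with overlapping windows and leaves the remaining downsampling to later layers, which is the sense of "postponed downsampling" used above.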

Figure 1: We synthesize a suite of generalizable architectural design principles to robustify CNNs, spanning a network’s macro and micro designs: (A) optimal range for depth and width configurations, (B) preferring convolutional over patchify stem stage, and (C) robust residual block design by ado

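Experimental Results

Research Questions

RQ1: Do the choices of depth/width configuration, stem design, SE blocks, and activation function have a consistent effect on adversarial robustness on large-scale datasets?
RQ2: Can a unified set of architectural principles improve robustness beyond CIFAR, across diverse AT methods and model families?
RQ3: How do the proposed principles interact when combined in a single robust architecture?
RQ4: Do the principles generalize from CNNs to different network design spaces (ResNet/WRN) and to Transformers?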

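Main Findings

An optimal WD ratio range of [7.5, 13.5] yields robustness gains across datasets and AT methods; outside this range, the WD ratio is negatively correlated with clean accuracy and PGD accuracy.
A convolutional stem with postponed downsampling is more robust than a patchify stem, because its downsampling is less aggressive and its convolution windows overlap.
With a reduction ratio of r=4, SE blocks improve robustness on ImageNet; at larger values of r (e.g., r≥32 on CIFAR) their effect on robustness is negative.
Non-parametric smooth activation functions (SiLU/GELU) consistently outperform ReLU on CIFAR and ImageNet, and outperform parametric activation functions in most settings.
Applying the three principles cumulatively to the ResNet-50 and WRN architectures yields consistent robustness gains. For example, Ra ResNet-50 improves accuracy over the baseline by +1.80 (PGD2), +3.09 (PGD4), +3.28 (PGD8), and +2.65 (AutoAttack), and Ra WRN-101-2 also achieves notable gains on ImageNet.
On ImageNet, Ra models improve AutoAttack accuracy by 4–9 percentage points across parameter budgets and design spaces (e.g., Ra WRN-101-2 compared with WRN-101-2).
On CIFAR-10/100, Ra achieves gains of 1–3 percentage points across AT methods and under diffusion-augmented AT, including a notable gain over Diff. 1M.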
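
The two block-level findings above, the SE reduction ratio and smooth activations, can be illustrated with a short stdlib-only sketch. The activation formulas are the standard definitions of ReLU, SiLU, and GELU. The SE parameter count assumes the common two-layer bottleneck (channels to channels // r and back, both with biases), which may differ from the paper's exact implementation; the channel count of 256 and all function names are hypothetical.

```python
import math


def relu(x):
    return max(0.0, x)


def silu(x):
    """SiLU(x) = x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def one_sided_slopes(f, x, h=1e-6):
    """(left, right) finite-difference slopes of f at x."""
    return (f(x) - f(x - h)) / h, (f(x + h) - f(x)) / h


def se_hidden_channels(channels, r):
    """Bottleneck width of an SE block with reduction ratio r."""
    return max(1, channels // r)


def se_param_count(channels, r):
    """Weights plus biases of the two SE layers (ASSUMED layout, see lead-in)."""
    hidden = se_hidden_channels(channels, r)
    return channels * hidden + hidden + hidden * channels + channels


# ReLU's slope jumps at 0, while SiLU and GELU change slope continuously.
for name, f in (("relu", relu), ("silu", silu), ("gelu", gelu)):
    left, right = one_sided_slopes(f, 0.0)
    print(f"{name}: left slope {left:.3f}, right slope {right:.3f}")

# A smaller r gives the SE block a wider bottleneck (hypothetical 256 channels).
for r in (4, 32):
    print(f"r={r}: hidden={se_hidden_channels(256, r)}, params={se_param_count(256, r)}")
```

The sketch only shows what the two knobs control: r sets how much capacity the SE gate has, and the activation choice determines whether the slope changes abruptly or smoothly around zero. It does not reproduce the robustness measurements reported above.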


This review was generated by AI and checked by a human editor.