QUICK REVIEW

[Paper Review] Revisiting Sparse Convolutional Model for Visual Recognition

Xili Dai, Mingyang Li|arXiv (Cornell University)|Oct 24, 2022

Adversarial Robustness in Machine Learning | Cited by 21

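One-Sentence Summary

This paper embeds convolutional sparse coding (CSC) layers as drop-in replacements for standard convolutional layers, forming sparse dictionary learning networks (SDNets) that keep accuracy competitive while improving interpretability and robustness to input corruptions and adversarial perturbations.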

ABSTRACT

Despite strong empirical performance for image classification, deep neural networks are often regarded as ``black boxes'' and they are difficult to interpret. On the other hand, sparse convolutional models, which assume that a signal can be expressed by a linear combination of a few elements from a convolutional dictionary, are powerful tools for analyzing natural images with good theoretical interpretability and biological plausibility. However, such principled models have not demonstrated competitive performance when compared with empirically designed deep networks. This paper revisits the sparse convolutional modeling for image classification and bridges the gap between good empirical performance (of deep learning) and good interpretability (of sparse convolutional models). Our method uses differentiable optimization layers that are defined from convolutional sparse coding as drop-in replacements of standard convolutional layers in conventional deep neural networks. We show that such models have equally strong empirical performance on CIFAR-10, CIFAR-100, and ImageNet datasets when compared to conventional neural networks. By leveraging stable recovery property of sparse modeling, we further show that such models can be much more robust to input corruptions as well as adversarial perturbations in testing through a simple proper trade-off between sparse regularization and data reconstruction terms. Source code can be found at https://github.com/Delay-Xili/SDNet.

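Motivation and Objectives

Bridge the gap between the interpretability of sparse models and the strong empirical performance of deep networks on image classification.
Introduce the CSC layer as a differentiable optimization layer that replaces standard convolution in CNN backbones.
Demonstrate competitive accuracy on CIFAR-10, CIFAR-100, and ImageNet while keeping training efficient.
Demonstrate improved robustness to input corruptions and adversarial perturbations through sparse modeling.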

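Proposed Method

Define a convolutional sparse coding (CSC) layer that solves the sparse coding objective with FISTA and acts as a differentiable implicit layer.
Replace selected or all convolutional layers in a ResNet backbone with CSC layers to form the SDNet architecture.
Train end to end with cross-entropy loss under a normalized-dictionary constraint, using projected stochastic gradient descent to keep the dictionary A in the normalized set.
Perform robust inference by adjusting the sparsity parameter lambda at test time to handle noisy inputs, motivated by the stable recovery theorem for CSC.
Unroll two FISTA iterations in the forward pass so that backpropagation and training remain tractable.
Provide a robust inference procedure (Algorithm 1) that learns a relationship between lambda and the residual from synthetic corruptions and uses it to choose lambda at test time.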
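
The FISTA and dictionary-normalization steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes a dense matrix dictionary A in place of a convolutional one, the standard lasso-style objective 0.5*||x - A z||^2 + lam*||z||_1, and that the "normalized set" means unit-norm atoms. The function names (soft_threshold, normalize_columns, fista_sparse_code) are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def normalize_columns(A):
    """Rescale every atom (column) to unit norm.
    Assumption: this stands in for the projection step of projected SGD."""
    norms = np.linalg.norm(A, axis=0, keepdims=True)
    return A / np.maximum(norms, 1e-12)

def fista_sparse_code(x, A, lam, num_iters=2):
    """Run num_iters FISTA steps on 0.5*||x - A z||^2 + lam*||z||_1, starting from z = 0."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth term
    z = np.zeros(A.shape[1])
    y = z.copy()                           # extrapolated point
    t = 1.0                                # momentum scalar
    for _ in range(num_iters):
        grad = A.T @ (A @ y - x)           # gradient of the reconstruction term
        z_next = soft_threshold(y - grad / L, lam / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = z_next + ((t - 1.0) / t_next) * (z_next - z)
        z, t = z_next, t_next
    return z

# Toy usage with a random dictionary (sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
A = normalize_columns(rng.standard_normal((16, 32)))
x = rng.standard_normal(16)
z = fista_sparse_code(x, A, lam=0.1, num_iters=2)
```

In a trainable layer the loop would be written in an autodiff framework so that gradients flow through the unrolled iterations to A; keeping the iteration count at two keeps that computation graph short.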

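Experimental Results

Research Questions

RQ1: Can CSC layers deliver image classification performance competitive with standard convolutional networks on CIFAR-10/100 and ImageNet?
RQ2: Without heavy data augmentation or changes to training, are SDNets with CSC layers more robust to input corruptions and adversarial perturbations?
RQ3: How does the sparse modeling approach affect the interpretability and layer-wise behavior of deep networks?
RQ4: Can simply adjusting the sparsity parameter lambda at test time improve robustness to corruption across different noise types?
RQ5: What is the trade-off between computational cost and accuracy when convolution is replaced by CSC layers?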

Key Findings

Dataset	Architecture	Model size	Top-1 accuracy	Memory	Speed
CIFAR-10	ResNet-18	11.2M	95.54%	1.0 GB	1600 n/s
CIFAR-10	ResNet-34	21.1M	95.57%	2.0 GB	1000 n/s
CIFAR-10	MDEQ	11.1M	93.80%	2.0 GB	90 n/s
CIFAR-10	SCN	0.7M	94.36%	10.0 GB	39 n/s
CIFAR-10	SCN-18	11.2M	95.12%	3.5 GB	158 n/s
CIFAR-10	SDNet-18 (ours)	11.2M	95.20%	1.2 GB	1500 n/s
CIFAR-10	SDNet-34 (ours)	21.1M	95.57%	2.4 GB	900 n/s
CIFAR-100	ResNet-18	11.2M	77.82%	1.0 GB	1600 n/s
CIFAR-100	ResNet-34	21.1M	78.39%	2.0 GB	1000 n/s
CIFAR-100	MDEQ	11.2M	74.12%	2.0 GB	90 n/s
CIFAR-100	SCN	0.7M	80.07%	10.0 GB	39 n/s
CIFAR-100	SCN-18	11.2M	78.59%	3.5 GB	158 n/s
CIFAR-100	SDNet-18 (ours)	11.3M	78.31%	1.2 GB	1500 n/s
CIFAR-100	SDNet-34 (ours)	21.2M	78.48%	2.4 GB	900 n/s
ImageNet	ResNet-18	11.7M	68.98%	24.1 GB	2100 n/s
ImageNet	ResNet-34	21.5M	72.83%	32.3 GB	1400 n/s
ImageNet	SCN	9.8M	70.42%	95.1 GB	51 n/s
ImageNet	SDNet-18 (ours)	11.7M	69.47%	37.6 GB	1800 n/s
ImageNet	SDNet-34 (ours)	21.5M	72.67%	46.4 GB	1200 n/s

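SDNet-18 and SDNet-34 reach Top-1 accuracy on par with ResNet-18/34 at similar parameter budgets on CIFAR-10/100 and ImageNet.
SDNet models are robust to corrupted inputs; on CIFAR-10-C and ImageNet-C, an adaptive lambda further improves accuracy over a fixed lambda.
The adaptive lambda guided by Algorithm 1 gives a significant gain under corruption over the fixed training-time lambda (0.1).
Compared with MDEQ, SDNet-18 is more than 7 times faster while keeping higher accuracy on CIFAR-10/ImageNet; SCN reaches competitive accuracy but trains more slowly.
Adjusting lambda also improves the adversarial robustness of SDNets, significantly raising robust accuracy under PGD attacks.
Increasing the number of FISTA iterations in the CSC layers steadily improves both natural accuracy on ImageNet and robustness on ImageNet-C.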
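
The role of lambda at test time can be made concrete with a toy sweep. The sketch below is not the paper's Algorithm 1. It assumes a dense random dictionary, Gaussian noise, and an arbitrary lambda grid, and it only computes the two quantities a lambda-residual relationship would be built from: the reconstruction residual and the number of nonzero coefficients at each lambda. The names sparse_code and sweep_lambda are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise shrinkage, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(x, A, lam, num_iters=200):
    """FISTA on 0.5*||x - A z||^2 + lam*||z||_1, starting from z = 0."""
    L = np.linalg.norm(A, 2) ** 2
    z = np.zeros(A.shape[1])
    y, t = z.copy(), 1.0
    for _ in range(num_iters):
        z_next = soft_threshold(y - A.T @ (A @ y - x) / L, lam / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = z_next + ((t - 1.0) / t_next) * (z_next - z)
        z, t = z_next, t_next
    return z

def sweep_lambda(x, A, lambdas, num_iters=200):
    """Return (lambda, residual norm, nonzero count) for each lambda in the grid."""
    rows = []
    for lam in lambdas:
        z = sparse_code(x, A, lam, num_iters)
        residual = float(np.linalg.norm(x - A @ z))
        rows.append((float(lam), residual, int(np.count_nonzero(z))))
    return rows

# Synthetic noisy input: a 4-sparse code plus Gaussian noise (all values are assumptions).
rng = np.random.default_rng(0)
A = rng.standard_normal((32, 64))
A /= np.linalg.norm(A, axis=0, keepdims=True)
z_true = np.zeros(64)
z_true[rng.choice(64, 4, replace=False)] = 1.0
x_noisy = A @ z_true + 0.1 * rng.standard_normal(32)

lam_max = float(np.max(np.abs(A.T @ x_noisy)))   # at or above this value the code is all zeros
for lam, res, nnz in sweep_lambda(x_noisy, A, [0.01, 0.1, 0.5, lam_max]):
    print(f"lambda={lam:.3f}  residual={res:.3f}  nonzeros={nnz}")
```

A small lambda favors reconstructing the input, noise included, while a large lambda favors sparsity and discards more of the input; choosing lambda per input is a way of setting that trade-off according to how corrupted the input appears to be.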

This review was generated by AI and checked by a human editor.