QUICK REVIEW

[Paper Review] Data-Free Adversarial Distillation

Gongfan Fang, Jie Song|arXiv (Cornell University)|Dec 23, 2019

Adversarial Robustness in Machine Learning | References: 39 | Cited by: 103

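One-Sentence Summary

This paper proposes Data-Free Adversarial Distillation (DFAD), a two-stage adversarial framework in which a generator and a joint teacher-student discriminator minimize an optimizable upper bound on the model discrepancy, yielding a data-free training signal for the student. It extends to semantic segmentation and achieves results competitive with data-driven methods.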

ABSTRACT

Knowledge Distillation (KD) has made remarkable progress in the last few years and become a popular paradigm for model compression and knowledge transfer. However, almost all existing KD algorithms are data-driven, i.e., relying on a large amount of original training data or alternative data, which is usually unavailable in real-world scenarios. In this paper, we devote ourselves to this challenging problem and propose a novel adversarial distillation mechanism to craft a compact student model without any real-world data. We introduce a model discrepancy to quantificationally measure the difference between student and teacher models and construct an optimizable upper bound. In our work, the student and the teacher jointly act the role of the discriminator to reduce this discrepancy, when a generator adversarially produces some "hard samples" to enlarge it. Extensive experiments demonstrate that the proposed data-free method yields comparable performance to existing data-driven methods. More strikingly, our approach can be directly extended to semantic segmentation, which is more complicated than classification, and our approach achieves state-of-the-art results. Code and pretrained models are available at https://github.com/VainF/Data-Free-Adversarial-Distillation.

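Motivation and Objectives

Motivate knowledge distillation for real-world scenarios in which the original training data is unavailable.
Propose a data-free framework that approximates and minimizes the teacher-student model discrepancy without any real data.
Develop an adversarial training mechanism that continually crafts hard samples to improve the student model.
Extend data-free distillation to semantic segmentation and demonstrate competitive performance.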

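Proposed Method

Define the model discrepancy between the teacher T and the student S as D(T,S), and approximate it with a generator G that produces the training samples.
Adopt a two-stage adversarial process: an imitation stage (minimize the discrepancy with an MAE loss on samples from G) and a generation stage (maximize the discrepancy by optimizing G with a stable log-MAE objective).
Use MAE as the discrepancy loss to keep gradients stable and to prevent the generator from collapsing in the absence of real data.
Divide generated samples into hard and easy types to bound the discrepancy and to steer the generator toward challenging, informative samples.
Iteratively update S to imitate T on generated samples while updating G to produce harder samples, with the goal of making S functionally indistinguishable from T.
Provide stability guidance (e.g., a fixed k=5 imitation steps per round, and L_GEN-ADA for segmentation tasks) to keep training stable.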
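
The alternation described in the list above can be sketched in a framework-agnostic way. This is a minimal illustration, not the authors' implementation: the function names (`mae_discrepancy`, `generator_loss`, `dfad_round`) are hypothetical, outputs are plain lists of floats rather than tensors, and the `-log(discrepancy + 1)` form of the log-MAE objective is an assumption about how "log-MAE" is written out.

```python
import math
from typing import Callable, Sequence


def mae_discrepancy(teacher_out: Sequence[float],
                    student_out: Sequence[float]) -> float:
    """Mean absolute error between teacher and student outputs."""
    if len(teacher_out) != len(student_out) or not teacher_out:
        raise ValueError("outputs must be non-empty and of equal length")
    total = sum(abs(t - s) for t, s in zip(teacher_out, student_out))
    return total / len(teacher_out)


def generator_loss(discrepancy: float) -> float:
    """Assumed log-MAE form: minimizing -log(d + 1) maximizes d."""
    return -math.log(discrepancy + 1.0)


def dfad_round(student_step: Callable[[], None],
               generator_step: Callable[[], None],
               k: int = 5) -> None:
    """One adversarial round: k imitation updates, then one generation update."""
    for _ in range(k):
        # Imitation stage: S is updated to shrink the discrepancy; G is frozen.
        student_step()
    # Generation stage: G is updated to enlarge the discrepancy; S and T are frozen.
    generator_step()


# Stand-in callbacks that only record the call order (no real training here).
calls = []
dfad_round(lambda: calls.append("S"), lambda: calls.append("G"))
print(calls)
print(mae_discrepancy([1.0, -2.0], [0.5, -1.0]))
```

In a real training loop each callback would draw a noise batch, pass it through G, and take one optimizer step on the corresponding loss; those details are left out here because they depend on the framework and model in use.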

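Experimental Results

Research Questions

RQ1: Can a data-free distillation framework match the performance of data-driven KD methods on classification and segmentation tasks?
RQ2: Without real data, how can the discrepancy between the teacher and student models be quantified and minimized?
RQ3: Can adversarially generated samples effectively train a competitive student model when the original data is unavailable?
RQ4: Does the proposed framework extend to segmentation tasks with competitive results?

Key Findings

The proposed DFAD framework performs competitively with data-driven distillation baselines on classification datasets.
Among data-free methods, it achieves the highest accuracy on several classification benchmarks.
The method extends naturally to semantic segmentation and achieves competitive mIoU scores on CamVid and NYUv2, outperforming other data-free methods.
The generated samples remain diverse, which mitigates mode collapse, and they provide informative supervision during training.
The MAE-based discrepancy loss provides stable gradients and outperforms other loss options (e.g., MSE, KLD) for the generator.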
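
The gradient behavior behind the MAE finding can be illustrated with a small calculation on scalar outputs. This is a sketch under simplifying assumptions, not the paper's experiment: the function names are hypothetical, and a single scalar stands in for a network output. It compares the gradient with respect to the student output as the student approaches the teacher.

```python
def mae_grad(teacher_out: float, student_out: float) -> float:
    """d/ds |t - s|: a unit-magnitude sign, independent of the gap size.

    At zero gap the subgradient 0.0 is returned.
    """
    diff = student_out - teacher_out
    return float((diff > 0) - (diff < 0))


def mse_grad(teacher_out: float, student_out: float) -> float:
    """d/ds (t - s)^2 = 2 (s - t): shrinks linearly with the gap."""
    return 2.0 * (student_out - teacher_out)


# Teacher output fixed at 0.0; the student output closes in on it.
for gap in [1.0, 0.1, 0.001]:
    mae_g = abs(mae_grad(0.0, gap))
    mse_g = abs(mse_grad(0.0, gap))
    print(f"gap={gap:<6} |dMAE/ds|={mae_g:.3f} |dMSE/ds|={mse_g:.3f}")
```

The MAE gradient keeps unit magnitude however small the gap becomes, while the MSE gradient decays with it, so a signal that must flow back through the discrepancy to the generator weakens under MSE as the student converges on the generated samples.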


This review was generated by AI and reviewed by human editors.