QUICK REVIEW

[Paper Review] Overlearning Reveals Sensitive Attributes

Congzheng Song, Vitaly Shmatikov | arXiv (Cornell University) | May 28, 2019

Adversarial Robustness in Machine Learning | References: 29 | Cited by: 55

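One-Sentence Summary

This paper shows that models trained for simple objectives can implicitly learn sensitive attributes (such as race or identity), which leads to privacy leakage and lets the models be re-purposed, and that censoring often fails to prevent this overlearning.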

ABSTRACT

"Overlearning" means that a model trained for a seemingly simple objective implicitly learns to recognize attributes and concepts that are (1) not part of the learning objective, and (2) sensitive from a privacy or bias perspective. For example, a binary gender classifier of facial images also learns to recognize races (even races that are not represented in the training data) and identities. We demonstrate overlearning in several vision and NLP models and analyze its harmful consequences. First, inference-time representations of an overlearned model reveal sensitive attributes of the input, breaking privacy protections such as model partitioning. Second, an overlearned model can be "re-purposed" for a different, privacy-violating task even in the absence of the original training data. We show that overlearning is intrinsic for some tasks and cannot be prevented by censoring unwanted attributes. Finally, we investigate where, when, and why overlearning happens during model training.

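Research Motivation and Goals

Show that supervised deep models learn sensitive attributes that the training objective does not specify.
Quantify privacy leakage through inference-time representations.
Show that overlearned representations allow models to be re-purposed for privacy-violating tasks.
Study the effectiveness of censoring and de-censoring techniques.
Investigate where in the network, and why, overlearning happens during training.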

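Proposed Method

Consider a supervised model M = C ∘ E, where the feature extractor E maps an input x to the representation z = E(x) at layer l, and C is the classifier on top of it.
Evaluate inference-time leakage by training an attack model on observed representations to predict the sensitive attribute.
Apply censoring methods (adversarial training and information-theoretic methods) to suppress sensitive attributes in z.
Propose a de-censoring technique that extracts sensitive information from censored representations.
Show that a model can be re-purposed to predict sensitive attributes by fine-tuning the transferred feature extractor on a small dataset D_transfer.
Study the robustness of censoring using transfer learning and censoring of internal layers.
Analyze layer-wise representation similarity (CKA) to understand where overlearning originates.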
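The layer-wise similarity analysis in the last item can be illustrated with linear CKA (centered kernel alignment with a linear kernel). This is a minimal sketch, assuming the linear-kernel variant and NumPy arrays of activations collected on the same inputs; the summary does not say which CKA variant the paper uses, and `linear_cka` is a hypothetical helper, not the authors' code.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Hypothetical helper: linear CKA between two layers' activations.

    x: (n, d1) activations of one layer; y: (n, d2) activations of another,
    both computed on the same n inputs in the same order.
    Returns a similarity in [0, 1]; 1 means identical up to rotation and scale.
    """
    # Center every feature so constant offsets do not count as similarity.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # With linear kernels (assumed variant), CKA reduces to Frobenius norms
    # of the cross-covariance and self-covariance matrices.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    self_x = np.linalg.norm(x.T @ x, ord="fro")
    self_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (self_x * self_y))


rng = np.random.default_rng(0)
layer_a = rng.normal(size=(500, 64))                   # hypothetical layer-l activations
rotation, _ = np.linalg.qr(rng.normal(size=(64, 64)))  # random orthogonal matrix
print(linear_cka(layer_a, 3.0 * layer_a @ rotation))   # rotated + rescaled copy: ~1.0
print(linear_cka(layer_a, rng.normal(size=(500, 32)))) # unrelated activations: small
```

Computing this score between corresponding layers of two models, or of one model at different training stages, gives a layer-by-layer similarity profile of the kind the analysis relies on.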

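Experimental Results

Research Questions

RQ1: Can a trained model reveal sensitive attributes through its internal representations at inference time?
RQ2: Do censored representations effectively prevent sensitive-attribute leakage?
RQ3: Can overlearned representations be re-purposed to predict sensitive attributes with very little training data?
RQ4: At which layer(s) of the network does overlearning occur, and why does it arise during training?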

Key Findings

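Accuracy (%) on the main-task label (y) and on inferring the sensitive attribute (s) from representations. RAND = random-guessing baseline, BASE = uncensored model, ADV = adversarial censoring, IT = information-theoretic censoring, n/a = not reported.
Dataset	RAND_y	BASE_y	ADV_y	IT_y	RAND_s	BASE_s	ADV_s	IT_s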
Health	66.31	84.33	80.16	82.63	16.00	32.52	32.00	26.60
UTKFace	52.27	90.38	90.15	88.15	42.52	62.18	53.28	53.30
FaceScrub	53.53	98.77	97.90	97.66	1.42	33.65	30.23	10.61
Places365	56.16	91.41	90.84	89.82	1.37	31.03	12.56	2.29
Twitter	45.17	76.22	57.97	n/a	6.93	38.46	34.27	n/a
Yelp	42.56	57.81	56.79	n/a	15.88	33.09	27.32	n/a
PIPA	7.67	77.24	52.02	29.64	68.50	87.95	69.96	82.02
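To quantify how much leakage censoring removes, the s columns can be re-read as the share of BASE's above-random accuracy that survives censoring, (censored - RAND) / (BASE - RAND), next to the main-task accuracy lost to censoring. This ratio is an illustrative reading introduced here, not a metric reported in the paper; the numbers are copied from the table above.

```python
# Accuracy (%) copied from the table: (RAND, BASE, ADV, IT); None marks "n/a".
Y_ACC = {
    "Health":    (66.31, 84.33, 80.16, 82.63),
    "UTKFace":   (52.27, 90.38, 90.15, 88.15),
    "FaceScrub": (53.53, 98.77, 97.90, 97.66),
    "Places365": (56.16, 91.41, 90.84, 89.82),
    "Twitter":   (45.17, 76.22, 57.97, None),
    "Yelp":      (42.56, 57.81, 56.79, None),
    "PIPA":      (7.67, 77.24, 52.02, 29.64),
}
S_ACC = {
    "Health":    (16.00, 32.52, 32.00, 26.60),
    "UTKFace":   (42.52, 62.18, 53.28, 53.30),
    "FaceScrub": (1.42, 33.65, 30.23, 10.61),
    "Places365": (1.37, 31.03, 12.56, 2.29),
    "Twitter":   (6.93, 38.46, 34.27, None),
    "Yelp":      (15.88, 33.09, 27.32, None),
    "PIPA":      (68.50, 87.95, 69.96, 82.02),
}


def residual_leakage(rand, base, censored):
    """Illustrative: share of BASE's above-random accuracy on s left after censoring."""
    if censored is None:
        return None
    return (censored - rand) / (base - rand)


def accuracy_drop(base, censored):
    """Main-task accuracy (percentage points) lost to censoring."""
    return None if censored is None else base - censored


def fmt(value):
    return "n/a" if value is None else f"{value:.2f}"


for name, (rand_s, base_s, adv_s, it_s) in S_ACC.items():
    _, base_y, adv_y, it_y = Y_ACC[name]
    print(
        f"{name:<10} residual s-leakage ADV {fmt(residual_leakage(rand_s, base_s, adv_s))}"
        f"  IT {fmt(residual_leakage(rand_s, base_s, it_s))}"
        f" | y drop ADV {fmt(accuracy_drop(base_y, adv_y))}"
        f"  IT {fmt(accuracy_drop(base_y, it_y))}"
    )
```

For example, on FaceScrub IT censoring still leaves about 0.29 of the above-random leakage (9.19 of 32.23 points) while costing about 1.11 points of main-task accuracy; on PIPA, ADV removes most of the leakage but costs about 25.22 points of y accuracy.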

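Inference-time representations of overlearned models leak sensitive attributes with accuracy far above random guessing on multiple datasets.
Censoring (adversarial or information-theoretic) reduces but does not eliminate leakage, and it can hurt main-task performance.
Overlearned representations allow models to be re-purposed to predict sensitive attributes, often outperforming models trained from scratch on small transfer datasets.
Censoring lower layers can prevent re-purposing, but an adversary may still exploit other layers; censoring internal layers is necessary for robust protection.
For some tasks, overlearning appears to be intrinsic: even attributes absent from the training data can be recovered, which challenges simple censoring-based privacy protection.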
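The leakage measurement behind the first finding (an attack model trained on representations z to predict s, compared with a random-guessing baseline) can be sketched with a logistic-regression probe. This is a minimal sketch under stated assumptions: the representations are synthetic, s is deliberately planted in one feature, the majority-class rate stands in for the RAND column, and `train_attack_model` / `attack_accuracy` are hypothetical helpers rather than the paper's attack architecture.

```python
import numpy as np


def train_attack_model(z, s, lr=0.5, epochs=500):
    """Hypothetical attack model: logistic regression predicting binary s from z."""
    n, d = z.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(z @ w + b)))  # predicted P(s = 1 | z)
        err = p - s                              # per-example log-loss gradient w.r.t. the logit
        w -= lr * (z.T @ err) / n                # full-batch gradient descent step
        b -= lr * err.mean()
    return w, b


def attack_accuracy(w, b, z, s):
    """Fraction of examples whose sensitive attribute the attack model recovers."""
    return float(((z @ w + b > 0).astype(int) == s).mean())


# Synthetic stand-in for inference-time representations (assumption: s leaks
# linearly into feature 0 even though no training objective asked for it).
rng = np.random.default_rng(0)
n, d = 2000, 16
s = rng.integers(0, 2, size=n)
z = rng.normal(size=(n, d))
z[:, 0] += 2.0 * (2 * s - 1)

# The adversary trains on representations it has observed, then attacks held-out inputs.
w, b = train_attack_model(z[:1500], s[:1500])
acc = attack_accuracy(w, b, z[1500:], s[1500:])
majority = max(s[1500:].mean(), 1 - s[1500:].mean())  # random-guessing baseline
print(f"attack accuracy {acc:.3f} vs majority-class baseline {majority:.3f}")
```

The gap between the two printed numbers plays the role of the BASE_s versus RAND_s gap in the table; with real models, z would be the layer-l output of E rather than synthetic data.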

This review was generated by AI and checked by human editors.