QUICK REVIEW

[Paper Review] Censoring Representations with an Adversary

Harrison Edwards, Amos Storkey | arXiv (Cornell University) | Nov 18, 2015

Adversarial Robustness in Machine Learning | References: 16 | Cited by: 104

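One-Sentence Summary

This paper proposes an adversarial framework, Adversarial Learned Fair Representations (ALFR), for learning representations that are discriminative for the main task yet carry minimal information about a sensitive variable. By training the representation network in a minimax game to fool an adversary, the method achieves state-of-the-art fairness results on benchmark datasets and enables a novel form of unsupervised image anonymization that needs no paired input-output data.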

ABSTRACT

In practice, there are often explicit constraints on what representations or decisions are acceptable in an application of machine learning. For example, it may be a legal requirement that a decision must not favour a particular group. Alternatively it can be that the representation of data must not have identifying information. We address these two related issues by learning flexible representations that minimize the capability of an adversarial critic. This adversary is trying to predict the relevant sensitive variable from the representation, and so minimizing the performance of the adversary ensures there is little or no information in the representation about the sensitive variable. We demonstrate this adversarial approach on two problems: making decisions free from discrimination and removing private information from images. We formulate the adversarial model as a minimax problem, and optimize that minimax objective using a stochastic gradient alternate min-max optimizer. We demonstrate the ability to provide discriminant free representations for standard test problems, and compare with previous state of the art methods for fairness, showing statistically significant improvement across most cases. The flexibility of this method is shown via a novel problem: removing annotations from images, from unaligned training examples of annotated and unannotated images, and with no a priori knowledge of the form of annotation provided to the model.
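
As a rough sketch of the minimax problem the abstract describes, the objective can be written as below. The symbols are assumptions made for illustration and are not necessarily the paper's own notation: $r_\theta$ is the representation network, $a_\psi$ the adversary, $U$ a utility loss (prediction or reconstruction error), $\ell$ the adversary's prediction loss on the sensitive variable $s$, and $\beta > 0$ a trade-off weight.

```latex
\min_{\theta}\;\max_{\psi}\;
  \mathbb{E}_{(x,y,s)}\Big[\, U\big(r_\theta(x),\, x,\, y\big)
  \;-\; \beta\,\ell\big(a_\psi(r_\theta(x)),\, s\big) \Big]
```

Because $U$ does not depend on $\psi$, the inner maximization is just the adversary minimizing its own loss $\ell$. The outer minimization then keeps the utility loss low while making even that best-case adversary loss as large as possible.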

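Research Motivation and Objectives

To address fairness in machine learning by ensuring that predictions do not depend on sensitive attributes such as gender or race.
To enable privacy-preserving representation learning by removing sensitive or private information from data (e.g., text in images).
To develop a flexible end-to-end method that needs no paired training data (e.g., aligned annotated and unannotated images).
To outperform existing fairness methods by using an adversarial minimax objective that jointly optimizes fairness and utility.
To demonstrate the method's versatility by applying the same framework, with a shared architecture and training procedure, to both fairness and image anonymization.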

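Proposed Method

Formulates the problem as a minimax optimization: the representation network is trained to minimize the adversary's ability to predict the sensitive variable from the representation.
Uses a deep neural network to learn the representation and a separate deep neural network as the adversary that predicts the sensitive variable.
Uses a stochastic gradient alternating min-max optimizer that alternates updates to the representation network and the adversary network.
Applies the same framework to image anonymization by training an autoencoder to remove private information (e.g., text) as it reconstructs an image, while the adversary tries to detect whether such information is present.
Adopts a patch-based expert model for image reconstruction, in which a patch classifier decides whether to use the autoencoder or to copy the patch directly.
Uses hyperparameters α=1 and β=10 to balance the reconstruction loss against the adversarial loss; these values were tuned on validation data.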
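
The alternating min-max updates listed above can be illustrated with a minimal NumPy sketch. It is a toy under stated assumptions, not the paper's implementation: the encoder is a single tanh layer, the predictor and adversary are logistic regressions, the reconstruction term is omitted, and every name (`train_alternating`, `encode`, `beta`, `lr`) is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(X, W):
    # Toy encoder (assumption): one tanh layer instead of a deep network.
    return np.tanh(X @ W)

def train_alternating(X, y, s, dim=2, beta=1.0, lr=0.1, steps=500, seed=0):
    """Alternate gradient steps between an adversary and an encoder/predictor.

    X: (n, d) features; y: (n,) binary task labels; s: (n,) binary sensitive variable.
    Returns encoder weights W, predictor weights v, adversary weights a.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, dim))  # encoder weights
    v = np.zeros(dim)                         # predictor weights (task label y)
    a = np.zeros(dim)                         # adversary weights (sensitive s)
    for _ in range(steps):
        # Adversary step: encoder fixed, adversary lowers its cross-entropy on s.
        R = encode(X, W)
        err_s = sigmoid(R @ a) - s            # gradient of cross-entropy w.r.t. logit
        a -= lr * (R.T @ err_s) / n

        # Encoder/predictor step: adversary fixed, minimise
        # task cross-entropy minus beta times adversary cross-entropy.
        R = encode(X, W)
        err_y = sigmoid(R @ v) - y
        err_s = sigmoid(R @ a) - s
        grad_R = (np.outer(err_y, v) - beta * np.outer(err_s, a)) / n
        grad_Z = grad_R * (1.0 - R ** 2)      # backprop through tanh
        v -= lr * (R.T @ err_y) / n
        W -= lr * (X.T @ grad_Z)
    return W, v, a
```

Calling `train_alternating(X, y, s)` on arrays of the shapes given in the docstring returns the three weight arrays, and `encode(X, W)` then yields the learned representation. Raising `beta` puts more weight on fooling the adversary relative to task accuracy, which mirrors the role of the weighting described above.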

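Experimental Results

Research Questions

RQ1: Can the adversarial training framework learn representations that are both fair and discriminative without imposing explicit constraints on the sensitive variable?
RQ2: How does ALFR compare with previous state-of-the-art fairness methods on standard fairness benchmarks?
RQ3: Can the adversarial framework be applied to image anonymization without paired input-output data (e.g., annotated and unannotated versions of an image)?
RQ4: When only unpaired data is available, how well does the model generalize in removing private information (e.g., text) from images?
RQ5: How stable and effective is adversarial training when no paired supervision is available for image reconstruction?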

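Key Findings

ALFR achieves statistically significant improvements over previous state-of-the-art fairness methods on the Diabetes and Adult datasets.
The model learns representations that are nearly independent of the sensitive variable, as shown by a marked drop in the adversary's prediction accuracy.
In image anonymization, the model produces plausible text-free reconstructions even when trained only on unpaired data (images with and without text).
Visual results show that after training the adversary can no longer reliably distinguish annotated from unannotated images, indicating that sensitive cues have been effectively removed.
The method shows high flexibility by handling both fairness and image anonymization with the same core architecture and training procedure.
Although some artifacts are visible on close inspection, the reconstructions remain visually plausible, suggesting potential for real-world privacy-preserving applications.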
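
Findings like those above are usually backed by simple numeric checks. The sketch below shows two common ones; it is an assumption about how such checks could be written, since this summary does not state the exact metrics used in the paper, and both function names are hypothetical.

```python
import numpy as np

def adversary_advantage(s_true, s_pred):
    """Adversary accuracy minus the majority-class baseline (binary s).

    A value near 0 suggests the adversary does no better than always
    guessing the most frequent value of the sensitive variable.
    """
    s_true = np.asarray(s_true)
    s_pred = np.asarray(s_pred)
    accuracy = float(np.mean(s_true == s_pred))
    baseline = float(max(np.mean(s_true == 1), np.mean(s_true == 0)))
    return accuracy - baseline

def discrimination_gap(y_pred, s):
    """Absolute difference in mean prediction between the two groups of s."""
    y_pred = np.asarray(y_pred, dtype=float)
    s = np.asarray(s)
    return float(abs(y_pred[s == 1].mean() - y_pred[s == 0].mean()))
```

For instance, predictions `[1, 0, 1, 0]` with sensitive values `[1, 1, 0, 0]` have the same mean in both groups, so the gap is 0. Both functions assume a binary sensitive variable with both groups present.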

This review was generated by AI and reviewed by human editors.