QUICK REVIEW

[Paper Review] Auditing Black-Box Models Using Transparent Model Distillation With Side Information

Sarah Tan, Rich Caruana|arXiv (Cornell University)|Oct 17, 2017

Explainable Artificial Intelligence (XAI) | References: 34 | Cited by: 15

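One-Sentence Summary

This paper proposes Distill-and-Compare, which audits a black-box risk scoring model by training a transparent student model to mimic the black box's risk scores (model distillation) and comparing it with a second transparent model trained on ground-truth outcomes. Differences between the two models point to potential bias or missing features, and the authors' statistical test indicates that the ProPublica COMPAS data set is likely missing key features used by the original COMPAS model.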

ABSTRACT

Black-box risk scoring models permeate our lives, yet are typically proprietary or opaque. We propose Distill-and-Compare, a model distillation and comparison approach to audit such models. To gain insight into black-box models, we treat them as teachers, training transparent student models to mimic the risk scores assigned by black-box models. We compare the student model trained with distillation to a second un-distilled transparent model trained on ground-truth outcomes, and use differences between the two models to gain insight into the black-box model. Our approach can be applied in a realistic setting, without probing the black-box model API. We demonstrate the approach on four public data sets: COMPAS, Stop-and-Frisk, Chicago Police, and Lending Club. We also propose a statistical test to determine if a data set is missing key features used to train the black-box model. Our test finds that the ProPublica data is likely missing key feature(s) used in COMPAS.

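Research Motivation and Objectives

Develop a way to audit opaque, proprietary risk scoring models without access to their internals or their API.
Determine whether a publicly available audit data set is missing key features that were used to train the black-box model.
Provide interpretable insight into black-box behavior by comparing a distilled student model with a transparent model trained on ground-truth outcomes.
Make auditing feasible in realistic settings where the black-box model cannot be probed.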

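Proposed Method

Use model distillation to train a transparent student model that mimics the risk scores produced by the black-box model.
Train a second transparent model directly on ground-truth outcomes, without using the black-box model's predictions.
Compare the two transparent models (distilled versus outcome-trained), using statistical methods and model interpretability techniques, to detect where they disagree.
Use those differences to infer possible bias or omitted features in the black-box model.
Apply a statistical test to check whether the audit data set is missing key features that were used to train the original black-box model.
Demonstrate the approach on four real-world data sets: COMPAS, Stop-and-Frisk, Chicago Police, and Lending Club.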
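
The steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes plain least-squares linear models as the transparent model class, synthetic data, continuous outcomes, and risk scores that are already on the same scale as the outcomes. The names `fit_linear` and `distill_and_compare` are hypothetical.

```python
import numpy as np

def fit_linear(X, target):
    """Least-squares linear model with intercept (stand-in for a transparent model)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coef  # [intercept, w_1, ..., w_d]

def distill_and_compare(X, black_box_scores, outcomes):
    """Fit a student on the black-box scores and a second model on the outcomes,
    then return the per-feature difference in their coefficients."""
    student = fit_linear(X, black_box_scores)   # mimics the black box
    outcome_model = fit_linear(X, outcomes)     # trained on ground truth
    return student[1:] - outcome_model[1:]

# Synthetic audit data (assumption: scores and outcomes share a scale).
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))
true_signal = 0.8 * X[:, 0] + 0.5 * X[:, 1]
outcomes = true_signal + rng.normal(scale=0.5, size=n)
# Hypothetical black box that also leans on feature 2, which the outcome ignores.
black_box_scores = true_signal + 0.6 * X[:, 2] + rng.normal(scale=0.1, size=n)

gap = distill_and_compare(X, black_box_scores, outcomes)
print(gap)
```

In this toy setup the gap is close to zero for the first two features and clearly positive for the third, which is the kind of disagreement an auditor would then inspect. Only the recorded scores are used; the black box itself is never queried.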

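Experimental Results

Research Questions

RQ1: Can a black-box risk model be audited without access to its internals or API?
RQ2: How do differences between the distilled student model and the outcome-trained transparent model reveal bias or flaws in the black-box model?
RQ3: Is the data set available for auditing the black-box model missing key predictive features?
RQ4: To what extent can distillation-based auditing detect omitted features in real-world risk scoring systems?
RQ5: Can a statistical test detect that a data set is missing key features used to train the original black-box model?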

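Key Findings

The ProPublica COMPAS data set is likely missing key features used to train the original model, as indicated by statistically significant differences between the distilled model and the outcome-trained model.
Distill-and-Compare identified differences in model behavior that suggest possible feature omission or bias in the black-box models.
The method does not require probing the black-box model's API, so it applies in real-world settings with restricted access.
Comparing the distilled student model with the outcome-trained model can expose structural inconsistencies in the black-box predictions.
The statistical test for missing features flagged the COMPAS data as incomplete, suggesting that the publicly available data does not capture every feature used to train COMPAS.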
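
This summary does not spell out the test statistic, so the following is only one plausible way such a check could look, not the paper's procedure. The assumption is that if the audit data lacks a feature the black box used, the part of the risk score the student cannot explain will still correlate with the part of the outcome the outcome model cannot explain. The function names, the linear models, the permutation test, and the synthetic data are all illustrative choices.

```python
import numpy as np

def residuals(X, target):
    """Residuals of a least-squares linear fit with intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ coef

def missing_feature_check(X, black_box_scores, outcomes, n_perm=1000, seed=0):
    """Permutation test on the correlation between student-model residuals
    and outcome-model residuals. Returns (|correlation|, p-value)."""
    rng = np.random.default_rng(seed)
    r_student = residuals(X, black_box_scores)
    r_outcome = residuals(X, outcomes)
    observed = abs(np.corrcoef(r_student, r_outcome)[0, 1])
    # Null distribution: shuffle one residual vector to break any association.
    perm = np.array([
        abs(np.corrcoef(rng.permutation(r_student), r_outcome)[0, 1])
        for _ in range(n_perm)
    ])
    p_value = (1 + np.sum(perm >= observed)) / (1 + n_perm)
    return observed, p_value

# Synthetic example: z is used by the black box but absent from the audit data.
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=(n, 2))   # features present in the audit data
z = rng.normal(size=n)        # hypothetical hidden feature
signal = 0.8 * x[:, 0] + 0.7 * z
outcomes = signal + rng.normal(scale=0.5, size=n)
black_box_scores = signal + rng.normal(scale=0.1, size=n)

corr_missing, p_missing = missing_feature_check(x, black_box_scores, outcomes)
corr_full, p_full = missing_feature_check(
    np.column_stack([x, z]), black_box_scores, outcomes
)
print(corr_missing, p_missing, corr_full, p_full)
```

With `z` withheld, the residual correlation is large and the permutation p-value is small; once `z` is added back to the audit features, the correlation drops to near zero. A real audit would need a model class flexible enough that residual structure is not simply an artifact of underfitting.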

This review was generated by AI and reviewed by a human editor.