QUICK REVIEW

[Paper Review] Fighting Fire with Fire: Using Antidote Data to Improve Polarization and Fairness of Recommender Systems

Bashir Rastegarpanah, Krishna P. Gummadi|arXiv (Cornell University)|Dec 2, 2018

Recommender Systems and Techniques | References: 34 | Cited by: 26

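One-Sentence Summary

This paper proposes a novel data augmentation method, 'antidote data', that reduces polarization and improves fairness in matrix-factorization-based recommender systems without modifying the algorithm or the existing data. By strategically adding synthetic user ratings that counteract biased or polarized recommendations, the method improves fairness and polarization measures by up to 50% with only 1% additional data, while maintaining high system accuracy.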

ABSTRACT

The increasing role of recommender systems in many aspects of society makes it essential to consider how such systems may impact social good. Various modifications to recommendation algorithms have been proposed to improve their performance for specific socially relevant measures. However, previous proposals are often not easily adapted to different measures, and they generally require the ability to modify either existing system inputs, the system's algorithm, or the system's outputs. As an alternative, in this paper we introduce the idea of improving the social desirability of recommender system outputs by adding more data to the input, an approach we view as providing 'antidote' data to the system. We formalize the antidote data problem, and develop optimization-based solutions. We take as our model system the matrix factorization approach to recommendation, and we propose a set of measures to capture the polarization or fairness of recommendations. We then show how to generate antidote data for each measure, pointing out a number of computational efficiencies, and discuss the impact on overall system accuracy. Our experiments show that a modest budget for antidote data can lead to significant improvements in the polarization or fairness of recommendations.

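Research Motivation and Objectives

Address growing concerns about the societal harms of recommender systems, including polarization and unfair treatment of users.
Develop a method that improves fairness and reduces polarization without modifying the existing system's algorithm or inputs.
Formalize a framework for generating synthetic 'antidote data' that counteracts biased recommendation patterns.
Evaluate the trade-off between improvements in social desirability and overall system accuracy.
Demonstrate that a small amount of carefully chosen synthetic data can substantially improve fairness and reduce polarization.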

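Proposed Method

The method introduces 'antidote data': synthetic user ratings added to the input training data to counteract polarization and unfairness in matrix-factorization-based recommender systems.
Fairness and polarization are formalized as differentiable objective functions, and gradients are derived to guide the generation of optimal antidote ratings.
For individual fairness, the objective uses differences in user-specific losses; for group fairness, differences in group-level losses; for polarization, the variance of ratings across users.
Optimization techniques compute the gradient of the objective with respect to the antidote ratings, which enables targeted data injection.
Two heuristic algorithms are proposed, one based on the sign of the gradient and the other on the overall gradient direction; both set antidote ratings to the minimum or maximum rating value.
The method is computationally efficient and scalable: it relies on matrix operations and avoids retraining from scratch.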
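The procedure described above can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes a dense toy rating matrix, approximates retraining with a single closed-form ridge refit (antidote user factors first, then item factors), and estimates the gradient by finite differences instead of an analytical derivation. All function names (`fit_als`, `refit_with_antidote`, `generate_antidote`, `snap_to_extremes`) are hypothetical, and the sign rule in `snap_to_extremes` is only one possible reading of the gradient-sign heuristic.

```python
import numpy as np

def fit_als(x, k, lam, iters=30, seed=0):
    """Alternating ridge updates for X ~ U V^T on a dense toy matrix."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=(x.shape[1], k))
    reg = lam * np.eye(k)
    for _ in range(iters):
        u = x @ v @ np.linalg.inv(v.T @ v + reg)
        v = x.T @ u @ np.linalg.inv(u.T @ u + reg)
    return u, v

def polarization(x_hat):
    """Mean per-item variance of predicted ratings across users."""
    return float(np.mean(np.var(x_hat, axis=0)))

def refit_with_antidote(x, u, v, x_anti, lam):
    """Assumed approximation of retraining: one closed-form refit."""
    reg = lam * np.eye(v.shape[1])
    # Ridge solution for antidote user factors with item factors fixed.
    u_anti = x_anti @ v @ np.linalg.inv(v.T @ v + reg)
    u_all = np.vstack([u, u_anti])
    x_all = np.vstack([x, x_anti])
    # Ridge solution for item factors with all user factors fixed.
    v_new = x_all.T @ u_all @ np.linalg.inv(u_all.T @ u_all + reg)
    return u @ v_new.T  # predictions for the original users only

def numeric_grad(f, x_anti, eps=1e-5):
    """Central finite-difference gradient of f with respect to x_anti."""
    grad = np.zeros_like(x_anti)
    for idx in np.ndindex(*x_anti.shape):
        step = np.zeros_like(x_anti)
        step[idx] = eps
        grad[idx] = (f(x_anti + step) - f(x_anti - step)) / (2 * eps)
    return grad

def generate_antidote(x, u, v, n_anti, lam=0.1, lo=1.0, hi=5.0,
                      steps=20, lr=1.0, seed=0):
    """Projected gradient descent that only accepts improving steps."""
    rng = np.random.default_rng(seed)
    x_anti = rng.uniform(lo, hi, size=(n_anti, x.shape[1]))
    objective = lambda a: polarization(refit_with_antidote(x, u, v, a, lam))
    start = best = objective(x_anti)
    for _ in range(steps):
        grad = numeric_grad(objective, x_anti)
        candidate = np.clip(x_anti - lr * grad, lo, hi)  # project onto rating range
        value = objective(candidate)
        if value < best:
            x_anti, best = candidate, value
        else:
            lr *= 0.5  # shrink the step when no improvement is found
    return x_anti, start, best

def snap_to_extremes(grad, lo, hi):
    """Sign heuristic (assumed form): a positive gradient maps to the minimum
    rating, anything else to the maximum rating."""
    return np.where(grad > 0, lo, hi)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ratings = rng.uniform(1.0, 5.0, size=(12, 6))  # toy data, not from the paper
    u, v = fit_als(ratings, k=2, lam=0.1)
    x_anti, start, best = generate_antidote(ratings, u, v, n_anti=2)
    print("polarization with initial antidote users:", start)
    print("polarization after optimization:", best)
```

Because a step is accepted only when it lowers the objective, the returned value never exceeds the starting value. On real, sparse data the refit would have to be restricted to observed entries, which this sketch does not attempt.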

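Experimental Results

Research Questions

RQ1: Can adding synthetic data to the input of a trained recommender system reduce polarization and improve fairness without modifying the algorithm?
RQ2: How can antidote data be generated so that it optimally reduces polarization and unfairness in matrix factorization models?
RQ3: What trade-off exists between improving social measures and maintaining overall recommendation accuracy?
RQ4: How much synthetic data is needed to achieve a significant improvement in fairness and polarization?
RQ5: Can the framework be applied to multiple fairness and polarization measures at once?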

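Key Findings

Adding just 1% synthetic users as antidote data reduced the polarization measure by 50% on average across the test datasets.
The method achieved significant improvements in both individual and group fairness, with a clear reduction in unfair treatment of sensitive user groups.
The antidote data approach maintained high system accuracy; accuracy dropped only slightly even when a large number of synthetic ratings were injected.
The gradient-based heuristics identified effective antidote ratings and outperformed baseline strategies that inject random or uniform data.
Closed-form solutions and matrix operations made the approach computationally efficient, so antidote data could be generated quickly without full retraining.
The framework is general: it applies to any differentiable fairness or polarization measure and can therefore be adapted to a wide range of social-impact objectives.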
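To make the loss-based fairness measures concrete, the sketch below writes them as plain functions of the predictions. It takes "differences in losses" to mean the variance of per-user or per-group mean squared errors over observed ratings; that choice, the `mask` convention (1 for an observed rating, 0 otherwise), and the function names are assumptions for illustration, not definitions quoted from the paper.

```python
import numpy as np

def user_losses(x, x_hat, mask):
    """Mean squared error per user over that user's observed ratings."""
    sq_err = mask * (x - x_hat) ** 2
    counts = np.maximum(mask.sum(axis=1), 1)  # guard against users with no ratings
    return sq_err.sum(axis=1) / counts

def individual_unfairness(x, x_hat, mask):
    """Variance of per-user losses: zero when all users are served equally well."""
    return float(np.var(user_losses(x, x_hat, mask)))

def group_unfairness(x, x_hat, mask, groups):
    """Variance of per-group losses, each pooled over the group's observed ratings."""
    sq_err = mask * (x - x_hat) ** 2
    losses = []
    for g in np.unique(groups):
        members = groups == g
        losses.append(sq_err[members].sum() / max(mask[members].sum(), 1))
    return float(np.var(losses))

if __name__ == "__main__":
    x = np.array([[5.0, 1.0], [4.0, 2.0], [1.0, 5.0], [2.0, 4.0]])
    x_hat = x.copy()
    x_hat[2:] += 1.0  # predictions are off by one for the last two users
    mask = np.ones_like(x)
    print(individual_unfairness(x, x_hat, mask))
    print(group_unfairness(x, x_hat, mask, np.array([0, 0, 1, 1])))
```

Any measure of this kind that is differentiable in the predictions could, in principle, serve as the objective when generating antidote ratings.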

This review was generated by AI and checked by a human editor.