QUICK REVIEW

[Paper Review] Differentially Private Contextual Linear Bandits

Roshan Shariff, Or Sheffet|arXiv (Cornell University)|Jan 1, 2018

Privacy-Preserving Technologies in Data | Cited by 30

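One-Sentence Summary

This paper presents a joint differential privacy framework for contextual linear bandits, in which the action chosen on day t is revealed only to user t and therefore has to be kept private only on the following days, not on day t itself. By combining the linear-UCB algorithm with the tree-based mechanism and adding either Gaussian or Wishart noise, the authors obtain bounded regret under the privacy constraint and give the first lower bound on the additional regret that any private algorithm for the multi-armed bandit (MAB) problem must incur.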

ABSTRACT

We study the contextual linear bandit problem, a version of the standard stochastic multi-armed bandit (MAB) problem where a learner sequentially selects actions to maximize a reward which depends also on a user provided per-round context. Though the context is chosen arbitrarily or adversarially, the reward is assumed to be a stochastic function of a feature vector that encodes the context and selected action. Our goal is to devise private learners for the contextual linear bandit problem. We first show that using the standard definition of differential privacy results in linear regret. So instead, we adopt the notion of joint differential privacy, where we assume that the action chosen on day t is only revealed to user t and thus needn't be kept private that day, only on following days. We give a general scheme converting the classic linear-UCB algorithm into a joint differentially private algorithm using the tree-based algorithm. We then apply either Gaussian noise or Wishart noise to achieve joint-differentially private algorithms and bound the resulting algorithms' regrets. In addition, we give the first lower bound on the additional regret any private algorithms for the MAB problem must incur.

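Motivation and Objectives

Address the challenge of preserving user privacy in the contextual linear bandit setting, where contexts may be adversarial and rewards are stochastic.
Show that the standard definition of differential privacy leads to linear regret in contextual linear bandits, which makes it unusable in practice.
Adopt joint differential privacy, a notion under which the action chosen on day t need not be kept private on day t, only on the following days.
Give a general scheme that converts the classic linear-UCB algorithm into a joint differentially private algorithm using the tree-based mechanism.
Establish the first lower bound on the additional regret that any private algorithm must incur in the MAB setting.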
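
The joint differential privacy requirement described above can be written roughly as follows. This is a sketch in notation assumed here, not taken from this summary: S and S' are two user sequences that differ only in user t, A_{>t}(S) denotes the actions the learner picks on days t+1 through T, and \mathcal{S}_{>t} is any set of such action sequences.

```latex
\Pr\bigl[A_{>t}(S) \in \mathcal{S}_{>t}\bigr]
  \;\le\; e^{\varepsilon}\,\Pr\bigl[A_{>t}(S') \in \mathcal{S}_{>t}\bigr] + \delta
  \qquad \text{for all } t,\ \text{all such } S, S',\ \text{and all } \mathcal{S}_{>t}.
```

The action on day t itself does not appear in the event, which is what separates this notion from standard differential privacy.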

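Proposed Method

Adopt joint differential privacy: the action chosen on day t must be kept private from day t+1 onward, but not on day t itself.
Use the tree-based mechanism to release private running sums of the statistics that linear-UCB relies on, so that privacy holds across time steps.
Inject Gaussian or Wishart noise within the tree-based mechanism to obtain joint differential privacy with controlled privacy loss.
Modify linear-UCB so that the reward-model parameters are estimated privately from the output of the noisy tree-based mechanism.
Bound the regret of the resulting private algorithms by analyzing the trade-off between the privacy budget and the estimation error.
Derive a lower bound on the additional regret that any private algorithm must incur, showing a fundamental limit on the privacy-utility trade-off in the MAB problem.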
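
The tree-based mechanism mentioned above can be illustrated with a minimal sketch of the generic binary-tree counter for noisy prefix sums. The class name `TreeMechanism`, its parameters, and the use of plain Gaussian noise on vectors are assumptions made for illustration; `sigma` is left as a free parameter because this summary does not give the calibration to a privacy budget.

```python
import numpy as np

class TreeMechanism:
    """Hypothetical sketch: binary-tree mechanism for noisy prefix sums."""

    def __init__(self, horizon, dim, sigma, seed=0):
        # One tree level per bit needed to write the horizon in binary.
        self.levels = int(horizon).bit_length()
        self.sigma = sigma  # noise scale per tree node (assumed given)
        self.rng = np.random.default_rng(seed)
        self.t = 0
        self.exact = np.zeros((self.levels, dim))  # exact partial sums
        self.noisy = np.zeros((self.levels, dim))  # their noisy releases

    def update(self, x):
        """Add item x and return a noisy estimate of the sum of all items so far."""
        self.t += 1
        # Index of the lowest set bit of t: the tree level that closes now.
        i = (self.t & -self.t).bit_length() - 1
        # That node absorbs every lower-level partial sum plus the new item.
        self.exact[i] = self.exact[:i].sum(axis=0) + x
        self.exact[:i] = 0.0
        self.noisy[:i] = 0.0
        # Fresh noise is drawn once per node, never once per time step.
        self.noisy[i] = self.exact[i] + self.rng.normal(0.0, self.sigma, size=x.shape)
        # The prefix sum is assembled from the nodes matching the set bits of t.
        bits = [j for j in range(self.levels) if (self.t >> j) & 1]
        return self.noisy[bits].sum(axis=0)
```

Each prefix sum is built from at most one noisy node per tree level, so the noise in any released sum grows with the logarithm of the horizon instead of with the horizon itself, and each item influences only logarithmically many nodes.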

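Experimental Results

Research Questions

RQ1: Why does standard differential privacy lead to linear regret in contextual linear bandits?
RQ2: Can a privacy notion that relaxes the same-day requirement still give strong privacy guarantees while achieving sublinear regret?
RQ3: How can the linear-UCB algorithm be adapted to use the tree-based mechanism while preserving joint differential privacy?
RQ4: How do Gaussian noise and Wishart noise each affect the regret of private contextual bandits?
RQ5: What is the fundamental lower bound on the additional regret that any private algorithm must incur in the MAB setting?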

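Key Findings

Standard differential privacy leads to linear regret in contextual linear bandits, because the action shown to user t would have to be nearly insensitive to that user's own context, so no useful learning is possible.
Joint differential privacy, by relaxing the privacy requirement on the same-day action, allows sublinear regret.
The tree-based mechanism combined with Gaussian or Wishart noise achieves joint differential privacy with bounded regret.
The regret of the private algorithms grows as the privacy budget shrinks (stricter privacy) and as the dimension of the feature space increases; explicit bounds are given.
The paper gives the first lower bound on the additional regret due to privacy, showing that some increase in regret is unavoidable in private bandit learning.
The theoretical analysis supports a favorable trade-off between privacy and regret for the proposed methods.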
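
To make the findings on privately modified linear-UCB concrete, here is a minimal sketch of one action-selection step computed from perturbed statistics. The function names, the symmetric Gaussian perturbation, and the parameters `reg` and `beta` are assumptions for illustration, not the paper's exact algorithm or constants; in particular, `reg` is assumed large enough to keep the noisy matrix positive definite.

```python
import numpy as np

def symmetric_gaussian(d, sigma, rng):
    """Hypothetical helper: symmetric d x d matrix with Gaussian entries."""
    z = rng.normal(0.0, sigma, size=(d, d))
    return (z + z.T) / np.sqrt(2.0)

def private_linucb_choose(arms, gram, vec, noise_gram, noise_vec, reg, beta):
    """Pick the arm with the highest upper confidence bound.

    arms:       (K, d) feature vectors available this round
    gram:       (d, d) sum of x x^T over past rounds
    vec:        (d,)   sum of reward * x over past rounds
    noise_gram: (d, d) privacy noise on gram (assumed from a tree mechanism)
    noise_vec:  (d,)   privacy noise on vec
    reg:        shift assumed large enough to keep the matrix positive definite
    beta:       confidence width (assumed given)
    """
    d = gram.shape[0]
    V = gram + noise_gram + reg * np.eye(d)
    # Ridge-style estimate of the reward parameter from the noisy statistics.
    theta = np.linalg.solve(V, vec + noise_vec)
    V_inv = np.linalg.inv(V)
    # Exploration bonus: the norm of each arm under V^{-1}.
    widths = np.sqrt(np.einsum("kd,de,ke->k", arms, V_inv, arms))
    ucb = arms @ theta + beta * widths
    return int(np.argmax(ucb)), theta
```

The learner only ever touches the noisy statistics, so the privacy of past users rests on how that noise is generated; larger noise calls for a larger `reg` and a wider `beta`, which is where the extra regret comes from.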

This review was generated by AI and reviewed by a human editor.