QUICK REVIEW

[Paper Review] Bounding the Estimation Error of Sampling-based Shapley Value Approximation

Sasan Maleki, Long Tran-Thanh | arXiv (Cornell University) | Jun 18, 2013

Game Theory and Voting Systems | References: 19 | Cited by: 64

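One-Sentence Summary

This paper derives non-asymptotic bounds on the estimation error of sampling-based Shapley value approximation, using Chebyshev's inequality when the variance of the marginal contributions is known and Hoeffding's inequality when their range is known. When the range $r$ is large relative to the Shapley value, the bound is tightened further, and a stratified sampling scheme is introduced that reduces the error, achieving $O(\sqrt{r/m})$ scaling in the number of samples $m$ under favorable conditions.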

ABSTRACT

The Shapley value is arguably the most central normative solution concept in cooperative game theory. It specifies a unique way in which the reward from cooperation can be "fairly" divided among players. While it has a wide range of real-world applications, its use is in many cases hampered by the hardness of its computation. A number of researchers have tackled this problem by (i) focusing on classes of games where the Shapley value can be computed efficiently, or (ii) proposing representation formalisms that facilitate such efficient computation, or (iii) approximating the Shapley value in certain classes of games. For the classical characteristic function representation, the only attempt to approximate the Shapley value for the general class of games is due to Castro et al. \cite{castro}. While this algorithm provides a bound on the approximation error, this bound is asymptotic, meaning that it only holds when the number of samples increases to infinity. On the other hand, when a finite number of samples is drawn, an unquantifiable error is introduced, meaning that the bound no longer holds. With this in mind, we provide non-asymptotic bounds on the estimation error for two cases: where (i) the variance, and (ii) the range, of the players' marginal contributions is known. Furthermore, for the second case, we show that when the range is significantly large relative to the Shapley value, the bound can be improved (from $O(\frac{r}{m})$ to $O(\sqrt{\frac{r}{m}})$). Finally, we propose, and demonstrate the effectiveness of using stratified sampling for improving the bounds further.

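Research Motivation and Objectives

Address the lack of finite-sample error bounds in existing sampling-based Shapley value approximation algorithms.
Provide non-asymptotic error bounds for Shapley value estimation when the variance or the range of the marginal contributions is known.
Tighten the error bound when the range of the marginal contributions is significantly larger than the Shapley value.
Propose and evaluate stratified sampling as a way to tighten the estimation error bound further.
Demonstrate the effectiveness of stratified sampling through theoretical analysis and a comparison with simple random sampling.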

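Proposed Method

When the variance of the marginal contributions is known, Chebyshev's inequality is used to bound the estimation error.
When the range of the marginal contributions (maximum minus minimum) is known, Hoeffding's inequality is applied to bound the error.
When the range $r$ is significantly larger than the Shapley value, the error bound is shown to improve to $O(\sqrt{r/m})$.
Stratified sampling is introduced by partitioning coalitions into strata according to coalition size and allocating samples optimally across the strata.
An optimization problem is formulated whose solution allocates samples as $m_k^* \propto (k+1)^{2/3}$ in order to minimize the total estimation error.
A practical algorithm (Algorithm 2) assigns samples to the strata by rounding down and distributing the remainder, which guarantees $m_k \geq m_k^*/2$.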
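The first two items can be illustrated with a minimal Python sketch of simple random sampling, paired with the textbook forms of the two inequalities: Hoeffding gives a half-width of $r\sqrt{\ln(2/\delta)/(2m)}$ and Chebyshev gives $\sigma/\sqrt{m\delta}$ at confidence level $1-\delta$. The function names, the toy game `toy_value`, and the parameter choices are assumptions made for illustration; the constants are the standard ones and may differ from the exact statements in the paper.

```python
import math
import random
from typing import Callable, FrozenSet, Sequence


def hoeffding_error(r: float, m: int, delta: float) -> float:
    """Half-width eps with P(|estimate - phi| >= eps) <= delta, for m
    i.i.d. samples whose values lie in an interval of width r."""
    return r * math.sqrt(math.log(2.0 / delta) / (2.0 * m))


def chebyshev_error(variance: float, m: int, delta: float) -> float:
    """Same guarantee, using only the variance of a single sample."""
    return math.sqrt(variance / (m * delta))


def sample_shapley(
    player: int,
    players: Sequence[int],
    value: Callable[[FrozenSet[int]], float],
    m: int,
    rng: random.Random,
) -> float:
    """Average of m sampled marginal contributions of `player`."""
    others = [p for p in players if p != player]
    total = 0.0
    for _ in range(m):
        # Drawing a uniform position k and a uniform k-subset of the other
        # players matches the predecessors in a uniform random permutation.
        rng.shuffle(others)
        k = rng.randrange(len(others) + 1)
        coalition = frozenset(others[:k])
        total += value(coalition | {player}) - value(coalition)
    return total / m


def toy_value(coalition: FrozenSet[int]) -> float:
    """Hypothetical symmetric game v(S) = |S|^2 (not from the paper)."""
    return float(len(coalition) ** 2)


if __name__ == "__main__":
    players = list(range(5))
    m, delta = 2000, 0.05
    estimate = sample_shapley(0, players, toy_value, m, random.Random(0))
    # In this toy game the marginal contributions are 1, 3, 5, 7, 9,
    # so the range is 8 and the variance is also 8.
    print(f"estimate = {estimate:.3f} (exact Shapley value is 5)")
    print(f"Hoeffding half-width: {hoeffding_error(8.0, m, delta):.3f}")
    print(f"Chebyshev half-width: {chebyshev_error(8.0, m, delta):.3f}")
```

Both half-widths shrink as $1/\sqrt{m}$; the Hoeffding one depends on $\delta$ only through a logarithm, while the Chebyshev one grows as $1/\sqrt{\delta}$.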

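Experimental Results

Research Questions

RQ1: When only the variance or the range of the marginal contributions is known, can non-asymptotic error bounds be established for sampling-based Shapley value approximation?
RQ2: How does the estimation error scale when the range of the marginal contributions is significantly larger than the Shapley value?
RQ3: Can stratified sampling reduce the estimation error of the Shapley value compared with simple random sampling?
RQ4: In stratified sampling, what allocation of samples across strata minimizes the total estimation error?
RQ5: In terms of sample efficiency, how does the theoretical error bound of stratified sampling compare with that of simple random sampling?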

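Key Findings

Using Chebyshev's and Hoeffding's inequalities, the paper establishes non-asymptotic error bounds that hold for any finite number of samples, unlike the earlier asymptotic bound.
When the range $r$ of the marginal contributions is significantly larger than the Shapley value, the error bound improves from $O(r/m)$ to $O(\sqrt{r/m})$.
Under favorable conditions, stratified sampling reduces the total estimation error to $O(\sqrt{r/m})$, with the theoretical bound $|\hat{\phi} - \phi| \leq \frac{d\sqrt{-\ln(\delta/2)}}{\sqrt{m}} \cdot \frac{n+1}{2}$.
When $m > \frac{(n+1)^2}{4}$, the error bound of the proposed stratified sampling algorithm is better than that of simple random sampling, whose error is at least $d\sqrt{n(-\ln(\delta/2))}$.
The optimal allocation of samples across strata is proportional to $(k+1)^{2/3}$, and the algorithm guarantees $m_k \geq m_k^*/2$, which preserves the theoretical guarantee.
The theoretical analysis indicates that stratified sampling improves sample efficiency, particularly in large games with many players.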
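A rough Python sketch of the stratified scheme follows, assuming strata indexed by coalition size $k = 0, \dots, n-1$ and ideal shares proportional to $(k+1)^{2/3}$. The remainder rule used here (largest fractional part first) is an assumption for illustration and is not claimed to reproduce the paper's Algorithm 2; the function names and the toy game are likewise hypothetical.

```python
import math
import random
from typing import Callable, FrozenSet, List, Sequence


def allocate_samples(n: int, m: int) -> List[int]:
    """Split m samples over n strata with shares proportional to (k+1)^(2/3)."""
    weights = [(k + 1) ** (2.0 / 3.0) for k in range(n)]
    total = sum(weights)
    ideal = [m * w / total for w in weights]
    alloc = [math.floor(x) for x in ideal]
    leftover = m - sum(alloc)
    # Assumed remainder rule: one extra sample to the strata with the
    # largest fractional parts until the budget m is used up.
    order = sorted(range(n), key=lambda k: ideal[k] - alloc[k], reverse=True)
    for k in order[:leftover]:
        alloc[k] += 1
    return alloc


def stratified_shapley(
    player: int,
    players: Sequence[int],
    value: Callable[[FrozenSet[int]], float],
    m: int,
    rng: random.Random,
) -> float:
    """Estimate the Shapley value as the mean of per-size stratum means."""
    others = [p for p in players if p != player]
    n = len(players)
    estimate = 0.0
    for k, m_k in enumerate(allocate_samples(n, m)):
        if m_k == 0:
            raise ValueError("budget too small: a stratum received no samples")
        stratum_sum = 0.0
        for _ in range(m_k):
            # Uniform coalition of size k drawn from the other players.
            coalition = frozenset(rng.sample(others, k))
            stratum_sum += value(coalition | {player}) - value(coalition)
        estimate += stratum_sum / m_k
    # Every coalition size is equally likely in a random permutation.
    return estimate / n


def toy_value(coalition: FrozenSet[int]) -> float:
    """Hypothetical symmetric game v(S) = |S|^2 (not from the paper)."""
    return float(len(coalition) ** 2)


if __name__ == "__main__":
    players = list(range(5))
    print("allocation:", allocate_samples(len(players), 100))
    print("estimate:", stratified_shapley(0, players, toy_value, 100, random.Random(0)))
```

Rounding down keeps each stratum at or above half of its ideal share whenever that share is at least 1, so the sketch only makes sense when $m$ is large enough for every stratum to receive a sample. In the toy game the marginal contribution is constant within each stratum, which is the extreme case where stratification removes the sampling error entirely.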


This review was generated by AI and reviewed by a human editor.