QUICK REVIEW

[Paper Review] Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems

David R. Karger, Sewoong Oh|arXiv (Cornell University)|Oct 17, 2011

Mobile Crowdsensing and Crowdsourcing | References: 26 | Cited by: 24

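One-Sentence Summary

This paper proposes a non-adaptive, budget-optimal task allocation algorithm that infers correct answers from worker responses using belief propagation and low-rank matrix approximation, enabling reliable crowdsourcing. Even when workers are fleeting and unreliable, the method achieves near-optimal performance at a cost only a constant factor higher, i.e., it is order-optimal. The analysis also shows that adaptive task allocation offers no asymptotic advantage in cost scaling.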

ABSTRACT

Crowdsourcing systems, in which numerous tasks are electronically distributed to numerous "information piece-workers", have emerged as an effective paradigm for human-powered solving of large scale problems in domains such as image classification, data entry, optical character recognition, recommendation, and proofreading. Because these low-paid workers can be unreliable, nearly all such systems must devise schemes to increase confidence in their answers, typically by assigning each task multiple times and combining the answers in an appropriate manner, e.g. majority voting. In this paper, we consider a general model of such crowdsourcing tasks and pose the problem of minimizing the total price (i.e., number of task assignments) that must be paid to achieve a target overall reliability. We give a new algorithm for deciding which tasks to assign to which workers and for inferring correct answers from the workers' answers. We show that our algorithm, inspired by belief propagation and low-rank matrix approximation, significantly outperforms majority voting and, in fact, is optimal through comparison to an oracle that knows the reliability of every worker. Further, we compare our approach with a more general class of algorithms which can dynamically assign tasks. By adaptively deciding which questions to ask to the next arriving worker, one might hope to reduce uncertainty more efficiently. We show that, perhaps surprisingly, the minimum price necessary to achieve a target reliability scales in the same manner under both adaptive and non-adaptive scenarios. Hence, our non-adaptive approach is order-optimal under both scenarios. This strongly relies on the fact that workers are fleeting and can not be exploited. Therefore, architecturally, our results suggest that building a reliable worker-reputation system is essential to fully harnessing the potential of adaptive designs.

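Research Motivation and Objectives

Address the challenge of minimizing total cost (the number of task assignments) in a crowdsourcing system while still reaching a target reliability level.
Design a task allocation and inference scheme that remains effective when workers are unreliable and fleeting, and cannot be identified or reused.
Compare non-adaptive and adaptive task allocation strategies to determine whether dynamic allocation has an advantage in asymptotic cost.
Design an inference algorithm that weights worker responses by estimated reliability, improving accuracy beyond simple majority voting.
Establish optimality through theoretical proof: even compared with an oracle that knows every worker's reliability, the proposed method is worse by only a constant factor.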

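Proposed Method

Formalize the crowdsourcing problem with a probabilistic model in which each worker has a task-independent reliability parameter and makes errors at random.
Model the worker-task response matrix with a low-rank matrix approximation to estimate the latent true labels and the worker reliabilities.
Apply belief propagation to iteratively update beliefs about task labels and worker reliabilities, based on cross-agreement among responses.
Design a non-adaptive task allocation strategy that assigns tasks to workers in a single batch, without seeing any responses in advance, to optimize the trade-off between cost and reliability.
Use concentration inequalities and Jensen's inequality to derive a lower bound on the number of workers needed to reach a target error rate.
Prove that, under the assumption that workers are fleeting and cannot be reused, the cost scaling of the proposed algorithm matches the theoretical minimum, even when compared with adaptive strategies.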
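
The steps above can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the authors' implementation: the "spammer-hammer" crowd (a worker is either perfectly reliable or answers at random), the parameter values, and all function names are hypothetical. The inference routine uses a belief-propagation-style message-passing update on the task-worker graph, with a rescaling step added here only for numerical stability.

```python
import numpy as np


def simulate_responses(m, l, r, hammer_frac, rng):
    """Draw a random task-worker assignment up front (non-adaptive) and noisy answers.

    Assumed toy crowd: a worker is perfectly reliable (p = 1) with
    probability hammer_frac, and otherwise guesses at random (p = 0.5).
    """
    assert (m * l) % r == 0, "m * l must be divisible by r"
    n = m * l // r                                   # number of workers
    truth = rng.choice([-1, 1], size=m)              # hidden task labels
    p = np.where(rng.random(n) < hammer_frac, 1.0, 0.5)
    # Each task appears l times, each worker r times; pair them at random.
    tasks = rng.permutation(np.repeat(np.arange(m), l))
    workers = np.repeat(np.arange(n), r)
    correct = rng.random(tasks.size) < p[workers]
    answers = np.where(correct, truth[tasks], -truth[tasks])
    return truth, tasks, workers, answers


def majority_vote(m, tasks, answers):
    """Baseline: sign of the unweighted sum of answers per task."""
    return np.sign(np.bincount(tasks, weights=answers, minlength=m))


def iterative_inference(m, n, tasks, workers, answers, n_iter=10, rng=None):
    """Message passing on the assignment graph (one message per edge).

    x: task -> worker messages (how likely the task label is +1)
    y: worker -> task messages (how reliable the worker appears)
    Each message excludes the contribution of the edge it is sent along.
    """
    rng = np.random.default_rng() if rng is None else rng
    y = rng.normal(1.0, 1.0, size=tasks.size)
    for _ in range(n_iter):
        ay = answers * y
        x = np.bincount(tasks, weights=ay, minlength=m)[tasks] - ay
        ax = answers * x
        y = np.bincount(workers, weights=ax, minlength=n)[workers] - ax
        norm = np.linalg.norm(y)
        if norm > 0:                                 # rescale; signs are unchanged
            y = y / norm
    # Final estimate: reliability-weighted vote using all incoming messages.
    return np.sign(np.bincount(tasks, weights=answers * y, minlength=m))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, l, r = 2000, 15, 15                           # odd l avoids majority ties
    truth, tasks, workers, answers = simulate_responses(m, l, r, 0.3, rng)
    n = m * l // r
    mv = majority_vote(m, tasks, answers)
    it = iterative_inference(m, n, tasks, workers, answers, rng=rng)
    print("majority voting error:", np.mean(mv != truth))
    print("iterative error:      ", np.mean(it != truth))
```

The assignment is drawn before any answer is observed, which is what makes the scheme non-adaptive. The final line of `iterative_inference` is a weighted majority vote in which the weights play the role of estimated worker reliabilities.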

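Experimental Results

Research Questions

RQ1: In crowdsourcing systems with unreliable and fleeting workers, can a non-adaptive task allocation strategy come within a constant factor of the cost of the optimal adaptive strategy?
RQ2: How much do belief propagation and low-rank matrix approximation improve inference accuracy over majority voting in the presence of worker noise?
RQ3: Is there a fundamental limit on cost scaling that adaptive task allocation cannot beat, even when worker reliabilities are known?
RQ4: How does the proposed algorithm perform compared with an oracle that knows the true reliability of every worker?
RQ5: What role does worker reliability estimation play in minimizing the total number of task assignments needed to reach a target error rate?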

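Key Findings

The proposed algorithm is significantly more cost-efficient than majority voting, achieving higher reliability with fewer task assignments.
The algorithm is order-optimal: even compared with an oracle that knows every worker's reliability, it needs only a constant factor more assignments.
Surprisingly, adaptive task allocation does not improve the asymptotic cost scaling; the minimum cost scales in the same way in the adaptive and non-adaptive settings.
The fundamental limit on cost scaling stems from workers being fleeting and non-reusable, which makes a worker-reputation system essential for adaptive designs to be effective.
A phase transition is observed at $\hat{\ell}\hat{r}q^2 = 1$; below this threshold, no algorithm can outperform majority voting, which points to a fundamental information-theoretic barrier.
Theoretical lower bounds derived via Jensen's inequality and the Chernoff bound confirm that the number of workers required scales logarithmically in the inverse of the target error rate, with a constant factor that depends on worker quality $q$.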
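
For orientation, the scaling behind these findings can be written out as follows. The notation is an assumption based on the standard formulation of this model and is not spelled out in the summary: $p_j$ is the probability that worker $j$ answers correctly, $\ell$ is the number of workers per task, $r$ is the number of tasks per worker, $\hat{\ell} = \ell - 1$, $\hat{r} = r - 1$, and $\varepsilon$ is the target error rate. Constants are omitted, and the majority-voting rate should be read as a worst case over worker populations.

$$
q = \mathbb{E}\left[(2p_j - 1)^2\right], \qquad
\ell_{\text{proposed}} = \Theta\!\left(\frac{1}{q}\log\frac{1}{\varepsilon}\right), \qquad
\ell_{\text{majority}} = \Theta\!\left(\frac{1}{q^2}\log\frac{1}{\varepsilon}\right)
$$

As a rough illustration with $q = 0.3$ and $\varepsilon = 0.01$: $\frac{1}{q}\ln\frac{1}{\varepsilon} \approx 15.4$ while $\frac{1}{q^2}\ln\frac{1}{\varepsilon} \approx 51.2$, so the gap between the two rates is a factor of $1/q \approx 3.3$. Because the hidden constants differ, these numbers indicate only the ratio, not actual worker counts.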


This review was generated by AI and reviewed by human editors.