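[Paper Digest] Revenue-Optimal Pricing for Budget-Constrained Buyers in Data Markets
This paper studies revenue-maximizing pricing for budget-constrained buyers in data markets. It shows that nonlinear pricing is solvable in polynomial time whereas linear pricing is APX-hard, and it gives practical approximation algorithms for the linear case.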
We study revenue-optimal pricing in data markets with rational, budget-constrained buyers. Such a market offers multiple datasets for sale, and buyers aim to improve the accuracy of their prediction tasks by acquiring data bundles. For each dataset, the market sets a pricing function, which maps the number of records purchased from the dataset to a non-negative price. The market's objective is to set these pricing functions to maximize total revenue, considering that buyers with quasi-linear utilities choose their bundles optimally under budget constraints. We analyze optimal pricing when each dataset's pricing function is only required to be monotone and lower-continuous. Surprisingly, even with this generality, optimal pricing has a highly structured form: it is piecewise linear and convex (PLC) and can be computed efficiently via an LP. Moreover, the total number of kinks across all pricing functions is bounded by the number of buyers. Thus, when datasets far outnumber buyers, most pricing functions are effectively linear. This motivates studying linear pricing, where each record in a dataset is priced uniformly. Although competitive equilibrium gives revenue-optimal linear prices in rivalrous markets with quasi-linear buyers, we show that revenue maximization under linear pricing in data markets is APX-hard. Hence, a striking computational dichotomy emerges: fully general (nonlinear) pricing admits a polynomial-time algorithm, while the simpler linear scheme is APX-hard. Despite the hardness, we design a 2-approximation algorithm when datasets arrive online, and a $(1-1/e)^{-1}$-approximation algorithm for the offline setting. Our framework lays the groundwork for exploring more general pricing schemes, richer utility models, and a deeper understanding of how market structure -- rivalrous versus non-rivalrous -- shapes revenue-optimal pricing.
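Motivation and goals
- Model a centralized data market with m datasets and n budget-constrained buyers.
- Characterize the optimal pricing functions when pricing is only required to be monotone and lower-continuous.
- Show that the optimal nonlinear pricing is piecewise linear and convex (PLC) and can be computed via an LP.
- Establish a computational dichotomy: nonlinear pricing is solvable in polynomial time, while linear pricing is APX-hard.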
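Proposed method
- Define each buyer's quasi-linear utility as the value of the prediction accuracy gained from the data minus the price paid for the datasets.
- Show that optimal monotone, lower-continuous (MLC) pricing functions can be obtained via convex hulls and PLC approximation.
- Derive an LP that computes the optimal PLC pricing, and establish structural properties (the number of kinks is bounded by n).
- Prove that, within the class of linear pricing, revenue maximization is APX-hard.
- Give approximation algorithms: a 2-approximation when datasets arrive online and a $(1-1/e)^{-1}$-approximation in the offline setting.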
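To make the PLC structure concrete, here is a minimal Python sketch of a convex piecewise-linear pricing function for a single dataset. It is an illustration only, not the paper's LP: the function name `plc_price`, the breakpoint/slope representation, and the normalization p(0) = 0 are assumptions chosen for the example. Each breakpoint is one kink, and a function with no breakpoints is exactly a linear (per-record) price.

```python
def plc_price(q, breakpoints, slopes):
    """Price of q records under a convex piecewise-linear pricing function.

    Hypothetical representation (not taken from the paper):
      breakpoints: increasing record counts where the slope changes (the kinks)
      slopes: per-record price on each segment, len(breakpoints) + 1 entries
    Assumes p(0) = 0.
    """
    if len(slopes) != len(breakpoints) + 1:
        raise ValueError("need exactly one more slope than breakpoints")
    # Convexity means the per-record price never decreases across segments.
    if any(s1 > s2 for s1, s2 in zip(slopes, slopes[1:])):
        raise ValueError("slopes must be nondecreasing for a convex function")

    price, prev = 0.0, 0.0
    for bp, slope in zip(breakpoints, slopes):
        if q <= bp:
            return price + slope * (q - prev)
        price += slope * (bp - prev)  # pay for the whole segment
        prev = bp
    return price + slopes[-1] * (q - prev)


# Two kinks (at 10 and 20 records) versus a purely linear price.
two_kinks = plc_price(15, breakpoints=[10, 20], slopes=[1, 2, 4])
linear = plc_price(7, breakpoints=[], slopes=[3])
```

Under this representation the kink count of a pricing function is simply `len(breakpoints)`, so a bound of n on the total number of kinks means that at most n datasets can have a non-empty breakpoint list.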
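Results
Research questions
- RQ1: How should a data market set prices to maximize total revenue when buyers are budget-constrained and have quasi-linear utilities?
- RQ2: What structural form does revenue-optimal pricing take under monotone, lower-continuous pricing?
- RQ3: Can nonlinear pricing be computed efficiently, and how does linear pricing compare in complexity and performance?
- RQ4: Which approximation guarantees are achievable for linear pricing in the online and offline settings?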
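Key findings
- Under general (monotone, lower-continuous) pricing, the revenue-optimal pricing functions can be computed in polynomial time via an LP.
- The optimal nonlinear pricing is piecewise linear and convex (PLC). The total number of kinks across all pricing functions is bounded by the number of buyers n, so when datasets far outnumber buyers most pricing functions are effectively linear.
- The problem exhibits a dichotomy: nonlinear pricing is solvable in polynomial time, while linear pricing is APX-hard.
- Under linear pricing, revenue maximization admits a 2-approximation when datasets arrive online and a $(1-1/e)^{-1}$-approximation in the offline setting.
- Revenue has a closed-form expression, $r(p) = \sum_i \min\bigl(b_i, \sum_{j:\, \tau_{i,j} \ge p_j} p_j\bigr)$, which sums each buyer's contribution separately and involves no competition among buyers (the market is non-rivalrous).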
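The closed-form revenue expression above can be evaluated directly. The following is a small Python sketch under stated assumptions: the digest does not define the symbols, so the code reads $b_i$ as buyer i's budget and $\tau_{i,j}$ as a threshold such that buyer i pays for dataset j only when $\tau_{i,j} \ge p_j$. The function name `revenue` and the toy numbers are hypothetical.

```python
def revenue(prices, budgets, tau):
    """Evaluate r(p) = sum_i min(b_i, sum_{j: tau[i][j] >= p_j} p_j).

    Assumed meaning of the inputs (not defined in the digest):
      prices[j]  : p_j, the price attached to dataset j
      budgets[i] : b_i, buyer i's budget
      tau[i][j]  : buyer i's threshold for dataset j
    """
    total = 0.0
    for i, b_i in enumerate(budgets):
        # Sum the prices of the datasets that clear buyer i's threshold.
        spend = sum(p_j for j, p_j in enumerate(prices) if tau[i][j] >= p_j)
        # A buyer never pays more than their budget.
        total += min(b_i, spend)
    return total


# Toy instance: 2 buyers, 2 datasets.
prices = [2, 3]
tau = [[2, 1],   # buyer 0 clears the threshold on dataset 0 only
       [5, 3]]   # buyer 1 clears it on both datasets
uncapped = revenue(prices, budgets=[4, 10], tau=tau)  # 2 + 5
capped = revenue(prices, budgets=[1, 4], tau=tau)     # min(1, 2) + min(4, 5)
```

Each buyer's term depends only on that buyer's own budget and thresholds, which is the non-rivalrous property: adding another buyer never reduces what existing buyers contribute.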
This digest was generated by AI and reviewed by a human editor.