QUICK REVIEW

[Paper Review] Signaling in Data Markets via Free Samples

Nivasini Ananthakrishnan, Alireza Fallah|arXiv (Cornell University)|Feb 18, 2026

Auction Theory and Applications | Cited by 0

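One-Sentence Summary

The paper models a buyer in a data market who uses free data samples to infer seller quality, and it designs an approximately optimal single-source procurement mechanism; depending on the parameters, free trials may fail to emerge, while under sufficiently strong competition maximal disclosure prevails.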

ABSTRACT

We study a setting in which a data buyer seeks to estimate an unknown parameter by purchasing samples from one of K data sellers. Each seller has privately known data quality (e.g., high vs. low variance) and a private per-sample cost. We consider a multi-stage game in which the first stage is a free-trial stage in which the sellers have the option of signaling data quality by offering a few samples of data for free. Buyers update their beliefs based on the sample variance of the free data and then run a procurement auction to buy data in a second stage. For the auction stage, we characterize an approximately optimal Bayesian incentive compatible mechanism: the buyer selects a single seller by minimizing a belief-adjusted virtual cost and chooses the purchased sample size as a function of posterior quality and virtual cost. For the free-trial stage, we characterize the equilibrium, taking the above mechanism as the continuation game. Free trials may fail to emerge: for some parameters, all sellers reveal zero samples. However, under sufficiently strong competition (large K), there is an equilibrium in which sellers reveal the maximum allowable number of samples; in fact, it is the unique equilibrium.

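Motivation and Objectives

Study how free-trial signaling affects data markets in which data quality and cost are private information.
Design an approximately optimal Bayesian incentive compatible mechanism for procuring data after the free trial.
Characterize the equilibrium of the free-trial stage under different levels of market competition.
Identify the conditions under which free trials do not emerge and those under which full disclosure is an equilibrium.
Use simulations to explore how parameter regimes affect signaling and market efficiency.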

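Proposed Method

Model K data sellers and a continuum of buyers; each seller has a private quality (low or high variance) and a private per-sample cost.
Introduce a two-stage game: a free-trial stage in which each seller commits to a free sample size $m_i \in \{0, \dots, M\}$, followed by a procurement auction.
Use Bayes' rule on the observed sample variance to form a belief π about each seller's quality, then optimize a Bayesian mechanism for purchasing samples under belief-adjusted costs.
Relax the buyer's problem to real-valued purchase quantities, solve it, and round the solution down to obtain a feasible, approximately optimal mechanism.
Show that the optimal mechanism is single-source (data is purchased from one seller) and derive a Myerson-style payment rule that ensures Bayesian incentive compatibility (BIC).
Analyze the free-trial equilibrium taking the continuation mechanism as given, and characterize the conditions for the uninformative equilibrium (all $m_i = 0$) and for the maximal-disclosure equilibrium ($m_i = M$), the latter arising as K grows.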
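
The belief-update step above can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the free samples are i.i.d. Gaussian, that there are exactly two quality types with known variances, and that the buyer observes only the unbiased sample variance. The names `posterior_low_variance`, `var_low`, `var_high`, and `prior_low` are hypothetical.

```python
import math

def chi2_logpdf(x, k):
    """Log-density of a chi-square distribution with k degrees of freedom at x > 0."""
    return (k / 2 - 1) * math.log(x) - x / 2 - (k / 2) * math.log(2) - math.lgamma(k / 2)

def sample_variance_loglik(s2, m, sigma2):
    """Log-likelihood of sample variance s2 from m Gaussian draws with true variance sigma2.

    Assumption (not from the paper): (m - 1) * s2 / sigma2 follows chi-square(m - 1).
    """
    scale = (m - 1) / sigma2
    # Change of variables from the chi-square variable to s2 adds log(scale).
    return chi2_logpdf(scale * s2, m - 1) + math.log(scale)

def posterior_low_variance(s2, m, prior_low, var_low, var_high):
    """Posterior probability that the seller is the low-variance (high-quality) type.

    Requires 0 < prior_low < 1 and s2 > 0.
    """
    if m < 2:
        # With fewer than two free samples there is no sample variance, so no update.
        return prior_low
    a = math.log(prior_low) + sample_variance_loglik(s2, m, var_low)
    b = math.log(1 - prior_low) + sample_variance_loglik(s2, m, var_high)
    top = max(a, b)  # subtract the max before exponentiating for numerical stability
    return math.exp(a - top) / (math.exp(a - top) + math.exp(b - top))

if __name__ == "__main__":
    # Ten free samples whose sample variance sits near the low-variance type.
    print(posterior_low_variance(s2=1.0, m=10, prior_low=0.5, var_low=1.0, var_high=4.0))
```

Under these assumptions, a larger free sample size makes the sample variance more concentrated around the true variance, so the posterior moves further from the prior; a seller revealing zero samples leaves the buyer's belief unchanged.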

Figure 1: Phase diagram of symmetric equilibria across $(K, \sigma_{H}/\sigma_{L})$. Each cell is colored based on the equilibrium detected in the corresponding regime of parameters. Blue: only the informative equilibrium ($m^{*}=M$). Red: only the uninformative ($m^{*}=0$). Green: only an in

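Experimental Results

Research Questions

RQ1: In a market with private data quality, when do free trials emerge in equilibrium?
RQ2: How does the number of sellers K (the level of competition) affect signaling and the equilibrium disclosure of free samples?
RQ3: Given posterior beliefs about quality, what is the structure of the approximately optimal mechanism for acquiring data?
RQ4: In which parameter regimes do free samples fail to provide an informative signal, and in which do they instead lead to full disclosure?
RQ5: In simulations, how do intermediate equilibria ($0 < m_i < M$) arise and coexist with the extreme equilibria?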

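Main Findings

Free trials may not emerge: for some parameter regimes, all sellers reveal zero free samples in an approximate equilibrium.
Under sufficiently strong competition (large K), there is a unique approximate equilibrium in which every seller reveals the maximum free sample size M.
The buyer's approximately optimal mechanism is single-source: it buys data from one seller, chosen by minimizing a belief-adjusted virtual cost.
Rounding the real-valued solution to an integer sample size yields an approximately optimal mechanism with bounded loss.
Numerical simulations show that symmetric equilibria with intermediate disclosure levels exist and that multiple equilibria can coexist under the same parameters.
Free samples can substantially shift beliefs about data quality, so that, depending on the market parameters, the outcome is either maximal opacity or full disclosure.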
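
The single-source selection and rounding steps can be sketched as follows. This is an illustration under assumptions that are not stated in the summary: the buyer's loss is taken to be the posterior-expected variance divided by the number of purchased samples plus the virtual payment, and per-sample costs are assumed uniform on an interval starting at zero, so the Myerson-style virtual cost c + F(c)/f(c) reduces to 2c. The paper's actual belief-adjusted virtual cost may differ, and the payment rule is not implemented. All function names are hypothetical.

```python
import math

def virtual_cost_uniform(cost):
    """Procurement virtual cost c + F(c)/f(c); for c ~ Uniform[0, c_max], F/f = c."""
    return 2.0 * cost

def expected_variance(pi_low, var_low, var_high):
    """Posterior-expected per-sample variance given belief pi_low in the low-variance type."""
    return pi_low * var_low + (1.0 - pi_low) * var_high

def select_seller(reports, var_low, var_high):
    """Pick one seller and an integer sample size.

    reports: list of (posterior_low, reported_cost) pairs with reported_cost > 0.
    Assumed relaxed objective for seller i: v_i / n + phi_i * n, minimized at
    n = sqrt(v_i / phi_i) with value 2 * sqrt(v_i * phi_i).
    """
    best_index, best_score = None, math.inf
    for i, (pi_low, cost) in enumerate(reports):
        v = expected_variance(pi_low, var_low, var_high)
        phi = virtual_cost_uniform(cost)
        score = v * phi  # monotone in the relaxed optimum 2 * sqrt(v * phi)
        if score < best_score:
            best_index, best_score = i, score
    pi_low, cost = reports[best_index]
    v = expected_variance(pi_low, var_low, var_high)
    phi = virtual_cost_uniform(cost)
    n_real = math.sqrt(v / phi)              # real-valued relaxation
    return best_index, math.floor(n_real)    # round down to a feasible integer

if __name__ == "__main__":
    reports = [(0.9, 0.02), (0.5, 0.01), (0.2, 0.005)]
    print(select_seller(reports, var_low=1.0, var_high=4.0))
```

In this toy instance the seller believed most likely to be low quality still wins because its reported cost is low enough, which illustrates how posterior quality and virtual cost trade off in a single score under the assumed objective.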


This review was generated by AI and reviewed by a human editor.