QUICK REVIEW

[Paper Review] Distributionally balanced sampling designs

Anton Grafström, Wilmer Prentius|arXiv (Cornell University)|Mar 12, 2026
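Optimal Experimental Design Methods | Cited by 0
One-Sentence Summary

The paper introduces Distributionally Balanced Designs (DBD), a class of probability sampling designs that uses a circular ordering of the population and random selection of a contiguous block to minimize the energy distance between the sample and population auxiliary distributions, improving distributional representativeness and estimator variance.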

ABSTRACT

We propose Distributionally Balanced Designs (DBD), a new class of probability sampling designs that target representativeness at the level of the full auxiliary distribution rather than selected moments. In disciplines such as ecology, forestry, and environmental sciences, where field data collection is expensive, maximizing the information extracted from a limited sample is critical. More precisely, DBD can be viewed as minimum discrepancy designs that minimize the expected discrepancy between the sample and population auxiliary distributions. The key idea is to construct samples whose empirical auxiliary distribution closely matches that of the population. We present a first implementation of DBD based on an optimized circular ordering of the population, combined with random selection of a contiguous block of units. The ordering is chosen to minimize the design-expected energy distance, a discrepancy measure that captures differences between distributions beyond low-order moments. This criterion promotes strong spatial spread, and yields low variance for Horvitz-Thompson estimators of totals of functions that vary smoothly with respect to auxiliaries. Simulation results show that approximate DBD achieves better distributional fit than state-of-the-art methods such as the local pivotal and local cube designs. Hence, DBD can improve the reliability of estimates from costly field data, making distributional balancing effective for constructing representative surveys in resource-constrained applications.
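The abstract's "design-expected energy distance" can be written out as follows. This is a sketch using the standard empirical energy distance; the paper's exact weighting and normalization may differ. The symbols $x_i$ (auxiliary vector of unit $i$), $s$ (sample of size $n$), $U$ (population of size $N$), $\sigma$ (circular ordering), and $s_k(\sigma)$ (block starting at position $k$) are notation assumed here for illustration.

```latex
% Energy distance between the empirical auxiliary distributions of s and U.
\[
E(s, U) \;=\; \frac{2}{nN}\sum_{i \in s}\sum_{j \in U} \lVert x_i - x_j \rVert
\;-\; \frac{1}{n^2}\sum_{i \in s}\sum_{j \in s} \lVert x_i - x_j \rVert
\;-\; \frac{1}{N^2}\sum_{i \in U}\sum_{j \in U} \lVert x_i - x_j \rVert
\]
% Design expectation under a circular ordering with a uniformly random start:
% each of the N contiguous blocks is selected with probability 1/N.
\[
\bar{E}(\sigma) \;=\; \frac{1}{N}\sum_{k=0}^{N-1} E\bigl(s_k(\sigma), U\bigr),
\qquad
s_k(\sigma) = \{\sigma(k), \sigma(k+1), \dots, \sigma(k+n-1)\}
\quad \text{(positions taken mod } N\text{)}
\]
```

Because every unit belongs to exactly $n$ of the $N$ blocks, each unit has inclusion probability $n/N$ regardless of the ordering, so optimizing $\sigma$ changes the spread of the samples but not the first-order inclusion probabilities.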

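Motivation and Objectives

  • Motivate the need for representativeness of the full auxiliary distribution, beyond balancing means or achieving spatial spread.
  • Propose a formal framework (DBD) that minimizes the distributional discrepancy between sample and population.
  • Develop an optimization-based construction (circular ordering + contiguous block) that approximates distributional balance.
  • Provide guidance on variance estimation and evaluate performance on simulated and real data.
  • Provide guidance on scalable implementation and discuss applicability beyond traditional survey sampling.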

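Proposed Method

  • Define Distributionally Balanced Designs (DBD) as designs that minimize the expected energy distance between the sample and population auxiliary distributions.
  • Adopt the energy distance (a form of maximum mean discrepancy) as the discrepancy measure, so that differences beyond low-order moments are captured.
  • Restrict the design class to equal-probability designs formed by a circular permutation and a random starting point.
  • Use simulated annealing to optimize the circular ordering of the population so as to minimize the average sample-population energy distance.
  • Use fast O(n) updates per swap to evaluate the objective function, which keeps the optimization efficient.
  • Provide a local mean variance estimator suited to variance estimation for well-spread samples.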
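The construction listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the cooling schedule, and the default parameters are hypothetical, and the objective is recomputed in full after every swap instead of using the O(n) incremental update mentioned above.

```python
import numpy as np

def distance_matrix(x):
    """Euclidean distances between all pairs of rows of x (N x d)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def energy_distance(D, s):
    """Empirical energy distance between the sample s and the population."""
    return 2.0 * D[s].mean() - D[np.ix_(s, s)].mean() - D.mean()

def block(order, start, n):
    """The n contiguous units on the circle, starting at position `start`."""
    return order[(start + np.arange(n)) % len(order)]

def expected_energy(D, order, n):
    """Average energy distance over the N equally likely starting points."""
    N = len(order)
    return float(np.mean([energy_distance(D, block(order, k, n)) for k in range(N)]))

def anneal_order(D, n, steps=300, temp=1e-3, cooling=0.99, seed=0):
    """Simulated annealing over circular orderings (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    N = D.shape[0]
    order = rng.permutation(N)
    cur = expected_energy(D, order, n)
    best_order, best = order.copy(), cur
    for _ in range(steps):
        i, j = rng.choice(N, size=2, replace=False)
        order[[i, j]] = order[[j, i]]          # propose swapping two positions
        new = expected_energy(D, order, n)     # full recomputation, not O(n)
        if new <= cur or rng.random() < np.exp((cur - new) / temp):
            cur = new
            if cur < best:
                best_order, best = order.copy(), cur
        else:
            order[[i, j]] = order[[j, i]]      # reject: undo the swap
        temp *= cooling
    return best_order, best

# Toy population: 40 units with 2 auxiliary variables, sample size 8.
rng = np.random.default_rng(42)
x = rng.uniform(size=(40, 2))
D = distance_matrix(x)
order, value = anneal_order(D, n=8)
sample = block(order, rng.integers(len(order)), 8)  # one draw from the design
```

Drawing a sample only requires a uniformly random starting position, so all of the computational cost sits in the one-off optimization of the ordering.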

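Experimental Results

Research Questions

  • RQ1: How can a sampling design be constructed so that the auxiliary distribution of the sample closely matches that of the population?
  • RQ2: For smooth target functions, does optimizing distributional fit in terms of energy distance improve the variance properties of the Horvitz-Thompson estimator?
  • RQ3: How does DBD compare with leading methods (LPM, LCUBE, SRS) in distributional fit, spatial spread, and local balance across different numbers of auxiliary variables?
  • RQ4: Can circular DBD scale to larger populations, and do blocked/stratified versions retain the variance reduction?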
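For RQ2, it helps to recall that a circular design with a random start has only N possible samples, so the design distribution of the Horvitz-Thompson estimator can be enumerated exactly. The sketch below does this for an arbitrary (here random, not optimized) ordering; the function names and the toy target variable `y` are assumptions made for illustration.

```python
import numpy as np

def ht_total(y_sample, n, N):
    """Horvitz-Thompson total when every unit has inclusion probability n/N."""
    return (N / n) * np.sum(y_sample)

def ht_distribution(y, order, n):
    """HT estimates for all N contiguous blocks of a circular ordering."""
    N = len(order)
    return np.array([
        ht_total(y[order[(k + np.arange(n)) % N]], n, N) for k in range(N)
    ])

rng = np.random.default_rng(1)
N, n = 30, 6
y = rng.normal(size=N)            # toy target variable
order = rng.permutation(N)        # any circular ordering of the population
est = ht_distribution(y, order, n)

bias = est.mean() - y.sum()       # zero up to rounding: the estimator is unbiased
design_var = est.var()            # exact design variance (N equally likely samples)
```

Unbiasedness holds for every ordering because each unit appears in exactly n blocks; the ordering only affects `design_var`, which is the quantity an optimized ordering aims to reduce for smooth targets.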

Key Findings

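| Dimensions | Method | E (mean energy distance) | SB (mean spatial balance) | LB (mean local balance) | BD (balance deviation) |
|---|---|---|---|---|---|
| 2 | SRS | 0.0099 | 0.3375 | 0.1459 | 49.79 |
| 2 | LPM | 0.0015 | 0.0879 | 0.0769 | 10.50 |
| 2 | LCUBE | 0.0013 | 0.0825 | 0.0751 | 7.97 |
| 2 | DBD | 0.0010 | 0.0612 | 0.0646 | 4.88 |
| 5 | SRS | 0.0167 | 0.2518 | 0.1831 | 84.38 |
| 5 | LPM | 0.0069 | 0.1342 | 0.1464 | 36.50 |
| 5 | LCUBE | 0.0053 | 0.1265 | 0.1429 | 15.07 |
| 5 | DBD | 0.0046 | 0.1157 | 0.1391 | 12.44 |
| 10 | SRS | 0.0241 | 0.3493 | 0.2739 | 122.96 |
| 10 | LPM | 0.0145 | 0.2768 | 0.2566 | 74.54 |
| 10 | LCUBE | 0.0104 | 0.2702 | 0.2551 | 25.79 |
| 10 | DBD | 0.0096 | 0.2629 | 0.2529 | 23.41 |
| 20 | SRS | 0.0343 | 0.5651 | 0.4329 | 175.59 |
| 20 | LPM | 0.0252 | 0.5151 | 0.4242 | 129.13 |
| 20 | LCUBE | 0.0171 | 0.5179 | 0.4239 | 45.15 |
| 20 | DBD | 0.0167 | 0.5158 | 0.4233 | 41.76 |
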
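  • DBD achieves better distributional fit than the local pivotal and local cube designs across dimensions (lower mean energy distance).
  • The optimized circular ordering yields strong spatial spread while preserving equal inclusion probabilities.
  • Compared with competing designs, DBD performs better on the balance-related metrics (LB and BD), especially in low dimensions.
  • The local mean variance estimator adapts to the smoothness structure of the target function and is stable under DBD.
  • As the sample size grows, DBD's distributional advantage compounds, and its balance deviation decays faster than under SRS.
  • On the real Meuse data set, circular DBD gives the lowest energy distance and more accurate estimates for both auxiliary and target variables, with more conservative interval coverage.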
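The "especially in low dimensions" pattern can be checked directly from the reported numbers. The short calculation below uses only the mean energy distance values from the table above and compares DBD with LCUBE, the closest competitor on that metric; the variable names are illustrative.

```python
# Mean energy distance E copied from the results table (dimension -> value).
e_lcube = {2: 0.0013, 5: 0.0053, 10: 0.0104, 20: 0.0171}
e_dbd = {2: 0.0010, 5: 0.0046, 10: 0.0096, 20: 0.0167}

# Ratio below 1 means DBD has the lower mean energy distance.
ratios = {d: e_dbd[d] / e_lcube[d] for d in e_dbd}
for d, r in ratios.items():
    print(f"dim {d:>2}: DBD/LCUBE energy ratio = {r:.3f}")
```

The ratio stays below 1 in every dimension but moves toward 1 as the number of auxiliary variables grows, which matches the finding that DBD's margin is largest in low dimensions.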


This review was generated by AI and reviewed by human editors.