QUICK REVIEW

[Paper Review] Preprocessing Ambiguous Imprecise Points

Ivor van der Hoog, Irina Kostitsyna | arXiv (Cornell University) | Jan 1, 2019

Data Management and Algorithms | References: 8 | Cited by: 2

One-Sentence Summary

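This paper introduces the ambiguity A(R), a finer-grained measure than ply of how much the regions of an uncertain point set overlap. After O(n log n) preprocessing, X can be sorted (for intervals) or a quadtree on X can be reconstructed (for d-dimensional unit disks) in O(A(R)) time, and Ω(A(R)) is a lower bound on the time to reconstruct any proximity structure, so the result is tight.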

ABSTRACT

Let $R = \{R_1, R_2, \ldots, R_n\}$ be a set of regions and let $X = \{x_1, x_2, \ldots, x_n\}$ be an (unknown) point set with $x_i \in R_i$. Region $R_i$ represents the uncertainty region of $x_i$. We consider the following question: how fast can we establish order if we are allowed to preprocess the regions in $R$? The preprocessing model of uncertainty uses two consecutive phases: a preprocessing phase which has access only to $R$ followed by a reconstruction phase during which a desired structure on $X$ is computed. Recent results in this model parametrize the reconstruction time by the ply of $R$, which is the maximum overlap between the regions in $R$. We introduce the ambiguity $A(R)$ as a more fine-grained measure of the degree of overlap in $R$. We show how to preprocess a set of $d$-dimensional disks in $O(n \log n)$ time such that we can sort $X$ (if $d=1$) and reconstruct a quadtree on $X$ (if $d \geq 1$ but constant) in $O(A(R))$ time. If $A(R)$ is sub-linear, then reporting the result dominates the running time of the reconstruction phase. However, we can still return a suitable data structure representing the result in $O(A(R))$ time. In one dimension, $R$ is a set of intervals and the ambiguity is linked to interval entropy, which in turn relates to the well-studied problem of sorting under partial information. The number of comparisons necessary to find the linear order underlying a poset $P$ is lower-bounded by the graph entropy of $P$. We show that if $P$ is an interval order, then the ambiguity provides a constant-factor approximation of the graph entropy. This gives a lower bound of $\Omega(A(R))$ in all dimensions for the reconstruction phase (sorting or any proximity structure), independent of any preprocessing; hence our result is tight.
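To make the ply concrete in the one-dimensional case, here is a minimal sketch that computes the ply of a set of intervals with a sweep over sorted endpoints. The function name `ply` and the choice of closed intervals (so two intervals touching at a single point count as overlapping) are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # closed interval [lo, hi] (assumed convention)


def ply(intervals: List[Interval]) -> int:
    """Maximum number of intervals that share a common point."""
    events = []
    for lo, hi in intervals:
        events.append((lo, 0))  # 0 = interval opens; sorts before a close at the same x
        events.append((hi, 1))  # 1 = interval closes
    best = depth = 0
    for _, kind in sorted(events):
        if kind == 0:
            depth += 1
            best = max(best, depth)
        else:
            depth -= 1
    return best


print(ply([(0, 10), (1, 2), (3, 4), (5, 6)]))  # 2
```

The ply is a single maximum: one wide interval covering many small disjoint ones gives ply 2 no matter how many small intervals it covers, which is the coarseness the ambiguity is meant to refine.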

Motivation and Goals

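Address the limitation of ply, which is only a coarse measure of region overlap in an uncertain point set.
Develop a finer-grained measure, the ambiguity A(R), that captures the degree of overlap more accurately.
Show that Ω(A(R)) is a lower bound on the time to reconstruct a proximity structure, so an O(A(R)) reconstruction is tight.
Give an O(n log n) preprocessing step after which sorting and quadtree reconstruction take O(A(R)) time.
Relate ambiguity to interval entropy and to the number of linear extensions of a poset, yielding new bounds for sorting under partial information.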
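The link to sorting under partial information can be checked by brute force on tiny inputs. This sketch assumes the ambiguity is the minimum over permutations π of Σ_i log2|Γ^π_i|, where Γ^π_i counts R_i itself plus the earlier regions in π that overlap it (this convention and the base-2 logarithm are assumptions). It also computes log2 e(P), where e(P) is the number of linear extensions of the interval order P (R_a < R_b when R_a lies entirely left of R_b); log2 e(P) is the classical information-theoretic lower bound on comparisons for sorting under partial information. It does not compute graph entropy itself, all names are hypothetical, and both routines are exponential, so they only suit n of about 10 or less.

```python
import math
from itertools import permutations
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # closed interval [lo, hi]


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def ambiguity_of(order: Sequence[int], R: List[Interval]) -> float:
    """Sum of log2|Gamma_i| for a fixed order; Gamma_i includes R_i itself (assumed)."""
    total = 0.0
    for pos, i in enumerate(order):
        gamma = sum(1 for j in order[:pos + 1] if overlaps(R[i], R[j]))
        total += math.log2(gamma)
    return total


def ambiguity(R: List[Interval]) -> float:
    """Brute-force minimum over all n! orders (tiny inputs only)."""
    return min(ambiguity_of(p, R) for p in permutations(range(len(R))))


def log2_linear_extensions(R: List[Interval]) -> float:
    """log2 of the number of linear extensions of the interval order, via subset DP."""
    n = len(R)
    pred = [0] * n  # bitmask of intervals lying strictly left of interval b
    for a in range(n):
        for b in range(n):
            if R[a][1] < R[b][0]:
                pred[b] |= 1 << a
    dp = [0] * (1 << n)  # dp[mask] = orderings of `mask` that respect the poset
    dp[0] = 1
    for mask in range(1 << n):
        if dp[mask] == 0:
            continue
        for e in range(n):
            if not mask & (1 << e) and (pred[e] & mask) == pred[e]:
                dp[mask | (1 << e)] += dp[mask]
    return math.log2(dp[-1])


nested = [(0, 10), (1, 2), (3, 4), (5, 6)]
print(ambiguity(nested), log2_linear_extensions(nested))  # 2.0 2.0
```

On the nested example both values are 2: the three small disjoint intervals are already ordered, and only the position of (0, 10) among them (4 choices) is unknown. The source only claims a constant-factor relationship between ambiguity and graph entropy, so exact agreement on these small inputs should not be read as a general identity.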

Proposed Method

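Define the ambiguity A(R) as the minimum, over all permutations π of R, of Σ_i log |Γ^π_i|, where Γ^π_i is the set of regions up to R_i in π that overlap R_i.
Use a containment-compatible permutation π to guide the construction of a dynamically balanced Fibonacci tree E.
Insert the points in π order to build a 2-deflated quadtree T, using leaf pointers and anchors for efficient point location.
Use a dynamic tree structure that stays balanced during insertion at a total cost of O(A(R)) operations.
Use the fact that each region intersects at most O(|Γ^π_i|) leaves of the quadtree, where |Γ^π_i| is the number of earlier regions that overlap R_i.
Show that, with neighbor pointers and anchors, locating a point in the quadtree takes O(log |Γ^π_i|) time; summed over all i, this gives O(Σ_i log |Γ^π_i|) = O(A(R)) for the whole reconstruction.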
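The 1D version of this accounting can be sketched as follows, assuming intervals and a fixed order π. Each x_i is located among the already placed points by exponential (galloping) search that starts at the first placed point ≥ the left endpoint of R_i. Every point skipped lies inside R_i and so comes from an earlier region overlapping R_i, which keeps the cost at O(log|Γ^π_i|) comparisons. This is a swapped-in 1D illustration, not the paper's quadtree or Fibonacci-tree structure. The start index comes from `bisect` as a stand-in for a pointer that preprocessing would supply (those comparisons are not counted), a Python list replaces a balanced tree, and all names are hypothetical.

```python
import bisect
import math
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # closed interval [lo, hi]


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def ambiguity_of(order: Sequence[int], R: List[Interval]) -> float:
    """Sum of log2|Gamma_i| for this order; Gamma_i includes R_i itself (assumed)."""
    total = 0.0
    for pos, i in enumerate(order):
        total += math.log2(sum(1 for j in order[:pos + 1] if overlaps(R[i], R[j])))
    return total


def gallop_position(arr: List[float], x: float, start: int) -> Tuple[int, int]:
    """First index p >= start with arr[p] >= x, via exponential then binary search.
    Returns (p, number of comparisons made against x)."""
    comparisons = 0
    lo, probe, bound = start, start, 1
    while probe < len(arr):  # probe offsets 0, 1, 3, 7, 15, ... from start
        comparisons += 1
        if arr[probe] >= x:
            break
        lo = probe + 1
        probe = start + 2 * bound - 1
        bound *= 2
    hi = min(probe, len(arr))
    while lo < hi:  # binary search inside the last bracket [lo, hi]
        mid = (lo + hi) // 2
        comparisons += 1
        if arr[mid] >= x:
            hi = mid
        else:
            lo = mid + 1
    return lo, comparisons


def reconstruct_sorted(R: List[Interval], X: List[float],
                       order: Sequence[int]) -> Tuple[List[float], int]:
    placed: List[float] = []
    total = 0
    for i in order:
        lo_i, _ = R[i]
        assert R[i][0] <= X[i] <= R[i][1], "x_i must lie in its region R_i"
        # Stand-in for a pointer fixed during preprocessing; not counted.
        start = bisect.bisect_left(placed, lo_i)
        pos, c = gallop_position(placed, X[i], start)
        total += c  # at most 2*log2|Gamma_i| + 2 comparisons for this point
        placed.insert(pos, X[i])  # O(n) list insert; a balanced tree avoids this
    return placed, total


R = [(0, 10), (1, 2), (3, 4), (5, 6)]
X = [5.5, 1.5, 3.5, 5.2]
placed, comps = reconstruct_sorted(R, X, [1, 2, 3, 0])
print(placed, comps)  # [1.5, 3.5, 5.2, 5.5] 3
```

Here the three disjoint small intervals cost no comparisons at all. Only the wide interval (0, 10), whose Γ has four members, pays: 3 comparisons, within the budget 2·Σ log2|Γ^π_i| + 2n = 2·2 + 8 = 12 implied by the galloping bound.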

Results

Research Questions

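RQ1: Is the ambiguity A(R) a more precise measure than ply of region overlap in an uncertain point set?
RQ2: Is Ω(A(R)) a tight lower bound on the time to reconstruct proximity structures such as quadtrees?
RQ3: How does ambiguity relate to interval entropy and to the number of linear extensions of a poset?
RQ4: Can preprocessing be done in O(n log n) time so that sorting and quadtree reconstruction then take O(A(R)) time?
RQ5: Can A(R) be approximated efficiently, and does this improve existing bounds for computing entropy?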

Key Findings

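The ambiguity A(R) is a constant-factor approximation of the graph entropy of an interval order, which yields an Ω(A(R)) lower bound for sorting and for reconstructing proximity structures.
After O(n log n) preprocessing, imprecise points given as intervals can be sorted in Θ(A(R)) time.
With the proposed preprocessing, a 2-deflated quadtree on points in d-dimensional unit disks can be reconstructed in Θ(A(R)) time.
A 3-approximation of A(R) can be computed in O(n log n) time, which makes the measure usable in practice.
This improves the best known running time for approximating the entropy of an interval graph from O(n^2.5) to O(n log n).
The approach extends to unit-size fat convex regions in constant dimension while keeping O(A(R)) reconstruction time.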

This review was generated by AI and checked by human editors.