QUICK REVIEW

[Paper Review] Optimal Sorting with Persistent Comparison Errors

Barbara Geissmann, Stefano Leucci | arXiv (Cornell University) | Apr 20, 2018

Algorithms and Data Compression | Cited by 4

One-Sentence Summary

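This paper presents the first sorting algorithm that runs in O(n log n) time under persistent comparison errors and achieves, with high probability, the optimal O(log n) maximum dislocation and O(n) total dislocation. It introduces new techniques for approximate binary search and for simultaneous insertion into almost-sorted sequences, improving on the Õ(n^{3/2}) running time of the best previous algorithm while matching the known lower bounds on dislocation.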

ABSTRACT

We consider the problem of sorting $n$ elements in the case of \emph{persistent} comparison errors. In this model (Braverman and Mossel, SODA'08), each comparison between two elements can be wrong with some fixed (small) probability $p$, and \emph{comparisons cannot be repeated}. Sorting perfectly in this model is impossible, and the objective is to minimize the \emph{dislocation} of each element in the output sequence, that is, the difference between its true rank and its position. Existing lower bounds for this problem show that no algorithm can guarantee, with high probability, \emph{maximum dislocation} and \emph{total dislocation} better than $\Omega(\log n)$ and $\Omega(n)$, respectively, regardless of its running time. In this paper, we present the first \emph{$O(n\log n)$-time} sorting algorithm that guarantees both \emph{$O(\log n)$ maximum dislocation} and \emph{$O(n)$ total dislocation} with high probability. Besides improving over the previous state-of-the-art algorithms -- the best known algorithm had running time $\tilde{O}(n^{3/2})$ -- our result indicates that comparison errors do not make the problem computationally more difficult: a sequence with the best possible dislocation can be obtained in $O(n\log n)$ time and, even without comparison errors, $\Omega(n\log n)$ time is necessary to guarantee such dislocation bounds. In order to achieve this optimal result, we solve two sub-problems, and the respective methods have their own merits for further application. One is how to locate a position in which to insert an element in an almost-sorted sequence having $O(\log n)$ maximum dislocation in such a way that the dislocation of the resulting sequence will still be $O(\log n)$. The other is how to simultaneously insert $m$ elements into an almost sorted sequence of $m$ different elements, such that the resulting sequence of $2m$ elements remains almost sorted.
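
To make the model in the abstract concrete, here is a minimal Python sketch of a comparator with persistent errors and of the two dislocation measures. The names `PersistentComparator`, `dislocations`, `max_dislocation`, and `total_dislocation` are illustrative assumptions, not identifiers from the paper, and the sketch assumes distinct elements.

```python
import random


class PersistentComparator:
    """Illustrative noisy comparator: each pair is wrong with probability p,
    and the answer for a given pair never changes once it has been drawn."""

    def __init__(self, p, seed=0):
        self.p = p
        self.rng = random.Random(seed)
        self.flipped = {}  # (smaller, larger) -> True if this pair's answer is wrong

    def less(self, a, b):
        """Return the (possibly wrong) answer to 'is a < b?' for distinct a, b."""
        key = (min(a, b), max(a, b))
        if key not in self.flipped:
            # The error is drawn once per pair, so repeating the query cannot help.
            self.flipped[key] = self.rng.random() < self.p
        truth = a < b
        return (not truth) if self.flipped[key] else truth


def dislocations(seq):
    """Dislocation of each element: |true rank - position| (distinct elements)."""
    rank = {x: r for r, x in enumerate(sorted(seq))}
    return [abs(rank[x] - i) for i, x in enumerate(seq)]


def max_dislocation(seq):
    return max(dislocations(seq), default=0)


def total_dislocation(seq):
    return sum(dislocations(seq))
```

For example, `dislocations([1, 0, 2, 4, 3])` is `[1, 1, 0, 1, 1]`, so that sequence has maximum dislocation 1 and total dislocation 4. Caching the error per unordered pair is what distinguishes this model from one where a comparison can be repeated and decided by majority vote.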

Motivation and Objectives

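Close the gap between the optimal dislocation bounds and the achievable running time under persistent comparison errors.
Design an algorithm that achieves, with high probability, both the optimal O(log n) maximum dislocation and the optimal O(n) total dislocation.
Show that persistent comparison errors do not make sorting computationally harder: O(n log n) time suffices, matching the classical Ω(n log n) lower bound for comparison-based sorting.
Develop efficient subroutines for approximate rank computation and simultaneous insertion in almost-sorted sequences under error-prone comparisons.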

Proposed Method

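Design a randomized algorithm, RiffleSort, a modified merge sort that produces an almost-sorted sequence with O(log n) maximum dislocation.
Introduce a novel approximate search whose rank estimate is within O(max{d, log n}) of the true rank, where d is the maximum dislocation of the sequence.
Use a sampling-based approach that tests candidate ranks by counting mismatches among comparisons, with Chernoff bounds ensuring correctness with high probability.
Use the mismatch-counting strategy to insert O(log n) elements simultaneously into an almost-sorted sequence while maintaining O(log n) maximum dislocation.
Apply a recursive strategy that repeatedly selects small subsets of size O(log n), sorts them, and reinserts them into the main sequence while preserving the dislocation bounds.
Combine the success probabilities of the subroutines via a union bound to obtain overall correctness with high probability.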
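
The mismatch-counting idea in the list above can be illustrated with a deliberately simplified sketch. It scans the whole sequence in linear time rather than using the paper's sampling-based search, so it is not the authors' procedure. The function name `estimate_position` and the callable `less` are assumptions introduced here.

```python
def estimate_position(x, seq, less):
    """Return the index i in [0, len(seq)] with the fewest mismatches, where a
    mismatch is an element before i that compares as larger than x, or an
    element at or after i that compares as smaller than x.

    Simplified linear-scan illustration; `less(a, b)` may give wrong answers.
    """
    # Ask each comparison exactly once, as the persistent-error model requires.
    smaller = [less(y, x) for y in seq]

    # With the split at 0, every element is on the right, so each element
    # reported as smaller than x is a mismatch.
    mismatches = sum(smaller)
    best_index, best_count = 0, mismatches

    for i, is_smaller in enumerate(smaller, start=1):
        # Moving the split past seq[i - 1] puts that element on the left:
        # a "smaller" answer stops being a mismatch, a "larger" one becomes one.
        mismatches += -1 if is_smaller else 1
        if mismatches < best_count:
            best_index, best_count = i, mismatches
    return best_index
```

With an exact comparator, `estimate_position(25, [10, 20, 30, 40], lambda a, b: a < b)` returns 2. Because the estimate aggregates many comparisons, one wrong answer far from the true position does not derail it the way it could derail a plain binary search; wrong answers close to the true position can still shift the estimate by a few places.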

Results

Research Questions

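RQ1: Is there an O(n log n)-time algorithm that simultaneously achieves the optimal O(log n) maximum dislocation and the optimal O(n) total dislocation under persistent comparison errors?
RQ2: Is it possible to perform an approximate binary search in the presence of persistent errors so that the estimated rank is within O(log n) of the true rank?
RQ3: Can multiple elements be inserted simultaneously into an almost-sorted sequence while preserving the O(log n) dislocation bound?
RQ4: Do persistent comparison errors fundamentally increase the time complexity of sorting beyond the classical Ω(n log n) lower bound?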

Key Findings

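The proposed algorithm achieves, with high probability, O(log n) maximum dislocation and O(n) total dislocation, matching the known lower bounds for the persistent-error model of Braverman and Mossel.
The algorithm runs in O(n log n) time, improving on the Õ(n^{3/2}) running time of the best previous algorithm with comparable dislocation bounds.
After O(log n) elements are inserted into the sequence, the maximum dislocation remains O(log n); it grows only by an additive O(log n) term.
Inserting O(log n) elements increases the total dislocation by O(log² n), which is negligible compared with the overall O(n) bound.
The algorithm succeeds with probability at least 1 − 1/n, obtained by combining the high-probability subroutines via a union bound.
The results show that persistent comparison errors do not make sorting computationally harder than standard comparison-based sorting: the optimal time complexity remains O(n log n).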
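
The success-probability finding above rests on a standard union bound. As a worked illustration, using symbols $k$ (number of subroutine invocations) and $\delta$ (per-invocation failure probability) that are introduced here and do not come from the source:

```latex
\[
  \Pr[\text{some invocation fails}]
  \;\le\; \sum_{i=1}^{k} \Pr[\text{invocation } i \text{ fails}]
  \;\le\; k\,\delta ,
  \qquad\text{so}\qquad
  \delta \le \frac{1}{k\,n}
  \;\Longrightarrow\;
  \Pr[\text{all invocations succeed}] \;\ge\; 1 - \frac{1}{n}.
\]
```

The bound needs no independence between invocations, which matters here because persistent errors make the outcomes of different subroutines dependent on the same underlying comparisons.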

This review was generated by AI and reviewed by human editors.