QUICK REVIEW

[Paper Review] Efficient Parallel Output-Sensitive Edit Distance

Xiangyun Ding, Xiaojun Dong | arXiv (Cornell University) | Jan 1, 2023

Algorithms and Data Compression | Cited by 1

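One-Sentence Summary

This paper presents the first systematic study, in both theory and practice, of parallel output-sensitive edit distance algorithms. It proposes four parallel algorithms: three based on BFS (using a suffix array and two hash-based LCP data structures) and one based on output-sensitive divide-and-conquer, all achieving subquadratic work when the edit distance $k$ is small. The BFS-Hash and BFS-B-Hash variants achieve up to 48x speedup on 192 hyperthreads and process strings of length $10^9$ with $k < 10^5$ in about 10 seconds, outperforming ParlayLib by orders of magnitude.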

ABSTRACT

Given two strings $A[1..n]$ and $B[1..m]$, and a set of operations allowed to edit the strings, the edit distance between $A$ and $B$ is the minimum number of operations required to transform $A$ into $B$. Sequentially, a standard Dynamic Programming (DP) algorithm solves edit distance with $Θ(nm)$ cost. In many real-world applications, the strings to be compared are similar and have small edit distances. To achieve highly practical implementations, we focus on output-sensitive parallel edit-distance algorithms, i.e., to achieve asymptotically better cost bounds than the standard $Θ(nm)$ algorithm when the edit distance is small. We study four algorithms in the paper, including three algorithms based on Breadth-First Search (BFS) and one algorithm based on Divide-and-Conquer (DaC). Our BFS-based solution is based on the Landau-Vishkin algorithm. We implement three different data structures for the longest common prefix (LCP) queries needed in the algorithm: the classic solution using parallel suffix array, and two hash-based solutions proposed in this paper. Our DaC-based solution is inspired by the output-insensitive solution proposed by Apostolico et al., and we propose a non-trivial adaption to make it output-sensitive. All our algorithms have good theoretical guarantees, and they achieve different tradeoffs between work (total number of operations), span (longest dependence chain in the computation), and space. We test and compare our algorithms on both synthetic data and real-world data. Our BFS-based algorithms outperform the existing parallel edit-distance implementation in ParlayLib in all test cases. By comparing our algorithms, we also provide a better understanding of the choice of algorithms for different input patterns. We believe that our paper is the first systematic study in the theory and practice of parallel edit distance.
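For reference, the $\Theta(nm)$ baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration, assuming the usual unit-cost operation set (insert, delete, substitute); the function name `edit_distance_dp` is hypothetical and not taken from the paper.

```python
def edit_distance_dp(a: str, b: str) -> int:
    """Classic Theta(n*m) dynamic program, keeping only two rows of the table.

    Assumption: unit-cost insertions, deletions, and substitutions.
    """
    n, m = len(a), len(b)
    # prev[j] = edit distance between the empty prefix of a and b[:j]
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        cur = [i] + [0] * m  # cur[0] = distance between a[:i] and ""
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,         # delete a[i-1]
                cur[j - 1] + 1,      # insert b[j-1]
                prev[j - 1] + cost,  # match or substitute
            )
        prev = cur
    return prev[m]
```

Every cell of the table is filled regardless of how similar the strings are, which is the cost that output-sensitive algorithms aim to avoid when $k$ is small.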

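Motivation and Objectives

Bridge the gap between theoretical parallel algorithms for edit distance and practical implementations.
Design work-efficient, scalable, and I/O-friendly parallel algorithms that achieve subquadratic work when the edit distance $k$ is small.
Evaluate and compare multiple algorithmic approaches (BFS and divide-and-conquer) on synthetic and real-world workloads.
Provide engineering insights into data structures for LCP queries (suffix arrays and hash-based methods) in parallel settings.
Demonstrate that work efficiency is critical in parallel algorithm design, especially when the number of cores is limited.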

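Proposed Methods

Adapt the BFS-based Landau-Vishkin algorithm for parallel execution by traversing edit-distance states level by level.
Introduce two novel hash-based data structures (BFS-Hash and BFS-B-Hash) that replace the suffix array for LCP queries, improving space efficiency and cache friendliness.
Make a non-trivial adaptation of the original output-insensitive divide-and-conquer algorithm of Apostolico et al. so that it becomes output-sensitive, achieving $\widetilde{O}(nk)$ work and polylogarithmic span.
Introduce blocking in BFS-B-Hash, reducing the space overhead from 8x the input size to less than the input size at only a slight cost in running time.
Implement LCP queries with a parallel prefix-table construction optimized for shared-memory parallelism and high concurrency.
Evaluate the four algorithms on synthetic and real-world datasets (DNA, Wikipedia, GitHub), comparing their tradeoffs among work, span, and space.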
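The level-by-level idea behind the Landau-Vishkin approach can be sketched sequentially as below. This is an illustrative sketch, not the paper's implementation: it assumes unit-cost insert, delete, and substitute operations, uses a naive character-by-character LCP as a stand-in for the suffix-array or hash-based structures, and the names `lcp_naive` and `edit_distance_lv` are hypothetical. Within one round, each diagonal depends only on the previous round, which is what makes a round parallelizable.

```python
def lcp_naive(a, b, i, j):
    """Longest common prefix of a[i:] and b[j:], by direct scan (placeholder)."""
    k = 0
    while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
        k += 1
    return k


def edit_distance_lv(a, b, lcp=lcp_naive):
    """Output-sensitive edit distance: round t explores states reachable with t edits."""
    n, m = len(a), len(b)
    target = m - n  # diagonal (j - i) that contains the final cell (n, m)
    neg = float("-inf")
    # frontier[d] = furthest row i reached on diagonal d = j - i with t edits
    frontier = {0: lcp(a, b, 0, 0)}
    t = 0
    while frontier.get(target, neg) < n:
        t += 1
        nxt = {}
        # Each diagonal below reads only the previous frontier, so the
        # iterations of this loop are independent of one another.
        for d in range(-min(t, n), min(t, m) + 1):
            i = max(
                frontier.get(d, neg) + 1,      # substitute: stay on diagonal d
                frontier.get(d + 1, neg) + 1,  # delete from a: arrive from d + 1
                frontier.get(d - 1, neg),      # insert from b: arrive from d - 1
            )
            i = min(i, n, m - d)               # stay inside both strings
            nxt[d] = i + lcp(a, b, i, i + d)   # slide along matching characters
        frontier = nxt
    return t
```

Round $t$ touches at most $2t + 1$ diagonals, so the number of LCP queries grows with $k^2$ rather than with $nm$; the cost of each query is what the different LCP data structures trade off.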

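Experimental Results

Research Questions

RQ1: Can output-sensitive parallel edit distance algorithms achieve subquadratic work and high parallelism in practice, not just in theoretical bounds?
RQ2: In BFS-based parallel edit distance, how do different LCP data structures (suffix array vs. hash-based methods) affect performance and space usage?
RQ3: Can the divide-and-conquer approach be made output-sensitive while retaining good theoretical guarantees and practical performance?
RQ4: How does blocking in the hash-based LCP structure affect space efficiency and running time under realistic edit patterns?
RQ5: When the number of cores is limited to hundreds or thousands, how does the work-span tradeoff affect overall performance?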
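To make the hash-based alternative in RQ2 concrete, one common way to answer LCP queries with hashing is to precompute prefix hashes and binary-search the match length. The sketch below assumes a polynomial rolling hash; the constants `MOD` and `BASE` and all function names are illustrative choices rather than details from the paper, and the comparison is probabilistic (distinct substrings could collide, though with very low probability).

```python
MOD = (1 << 61) - 1   # assumed modulus (a Mersenne prime)
BASE = 911382323      # assumed base, any value in [2, MOD) works


def prefix_hashes(s):
    """h[i] = polynomial hash of s[:i]; one table entry per character."""
    h = [0] * (len(s) + 1)
    for i, ch in enumerate(s):
        h[i + 1] = (h[i] * BASE + ord(ch) + 1) % MOD
    return h


def powers(n):
    """p[i] = BASE**i mod MOD for i in 0..n."""
    p = [1] * (n + 1)
    for i in range(n):
        p[i + 1] = p[i] * BASE % MOD
    return p


def substring_hash(h, p, i, length):
    """Hash of the substring starting at i with the given length."""
    return (h[i + length] - h[i] * p[length]) % MOD


def lcp_hash(ha, hb, p, i, j):
    """Largest L such that a[i:i+L] and b[j:j+L] have equal hashes."""
    lo, hi = 0, min(len(ha) - 1 - i, len(hb) - 1 - j)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if substring_hash(ha, p, i, mid) == substring_hash(hb, p, j, mid):
            lo = mid       # prefixes of length mid agree, try longer
        else:
            hi = mid - 1   # mismatch within the first mid characters
    return lo
```

A usage example: with `a, b = "banana", "bandana"`, build `ha = prefix_hashes(a)`, `hb = prefix_hashes(b)`, and `p = powers(max(len(a), len(b)))`, then call `lcp_hash(ha, hb, p, 0, 0)`. Each prefix table can be filled with a parallel prefix (scan) computation, and it stores one hash value per input character.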

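Key Findings

BFS-Hash and BFS-B-Hash achieve up to 48x speedup on 192 hyperthreads and process strings of length $10^9$ with $k < 10^5$ in about ten seconds.
BFS-Hash runs more than 100x faster than ParlayLib on large inputs, whereas ParlayLib can only handle inputs of length up to $10^6$ in the same amount of time.
BFS-B-Hash uses blocking to reduce auxiliary space to below the input size. Despite the theoretical factor of $b$ in query cost, running time increases by only 1.08x to 1.19x as $b$ grows from 1 to 64.
The divide-and-conquer DaC-SD algorithm achieves $\widetilde{O}(nk)$ work and polylogarithmic span, but it is still outperformed by the BFS variants because it is less work-efficient.
When $k$ is small, BFS-B-Hash is faster than BFS-Hash because of its lower preprocessing cost and better cache behavior.
The study confirms that work efficiency is critical in parallel design: even with high parallelism, excess work cannot be made up for by adding cores, especially when the number of cores is limited.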
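The space-versus-query-time tradeoff of blocking can be illustrated with a sampled prefix-hash table. The sketch below is an assumption about how such a structure might look, not the paper's BFS-B-Hash design: it stores one hash per block of `b` characters and rescans at most `b - 1` characters per lookup, which is where a factor of $b$ in query cost would come from. `BlockedPrefixHash`, `lcp_blocked`, `MOD`, and `BASE` are hypothetical names and constants.

```python
MOD = (1 << 61) - 1   # assumed modulus
BASE = 911382323      # assumed base


class BlockedPrefixHash:
    """Prefix hashes sampled every b characters (about len(s) / b entries)."""

    def __init__(self, s, b):
        self.s, self.b = s, b
        self.samples = [0]  # samples[q] = hash of s[:q * b]
        h = 0
        for i, ch in enumerate(s, 1):
            h = (h * BASE + ord(ch) + 1) % MOD
            if i % b == 0:
                self.samples.append(h)

    def prefix(self, i):
        """Hash of s[:i]: start at the nearest sample, rescan up to b - 1 chars."""
        q = i // self.b
        h = self.samples[q]
        for ch in self.s[q * self.b : i]:
            h = (h * BASE + ord(ch) + 1) % MOD
        return h

    def substring(self, i, length):
        """Hash of s[i:i+length], derived from two prefix hashes."""
        shift = pow(BASE, length, MOD)
        return (self.prefix(i + length) - self.prefix(i) * shift) % MOD


def lcp_blocked(x, y, i, j):
    """Binary search for the longest hash-equal prefix of x.s[i:] and y.s[j:]."""
    lo, hi = 0, min(len(x.s) - i, len(y.s) - j)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if x.substring(i, mid) == y.substring(j, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

With `b = 1` this degenerates to a full prefix table; larger `b` shrinks the table roughly by a factor of `b` while each lookup scans a short, contiguous run of characters.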


This review was generated by AI and reviewed by human editors.