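[Paper Summary] On the Complexity of BWT-Runs Minimization via Alphabet Reordering
This paper establishes the computational complexity of minimizing the number of runs in the Burrows-Wheeler Transform (BWT) via alphabet reordering. It proves that the decision problem is NP-complete and, under the Exponential Time Hypothesis, cannot be solved in subexponential time. It also shows that the optimization problem is APX-hard, reveals a surprising connection between the number of BWT runs and traveling salesperson paths in graphs, and gives a linear-time algorithm for a constrained variant restricted to symbols that occur only once.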
The Burrows-Wheeler Transform (BWT) has been an essential tool in text compression and indexing. First introduced in 1994, it went on to provide the backbone for the first encoding of the classic suffix tree data structure in space close to the entropy-based lower bound. Recently, there has been the development of compact suffix trees in space proportional to $r$, the number of runs in the BWT, as well as the appearance of $r$ in the time complexity of new algorithms. Unlike other popular measures of compression, the parameter $r$ is sensitive to the lexicographic ordering given to the text's alphabet. Despite several past attempts to exploit this, a provably efficient algorithm for finding, or approximating, an alphabet ordering which minimizes $r$ has been open for years. We present the first set of results on the computational complexity of minimizing BWT-runs via alphabet reordering. We prove that the decision version of this problem is NP-complete and cannot be solved in time $2^{o(\sigma + \sqrt{n})}$ unless the Exponential Time Hypothesis fails, where $\sigma$ is the size of the alphabet and $n$ is the length of the text. We also show that the optimization problem is APX-hard. In doing so, we relate two previously disparate topics: the optimal traveling salesperson path and the number of runs in the BWT of a text, providing a surprising connection between problems on graphs and text compression. Also, by relating recent results in the field of dictionary compression, we illustrate that an arbitrary alphabet ordering provides an $O(\log^2 n)$-approximation. We provide an optimal linear-time algorithm for the problem of finding a run-minimizing ordering on a subset of symbols (occurring only once) under ordering constraints, and prove that a generalization of this problem to a class of graphs with BWT-like properties called Wheeler graphs is NP-complete.
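The abstract's observation that $r$ depends on the alphabet ordering can be checked on a tiny input. The following is a minimal sketch, not code from the paper: the function names `bwt_with_order` and `count_runs` are illustrative, the text is assumed to be terminated by a sentinel `$` that is smaller than every other symbol, and rotations are sorted naively, which is only suitable for short strings.

```python
def bwt_with_order(text, order):
    """Return the BWT of text + '$' under a custom alphabet ordering.

    `order` lists the alphabet from smallest to largest symbol.
    Assumption: '$' does not occur in `text` and is always the smallest symbol.
    """
    rank = {"$": 0}
    rank.update({c: i + 1 for i, c in enumerate(order)})
    s = text + "$"
    n = len(s)
    # Sort rotation start positions by the rank sequence of each rotation.
    starts = sorted(range(n), key=lambda i: [rank[c] for c in s[i:] + s[:i]])
    # The last symbol of the rotation starting at i is s[i - 1].
    return "".join(s[i - 1] for i in starts)


def count_runs(s):
    """Count maximal runs of equal consecutive symbols in s."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


for order in ("abn", "nab"):
    transformed = bwt_with_order("banana", order)
    print(order, transformed, count_runs(transformed))
```

For `banana`, the ordering a < b < n yields `annb$aa` with 5 runs, while n < a < b yields `aaannb$` with 4 runs, so the same text has a different $r$ under a different ordering.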
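Research Motivation and Objectives
- Determine the computational complexity of minimizing the number of BWT runs via alphabet reordering.
- Understand why an efficient algorithm for this problem has remained elusive despite long-standing interest.
- Establish hardness results (NP-completeness and APX-hardness) for the run-minimization problem.
- Explore the connection between the number of BWT runs and graph-theoretic problems, in particular the traveling salesperson path.
- Design a polynomial-time algorithm for a restricted version of the problem that involves only symbols occurring exactly once.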
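Proposed Method
- Prove NP-completeness of the decision version of BWT-run minimization by a reduction from a variant of the traveling salesperson path problem.
- Prove inapproximability of the optimization problem by a gap-introducing reduction, establishing that it is APX-hard.
- Reveal a surprising connection between the number of BWT runs and the length of an optimal TSP path in a constructed graph.
- Design a linear-time greedy algorithm for the Constrained Alphabet Ordering (CAO) problem, in which symbols that occur only once are reordered under fixed block constraints.
- Use the notions of "blocks" and "tuples" in the BWT structure to model the problem as a sequence of symbol sets that must be ordered so as to maximize adjacent matches.
- Use a longest common extension (LCE) data structure to identify block boundaries in linear time, enabling efficient tuple construction.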
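To make the optimization problem behind these reductions concrete, the sketch below searches all alphabet orderings exhaustively and keeps one that minimizes the number of runs. This is neither the paper's reduction nor its CAO algorithm; it is a brute-force baseline that tries all $\sigma!$ orderings, and the names `bwt_runs` and `best_ordering` as well as the `$` sentinel convention are assumptions made for illustration.

```python
from itertools import permutations


def bwt_runs(text, order):
    """Number of runs in the BWT of text + '$' under the given ordering.

    Assumption: '$' is absent from `text` and ranks below all other symbols.
    """
    rank = {"$": 0, **{c: i + 1 for i, c in enumerate(order)}}
    s = text + "$"
    starts = sorted(range(len(s)), key=lambda i: [rank[c] for c in s[i:] + s[:i]])
    last = [s[i - 1] for i in starts]  # last column of the sorted rotations
    return sum(1 for i in range(len(last)) if i == 0 or last[i] != last[i - 1])


def best_ordering(text):
    """Try every permutation of the alphabet; feasible only for tiny alphabets."""
    alphabet = sorted(set(text))
    best = min(permutations(alphabet), key=lambda p: bwt_runs(text, p))
    return "".join(best), bwt_runs(text, best)


order, runs = best_ordering("banana")
print(order, runs)
```

The search space grows factorially with the alphabet size, which is why the hardness results matter: the summary above states that, unless the Exponential Time Hypothesis fails, no algorithm runs in time $2^{o(\sigma + \sqrt{n})}$.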
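Experimental Results
Research Questions
- RQ1: Is the problem of minimizing the number of BWT runs via alphabet reordering NP-complete?
- RQ2: Can BWT-run minimization be approximated within a constant factor, or is it APX-hard?
- RQ3: Is there a structural connection between the number of BWT runs and the length of an optimal TSP path in an associated graph?
- RQ4: Can a polynomial-time algorithm be designed for the restricted version in which only symbols occurring exactly once are reordered?
- RQ5: What is the computational complexity of the general constrained alphabet ordering problem with arbitrary symbol placement constraints?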
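Key Findings
- The decision problem of minimizing the number of BWT runs via alphabet reordering is NP-complete.
- The optimization problem is APX-hard, meaning that no polynomial-time approximation scheme exists unless P = NP.
- Unless the Exponential Time Hypothesis fails, the problem cannot be solved in time $2^{o(\sigma + \sqrt{n})}$.
- A surprising connection is established between the number of BWT runs and the length of an optimal TSP path in a derived graph.
- An optimal linear-time algorithm is given for the Constrained Alphabet Ordering (CAO) problem, in which symbols that occur only once are reordered under fixed block constraints.
- An arbitrary alphabet ordering yields an $O(\log^2 n)$-approximation for the general run-minimization problem.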
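The approximation claim in the last finding can be written out explicitly. The notation below is introduced here for illustration and is not taken from the paper: $r_\pi(T)$ is the number of BWT runs of $T$ under ordering $\pi$, $r^*(T)$ is the minimum over all orderings, and $m(T)$ is a placeholder for some compressibility measure that does not depend on the alphabet ordering. The abstract only says the bound follows from recent dictionary-compression results, so the two-sided bound on $m(T)$ is an assumed route rather than the paper's stated proof.

```latex
% Claim: there is a constant c such that, for every ordering \pi,
\[
  r_\pi(T) \;\le\; c \,\log^2 n \cdot r^*(T),
  \qquad r^*(T) = \min_{\pi'} r_{\pi'}(T).
\]
% Assumed route: an order-independent measure m(T) with, for every \pi,
\[
  m(T) \;\le\; r_\pi(T) \;\le\; c \, m(T) \log^2 n .
\]
% Applying the lower bound to an optimal ordering gives m(T) \le r^*(T), hence
\[
  r_\pi(T) \;\le\; c \, m(T) \log^2 n \;\le\; c \,\log^2 n \cdot r^*(T).
\]
```

Under these assumptions, even the worst ordering is within an $O(\log^2 n)$ factor of the best one, which is why no search is needed to obtain this guarantee.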
This summary was generated by AI and reviewed by human editors.