QUICK REVIEW

[Paper Review] Dynamic Longest Common Substring in Polylogarithmic Time

Panagiotis Charalampopoulos, Paweł Gawrychowski | arXiv (Cornell University) | Jan 1, 2020

Algorithms and Data Compression | References: 31 | Cited by: 3

One-line Summary

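This paper presents a dynamic algorithm that maintains a longest common substring (LCS) of two strings undergoing letter substitutions, achieving, with high probability, amortized update time O(log⁷ n). The approach combines dynamic trees with labeled bicolored leaves and a novel application of the local consistency of string parsing. It substantially improves on the previous Õ(n^{2/3}) bound, and the authors complement it with an Ω(log n / log log n) lower bound for any polynomial-size data structure, so the polylogarithmic update time is close to optimal.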

ABSTRACT

The longest common substring problem consists in finding a longest string that appears as a (contiguous) substring of two input strings. We consider the dynamic variant of this problem, in which we are to maintain two dynamic strings $S$ and $T$, each of length at most $n$, that undergo substitutions of letters, in order to be able to return a longest common substring after each substitution. Recently, Amir et al. [ESA 2019] presented a solution for this problem that needs only $\tilde{\mathcal{O}}(n^{2/3})$ time per update. This brought the challenge of determining whether there exists a faster solution with polylogarithmic update time, or (as is the case for other dynamic problems), we should expect a polynomial (conditional) lower bound. We answer this question by designing a significantly faster algorithm that processes each substitution in amortized $\log^{\mathcal{O}(1)} n$ time with high probability. Our solution relies on exploiting the local consistency of the parsing of a collection of dynamic strings due to Gawrychowski et al. [SODA 2018], and on maintaining two dynamic trees with labeled bicolored leaves, so that after each update we can report a pair of nodes, one from each tree, of maximum combined weight, which have at least one common leaf-descendant of each color. We complement this with a lower bound of $\Omega(\log n/\log\log n)$ for the update time of any polynomial-size data structure that maintains the LCS of two dynamic strings, and the same lower bound for the update time of any data structure of size $\tilde{\mathcal{O}}(n)$ that maintains the LCS of a static and a dynamic string. Both lower bounds hold even allowing amortization and randomization.
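To make the problem statement concrete, here is a minimal Python sketch of the naive baseline. It is not the paper's method. It stores S and T as lists, applies each substitution, and then recomputes an LCS from scratch with the classic O(|S|·|T|) dynamic program. The class and method names (`NaiveDynamicLCS`, `substitute`, `longest_common_substring`) are hypothetical, chosen only for illustration.

```python
from typing import List


class NaiveDynamicLCS:
    """Hypothetical baseline: recompute an LCS from scratch after each substitution.

    Every update costs O(|S| * |T|) time. This illustrates the interface of the
    dynamic problem, not the paper's polylogarithmic data structure.
    """

    def __init__(self, s: str, t: str) -> None:
        self.s: List[str] = list(s)
        self.t: List[str] = list(t)

    def substitute(self, which: str, pos: int, letter: str) -> str:
        """Set S[pos] (which == "S") or T[pos] (which == "T") to letter,
        then return a longest common substring of the updated strings."""
        if which not in ("S", "T"):
            raise ValueError("which must be 'S' or 'T'")
        target = self.s if which == "S" else self.t
        target[pos] = letter
        return self.longest_common_substring()

    def longest_common_substring(self) -> str:
        best_len, best_end = 0, 0          # length and end index (in S) of the best match
        prev = [0] * (len(self.t) + 1)     # prev[j]: longest common suffix of S[:i-1] and T[:j]
        for i in range(1, len(self.s) + 1):
            cur = [0] * (len(self.t) + 1)  # cur[j]: longest common suffix of S[:i] and T[:j]
            for j in range(1, len(self.t) + 1):
                if self.s[i - 1] == self.t[j - 1]:
                    cur[j] = prev[j - 1] + 1
                    if cur[j] > best_len:
                        best_len, best_end = cur[j], i
            prev = cur
        return "".join(self.s[best_end - best_len:best_end])


d = NaiveDynamicLCS("abcde", "xbcdy")
print(d.longest_common_substring())  # bcd
print(d.substitute("T", 0, "a"))     # abcd
print(d.substitute("S", 2, "z"))     # ab
```

Each call to `substitute` pays the full quadratic cost. The Õ(n^{2/3}) solution of Amir et al. and the polylogarithmic algorithm reviewed here avoid exactly this recomputation.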

Motivation and Objectives

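Design a dynamic data structure that maintains a longest common substring (LCS) under letter substitutions with polylogarithmic update time.
Resolve the open question of whether polylogarithmic update time is achievable for the dynamic LCS problem.
Establish a lower bound for dynamic LCS data structures that holds even when amortization and randomization are allowed.
Extend Pătraşcu's reduction (from lopsided set disjointness to butterfly reachability) to dynamic string problems.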

Proposed Method

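Maintain a representation of the dynamic strings using the local consistency of the string parsing of Gawrychowski et al. (SODA 2018).
Maintain two dynamic trees with labeled bicolored leaves so that, after each update, one can report a pair of nodes, one from each tree, of maximum combined weight that have at least one common leaf-descendant of each color.
Use gadget-based constructions that encode activation and query patterns as longest common substring instances via substring matching.
Reduce the dynamic LCS problem to an activation problem on pairs of leaves in labeled trees, where a pair counts when the labels match and the path depth equals the LCS length.
Apply a credit-based de-amortization technique to convert expected amortized bounds into worst-case bounds for the lower-bound analysis.
Extend Pătraşcu's reduction (from lopsided set disjointness to butterfly reachability) to derive the lower bound.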
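The tree query from the second item above can be stated as a brute-force Python sketch. It rests on several assumptions made only for illustration. Each tree is a dict from a node to its children. Every leaf carries a label and a color ("red" or "blue"). A leaf of one tree and a leaf of the other count as a common leaf when they share a label. Node weights are given as dicts. The sketch answers one static query using about |T1|·|T2| pairs of set intersections. The paper maintains this answer under updates in polylogarithmic time, which the sketch does not attempt. All names (`subtree_labels`, `best_pair`) are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Tree = Dict[str, List[str]]            # node -> children; leaves map to []
LeafInfo = Dict[str, Tuple[str, str]]  # leaf -> (label, color)
Labels = Tuple[Set[str], Set[str]]     # (red labels, blue labels) below a node


def subtree_labels(tree: Tree, leaf_info: LeafInfo, root: str) -> Dict[str, Labels]:
    """Collect the red and blue leaf labels below every node (post-order)."""
    out: Dict[str, Labels] = {}

    def visit(v: str) -> Labels:
        red: Set[str] = set()
        blue: Set[str] = set()
        if not tree[v]:  # v is a leaf
            label, color = leaf_info[v]
            (red if color == "red" else blue).add(label)
        for child in tree[v]:
            r, b = visit(child)
            red |= r
            blue |= b
        out[v] = (red, blue)
        return red, blue

    visit(root)
    return out


def best_pair(tree1: Tree, info1: LeafInfo, root1: str, w1: Dict[str, int],
              tree2: Tree, info2: LeafInfo, root2: str, w2: Dict[str, int]
              ) -> Optional[Tuple[int, str, str]]:
    """Max w1[u] + w2[v] over pairs (u, v) sharing a red label and a blue label."""
    labels1 = subtree_labels(tree1, info1, root1)
    labels2 = subtree_labels(tree2, info2, root2)
    best: Optional[Tuple[int, str, str]] = None
    for u, (r1, b1) in labels1.items():
        for v, (r2, b2) in labels2.items():
            if r1 & r2 and b1 & b2:  # a common leaf of each color
                score = w1[u] + w2[v]
                if best is None or score > best[0]:
                    best = (score, u, v)
    return best


T1 = {"A": ["A1", "A2"], "A1": ["x1", "y1"], "A2": ["z1"],
      "x1": [], "y1": [], "z1": []}
I1 = {"x1": ("x", "red"), "y1": ("y", "blue"), "z1": ("z", "blue")}
W1 = {"A": 0, "A1": 2, "A2": 5, "x1": 3, "y1": 3, "z1": 6}
T2 = {"B": ["B1", "z2"], "B1": ["x2", "y2"], "x2": [], "y2": [], "z2": []}
I2 = {"x2": ("x", "red"), "y2": ("y", "blue"), "z2": ("z", "blue")}
W2 = {"B": 0, "B1": 4, "x2": 3, "y2": 3, "z2": 7}
print(best_pair(T1, I1, "A", W1, T2, I2, "B", W2))  # (6, 'A1', 'B1')
```

In the example, A2 and z2 are heavier but have no red leaf below them, so they cannot be part of any valid pair. Of the pairs that share both colors, A1 and B1 have the largest combined weight.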

Results

Research Questions

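RQ1: Can the dynamic longest common substring problem be solved with polylogarithmic update time per substitution?
RQ2: Is there a lower bound, holding even when amortization and randomization are allowed, showing that dynamic LCS cannot be maintained in o(log n / log log n) update time?
RQ3: Can known hard problems such as butterfly reachability be reduced to dynamic LCS, and does this yield tight complexity bounds?
RQ4: Do dynamic trees with labeled bicolored leaves, combined with the local consistency of string parsing, yield a significant improvement over previous approaches?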

Key Findings

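The paper presents a dynamic algorithm that maintains a longest common substring in amortized O(log⁷ n) time per substitution with high probability.
The algorithm improves on the previous Õ(n^{2/3}) update time, an exponential improvement from polynomial to polylogarithmic.
It establishes an Ω(log n / log log n) lower bound for the update time of any polynomial-size data structure that maintains the LCS of two dynamic strings.
The same lower bound holds for any data structure of size Õ(n) that maintains the LCS of a static and a dynamic string; both lower bounds hold even when amortization and randomization are allowed.
The lower bounds are derived via a novel extension of Pătraşcu's reduction (from lopsided set disjointness to butterfly reachability).
Consequently, the update time of the proposed algorithm is optimal up to polylogarithmic factors.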
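As we understand it, Pătraşcu's butterfly reachability problem asks a data structure to store a subgraph of a butterfly graph (some edges deleted). It must then answer whether a given source on the first level reaches a given sink on the last level. The Python sketch below shows only this underlying graph problem. It uses a binary (degree-2) butterfly, chosen for simplicity, and does not show how the paper encodes butterfly reachability into dynamic strings. The function names are hypothetical. In a complete butterfly, every source has exactly one path to every sink, so a query reduces to checking the d edges on that path.

```python
from itertools import product
from typing import Set, Tuple

Edge = Tuple[int, int, int]  # (level i, row at level i, row at level i + 1)


def butterfly_edges(d: int) -> Set[Edge]:
    """All edges of the binary butterfly with levels 0..d and 2**d rows."""
    edges: Set[Edge] = set()
    for i in range(d):
        for r in range(1 << d):
            edges.add((i, r, r))             # straight edge
            edges.add((i, r, r ^ (1 << i)))  # cross edge: flip bit i
    return edges


def reachable(d: int, present: Set[Edge], s: int, t: int) -> bool:
    """Follow the unique path: at level i, set bit i of the row to bit i of t."""
    r = s
    for i in range(d):
        nxt = (r & ~(1 << i)) | (t & (1 << i))
        if (i, r, nxt) not in present:
            return False
        r = nxt
    return True


def reachable_bfs(d: int, present: Set[Edge], s: int, t: int) -> bool:
    """Generic level-by-level search, used to cross-check reachable()."""
    frontier = {s}
    for i in range(d):
        frontier = {b for (lvl, a, b) in present if lvl == i and a in frontier}
    return t in frontier


D = 3
full = butterfly_edges(D)
broken = full - {(1, 0, 2)}  # delete one cross edge at level 1
print(reachable(D, broken, 0, 2), reachable(D, broken, 0, 1))  # False True
assert all(reachable(D, broken, s, t) == reachable_bfs(D, broken, s, t)
           for s, t in product(range(1 << D), repeat=2))
```

The toy script answers queries by simply reading the edge set. The hardness that the lower bound relies on comes from the cell-probe setting, where a data structure must answer such queries with few memory accesses. A script of this kind cannot capture that.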

This review was generated by AI and checked by a human editor.