QUICK REVIEW

[Paper Review] Computing and Enumerating Minimal Common Supersequences Between Two Strings

Braeden Sopp, Adiesha Liyanage|arXiv (Cornell University)|Mar 23, 2026

Algorithms and Data Compression | Cited by 0

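One-Sentence Summary

This paper presents a linear-time algorithm for computing a minimal common supersequence (MCS) of two strings, together with a data structure that enumerates all minimal common supersequences using O(n^2) space and O(n) delay after O(n^3)-time preprocessing, with provable time and space bounds.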

ABSTRACT

Given $k$ strings each of length at most $n$, computing the shortest common supersequence of them is a well-known NP-hard problem (when $k$ is unbounded). On the other hand, when $k=2$, such a shortest common supersequence can be computed in $O(n^2)$ time using dynamic programming as a textbook example. In this paper, we consider the problem of computing a \emph{minimal} common supersequence and enumerating all minimal common supersequences for $k=2$ input strings. Our results are summarized as follows. A minimal common supersequence of $k=2$ input strings can be computed in $O(n)$ time. (The method also works when $k$ is a constant). All minimal common supersequences between two input strings can be enumerated with a data structure of $O(n^2)$ space and an $O(n)$ time delay, and the data structure can be constructed in $O(n^3)$ time.
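The abstract contrasts the textbook $O(n^2)$ dynamic program for a shortest common supersequence with the paper's results on minimal ones. As a point of reference, here is a minimal Python sketch of that textbook recurrence, computing only the length. The function name `scs_length` is chosen for illustration and does not come from the paper.

```python
def scs_length(a: str, b: str) -> int:
    """Length of a shortest common supersequence of a and b, in O(|a|*|b|) time."""
    # dp[i][j] = length of a shortest common supersequence of a[:i] and b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i  # only a[:i] has to be covered
    for j in range(len(b) + 1):
        dp[0][j] = j  # only b[:j] has to be covered
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # one shared character serves both strings
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # append the last character of either a[:i] or b[:j]
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


print(scs_length("abc", "bca"))  # 4, e.g. "abca"
```

Every shortest common supersequence is minimal, but the converse fails: for "abc" and "bca" the shortest length is 4, while "bcabc" has length 5 and still loses one of the two strings whenever any single character is deleted.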

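Motivation and Goals

Advance the efficient computation of minimal common supersequences (MCS) as a complement to the LCS/SCS problems.
Provide a linear-time method for computing an MCS of two strings (k=2), and extend the insight to multiple strings.
Develop an enumeration framework that lists all MCSs with provable time and space guarantees.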

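Proposed Method

Obtain an MCS of two strings in O(n) time by scanning a common supersequence and deleting the removable indices identified by the essentiality criterion (Lemma 3.1).
Build right embeddings that track how A and B embed into the supersequence (BuildRightEmbedding).
Prove properties of MCSs via essential indices (the Lem and Rem embeddings), which guide deletions and guarantee minimality.
Give a constructive algorithm (ReduceSupersequence) that outputs an MCS in linear time by deleting non-essential positions.
Extend the method to k strings, computing an MCS of k strings in O(kn(log k + log n)) time.
Develop an enumeration framework that partitions A and B into blocks and models MCSs as paths in a labeled graph G(A,B), where st-paths correspond one-to-one with MCSs.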
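The reduction idea in the list above (start from any common supersequence and delete positions that are not needed) can be illustrated with a deliberately naive sketch. This is not the paper's ReduceSupersequence: it re-checks the supersequence property after every tentative deletion, so it runs in quadratic rather than linear time. All function names here are hypothetical.

```python
from typing import Sequence


def is_subsequence(x: str, s: str) -> bool:
    """True if x can be obtained from s by deleting characters."""
    it = iter(s)
    return all(ch in it for ch in x)  # 'in' advances the shared iterator


def is_common_supersequence(s: str, strings: Sequence[str]) -> bool:
    return all(is_subsequence(x, s) for x in strings)


def reduce_supersequence_naive(s: str, strings: Sequence[str]) -> str:
    """Greedily delete characters of s while it stays a common supersequence."""
    if not is_common_supersequence(s, strings):
        raise ValueError("s must be a common supersequence of the inputs")
    i = 0
    while i < len(s):
        candidate = s[:i] + s[i + 1:]
        if is_common_supersequence(candidate, strings):
            s = candidate  # position i was deletable; the next char is now at i
        else:
            i += 1         # position i is needed; it stays needed after later deletions
    return s


def is_minimal(s: str, strings: Sequence[str]) -> bool:
    """A common supersequence is minimal iff no single deletion preserves it."""
    return is_common_supersequence(s, strings) and not any(
        is_common_supersequence(s[:i] + s[i + 1:], strings) for i in range(len(s))
    )


inputs = ["abc", "bca"]
print(reduce_supersequence_naive("abc" + "bca", inputs))  # abca
print(is_minimal("bcabc", inputs))                        # True
```

A single left-to-right pass is enough because a position that cannot be deleted from the current string also cannot be deleted from any shorter string obtained later. The concatenation of the inputs is always a valid starting point.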

Figure 1: Depicted on the left is $G(A,B)$ containing all nodes, with the vertices of $G_{st}(A,B)$ colored according to their respective partitions. Red vertices belong to partition $V_{A}$ and blue vertices belong to partition $V_{B}$. On the right we have $G_{st}(A,B)$ with the vertices labeled. $A=b

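Experimental Results

Research Questions

RQ1: Can a minimal common supersequence of two strings be computed in time linear in n?
RQ2: How can all minimal common supersequences of two strings be enumerated efficiently?
RQ3: Which data structure and partitioning scheme yield a bijection between MCSs and feasible paths for enumeration?
RQ4: How does the time complexity of computing an MCS change when extending to more than two strings?
RQ5: Which structural properties (embeddings, essential indices) guarantee the minimality and correctness of MCS construction and enumeration?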

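Key Findings

An MCS of two input strings can be computed in O(n) time.
All MCSs of two input strings can be enumerated with O(n^2) space and O(n) delay; the data structure used for enumeration can be built in O(n^3) time.
For k strings, an MCS can be computed in O(kn(log k + log n)) time.
A linear-time reduction from an arbitrary common supersequence to a minimal one is what makes efficient MCS computation possible.
The enumeration framework uses the graph G(A,B), in which st-paths correspond one-to-one with MCSs.
The paper gives a precise characterization of MCSs in terms of essential indices and interval partitions (Theorems 5.1, 5.2, 5.3).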
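To make the notion of an essential index concrete, the sketch below flags every position of a common supersequence S whose deletion would break the supersequence property. It relies on a standard prefix/suffix matching argument and is not claimed to be the statement of Lemma 3.1 or of Theorems 5.1 to 5.3. The function names and 0-based indexing are assumptions made for illustration.

```python
from typing import List, Sequence


def prefix_matches(a: str, s: str) -> List[int]:
    """L[i] = length of the longest prefix of a that is a subsequence of s[:i]."""
    L = [0] * (len(s) + 1)
    j = 0
    for i, ch in enumerate(s):
        if j < len(a) and a[j] == ch:
            j += 1
        L[i + 1] = j
    return L


def suffix_matches(a: str, s: str) -> List[int]:
    """R[i] = length of the longest suffix of a that is a subsequence of s[i:]."""
    R = [0] * (len(s) + 1)
    j = 0
    for i in range(len(s) - 1, -1, -1):
        if j < len(a) and a[len(a) - 1 - j] == s[i]:
            j += 1
        R[i] = j
    return R


def essential_indices(s: str, strings: Sequence[str]) -> List[int]:
    """0-based indices i such that s without s[i] misses at least one input string."""
    essential = set()
    for a in strings:
        L = prefix_matches(a, s)
        R = suffix_matches(a, s)
        for i in range(len(s)):
            # a survives deleting s[i] iff a prefix matched before i and a
            # suffix matched after i together cover all of a
            if L[i] + R[i + 1] < len(a):
                essential.add(i)
    return sorted(essential)


print(essential_indices("abcbca", ["abc", "bca"]))  # [0, 5]
print(essential_indices("abca", ["abc", "bca"]))    # [0, 1, 2, 3]
```

Under this definition a common supersequence is minimal exactly when every index is essential, as in the second call. One pass of this test costs time linear in |S| per input string, but the arrays must be recomputed after a deletion, so it does not by itself give the paper's linear-time bound.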

Figure 2: On the left hand side, we show characters used in a right embedding of $A_{1}=abbc$ into $S=abccbacc$ in orange cells. On the right, we depict the output data structure for $S$ and $A_{1}$ . On the bottom is the result of $\textsc{MergeRightEmbedding}(S;A_{1},A_{2})$ where $A_{2}=ac$ .
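The strings in the Figure 2 caption can be used to illustrate what an embedding looks like. The sketch below assumes that a right embedding means the greedy rightmost placement of a string inside S (and a left embedding the greedy leftmost one). Whether this matches the cells highlighted in the figure depends on the paper's exact definition. The function names and 0-based positions are illustrative.

```python
from typing import List, Optional


def left_embedding(a: str, s: str) -> Optional[List[int]]:
    """Leftmost positions of s that spell out a, or None if a is not a subsequence."""
    positions = []
    j = 0
    for ch in a:
        while j < len(s) and s[j] != ch:
            j += 1
        if j == len(s):
            return None
        positions.append(j)
        j += 1
    return positions


def right_embedding(a: str, s: str) -> Optional[List[int]]:
    """Rightmost positions of s that spell out a, or None if a is not a subsequence."""
    positions = []
    j = len(s) - 1
    for ch in reversed(a):  # match a from its last character, scanning s leftwards
        while j >= 0 and s[j] != ch:
            j -= 1
        if j < 0:
            return None
        positions.append(j)
        j -= 1
    positions.reverse()
    return positions


S = "abccbacc"
print(right_embedding("abbc", S))  # [0, 1, 4, 7]
print(left_embedding("abbc", S))   # [0, 1, 4, 6]
print(right_embedding("ac", S))    # [5, 7]
```

Positions where the left and right embeddings of a string agree (0, 1 and 4 for "abbc" here) are forced in every embedding of that string, which is the intuition behind comparing the two.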


This review was generated by AI and checked by human editors.