QUICK REVIEW

[Paper Review] Computing and Enumerating Minimal Common Supersequences Between Two Strings

Braeden Sopp, Adiesha Liyanage | arXiv (Cornell University) | March 23, 2026

Algorithms and Data Compression | Citations: 0

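One-Line Summary

This paper presents a linear-time algorithm for computing a minimal common supersequence (MCS) of two strings, along with a data structure that enumerates all MCSs of the two strings using quadratic space and linear delay after cubic-time preprocessing.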

ABSTRACT

Given $k$ strings each of length at most $n$, computing the shortest common supersequence of them is a well-known NP-hard problem (when $k$ is unbounded). On the other hand, when $k=2$, such a shortest common supersequence can be computed in $O(n^2)$ time using dynamic programming as a textbook example. In this paper, we consider the problem of computing a minimal common supersequence and enumerating all minimal common supersequences for $k=2$ input strings. Our results are summarized as follows. A minimal common supersequence of $k=2$ input strings can be computed in $O(n)$ time. (The method also works when $k$ is a constant). All minimal common supersequences between two input strings can be enumerated with a data structure of $O(n^2)$ space and an $O(n)$ time delay, and the data structure can be constructed in $O(n^3)$ time.

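Research Motivation and Goals

Motivates the efficient computation of a minimal common supersequence (MCS) as a complement to the LCS and SCS problems.
Presents a linear-time method for computing an MCS of two strings (k=2) and offers insight into extending it to multiple strings.
Develops an enumeration framework that lists all MCSs with provable time and space guarantees.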

Proposed Method

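Sweeps a common supersequence and deletes removable indices using the essentiality criterion (Lemma 3.1), obtaining an MCS of two strings in O(n) time.
Constructs a right embedding (BuildRightEmbedding) to track how A and B embed into the supersequence.
Proves a characterization of MCSs via essential indices (the Lem and Rem embeddings), which guides deletions and guarantees minimality.
Provides a constructive algorithm (ReduceSupersequence) that removes non-essential positions in linear time and outputs an MCS.
Extends the approach to k strings, computing an MCS of k strings in O(kn(log k + log n)) time.
Develops an enumeration framework that partitions A and B into blocks and models MCSs as paths in a labeled bipartite graph G(A,B), in which s-t paths correspond one-to-one to MCSs.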
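
The reduction idea in the list above can be illustrated with a deliberately naive sketch. It is not the paper's O(n) ReduceSupersequence: instead of right embeddings and the essentiality criterion of Lemma 3.1, it re-checks the subsequence property after every trial deletion, which costs quadratic time. All function names are hypothetical. A single left-to-right pass suffices because deleting characters can never turn a non-supersequence back into a supersequence, so a position that had to be kept once remains necessary.

```python
from typing import Iterable, Sequence


def is_subsequence(x: Iterable[str], s: Iterable[str]) -> bool:
    """Return True if x can be obtained from s by deleting characters."""
    it = iter(s)
    # `ch in it` advances the iterator until it finds ch (or exhausts it).
    return all(ch in it for ch in x)


def is_common_supersequence(s: Sequence[str], strings: Sequence[str]) -> bool:
    """Return True if every input string is a subsequence of s."""
    return all(is_subsequence(x, s) for x in strings)


def reduce_supersequence_naive(s: str, strings: Sequence[str]) -> str:
    """Naive stand-in (assumption, not the paper's algorithm):
    shrink a common supersequence s to a minimal one by trial deletions."""
    if not is_common_supersequence(s, strings):
        raise ValueError("s is not a common supersequence of the inputs")
    chars = list(s)
    i = 0
    while i < len(chars):
        candidate = chars[:i] + chars[i + 1:]
        if is_common_supersequence(candidate, strings):
            chars = candidate  # position i was removable; retry same index
        else:
            i += 1  # position i is necessary; move on
    return "".join(chars)


def is_minimal(s: str, strings: Sequence[str]) -> bool:
    """Check that s is a common supersequence and no single deletion keeps it one."""
    return is_common_supersequence(s, strings) and all(
        not is_common_supersequence(s[:i] + s[i + 1:], strings)
        for i in range(len(s))
    )


if __name__ == "__main__":
    a, b = "acb", "cba"
    # The concatenation a + b is always a common supersequence to start from.
    print(reduce_supersequence_naive(a + b, [a, b]))
```

Starting from the concatenation "acbcba", this pass ends at "acba". The starting supersequence and the deletion order determine which MCS is reached, so other inputs may yield a minimal result that is not a shortest one.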

Figure 1: Depicted on the left is $G(A,B)$ containing all nodes with the vertices of $G_{st}(A,B)$ colored in terms of their respective partitions. Red vertices belong to partition $V_{A}$ and blue vertices belong to partition $V_{B}$. On the right we have $G_{st}(A,B)$ with the vertices labeled. $A=b

Experimental Results

Research Questions

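RQ1: Can a minimal common supersequence of two strings be computed in time linear in n, rather than in the O(n^2) time of the textbook dynamic program for a shortest common supersequence?
RQ2: How can all minimal common supersequences of two strings be enumerated efficiently?
RQ3: Which data structure and partitioning scheme yield the one-to-one correspondence between MCSs and feasible paths that enumeration requires?
RQ4: How does extending beyond two strings affect the time complexity of computing an MCS?
RQ5: Do the structural properties (embeddings, essential indices) guarantee the minimality and correctness of MCS construction and enumeration?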

Key Results

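An MCS of two input strings can be computed in O(n) time.
All MCSs of two strings can be enumerated with O(n^2) space and O(n) delay, using a data structure that can be built in O(n^3) time.
For k strings, an MCS can be computed in O(kn(log k + log n)) time.
A linear-time reduction from an arbitrary common supersequence to a minimal one is what makes efficient MCS computation possible.
The enumeration framework uses the graph G(A,B), whose s-t paths correspond one-to-one to MCSs.
The paper establishes an exact characterization of MCSs through essential indices and interval partitioning (Theorems 5.1, 5.2, and 5.3).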
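
To make the object being enumerated concrete, the following brute-force sketch lists every MCS of two short strings. It is exponential and unrelated to the paper's graph G(A,B) or its O(n) delay guarantee. It relies only on the observation that every character of an MCS must be used by an embedding of A or of B (an unused character could be deleted), so each MCS is an interleaving of A and B in which equal characters may be shared. All function names are hypothetical.

```python
from functools import lru_cache
from typing import FrozenSet, List


def is_subsequence(x: str, s: str) -> bool:
    """Return True if x can be obtained from s by deleting characters."""
    it = iter(s)
    return all(ch in it for ch in x)


def is_minimal_common_supersequence(s: str, a: str, b: str) -> bool:
    """s contains a and b, and deleting any one character breaks that."""
    def covers(t: str) -> bool:
        return is_subsequence(a, t) and is_subsequence(b, t)

    return covers(s) and all(
        not covers(s[:i] + s[i + 1:]) for i in range(len(s))
    )


def all_merges(a: str, b: str) -> FrozenSet[str]:
    """All interleavings of a and b where equal characters may be shared."""
    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> FrozenSet[str]:
        if i == len(a) and j == len(b):
            return frozenset({""})
        out = set()
        if i < len(a):  # next character comes from a only
            out |= {a[i] + t for t in go(i + 1, j)}
        if j < len(b):  # next character comes from b only
            out |= {b[j] + t for t in go(i, j + 1)}
        if i < len(a) and j < len(b) and a[i] == b[j]:  # shared character
            out |= {a[i] + t for t in go(i + 1, j + 1)}
        return frozenset(out)

    return go(0, 0)


def enumerate_mcs_bruteforce(a: str, b: str) -> List[str]:
    """Brute-force reference (assumption: only practical for tiny inputs)."""
    return sorted(
        s for s in all_merges(a, b) if is_minimal_common_supersequence(s, a, b)
    )


if __name__ == "__main__":
    for s in enumerate_mcs_bruteforce("acb", "cba"):
        print(len(s), s)
```

For A = "acb" and B = "cba", the output includes both "acba" (length 4, also a shortest common supersequence) and "cbacb" (length 5), which shows that a minimal common supersequence need not be a shortest one.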

Figure 2: On the left hand side, we show characters used in a right embedding of $A_{1}=abbc$ into $S=abccbacc$ in orange cells. On the right, we depict the output data structure for $S$ and $A_{1}$. On the bottom is the result of $\textsc{MergeRightEmbedding}(S;A_{1},A_{2})$ where $A_{2}=ac$.
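
The embedding named in the caption can be illustrated with a small sketch, under the assumption that a right embedding matches each character of the pattern as far to the right in S as possible. This is an illustrative stand-in with a hypothetical function name, not the paper's BuildRightEmbedding or MergeRightEmbedding, and it returns only the matched positions rather than the data structure shown in the figure.

```python
from typing import List, Optional


def right_embedding(pattern: str, s: str) -> Optional[List[int]]:
    """Return 0-based positions in s of the rightmost embedding of pattern,
    or None if pattern is not a subsequence of s.

    Assumption: 'right embedding' means the greedy right-to-left matching.
    """
    positions = [0] * len(pattern)
    j = len(s) - 1
    # Scan the pattern from its last character, matching each one at the
    # rightmost position of s that is still available.
    for i in range(len(pattern) - 1, -1, -1):
        while j >= 0 and s[j] != pattern[i]:
            j -= 1
        if j < 0:
            return None  # ran out of s before matching pattern[i]
        positions[i] = j
        j -= 1
    return positions


if __name__ == "__main__":
    S = "abccbacc"
    print(right_embedding("abbc", S))  # positions used by A_1
    print(right_embedding("ac", S))    # positions used by A_2
```

Under this assumption, $A_{1}=abbc$ is matched at positions 0, 1, 4, 7 of $S$ and $A_{2}=ac$ at positions 5, 7 (0-based). The scan touches each character of $S$ at most once, so it runs in time linear in the length of $S$.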

This review was generated by AI and reviewed by a human editor.