QUICK REVIEW

[Paper Review] Enumerating Regular Languages with Bounded Delay

Antoine Amarilli, Mikaël Monet | arXiv (Cornell University) | 2022-09-29

semigroups and automata theory | Citations: 2

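One-line summary

This paper proposes bounded-delay enumeration algorithms for regular languages: each word is obtained from the preceding one by a bounded-length edit script that uses only push and pop operations at the two ends of the word, so the delay does not depend on the word length. It characterizes which regular languages are orderable in this sense and gives a PTIME algorithm that partitions any regular language, given as a DFA, into a minimal number of orderable components. When edits are allowed only at the end of the word (the stack model), exactly the slender languages admit a partition into finitely many orderable languages. The review also describes enumeration by ultimately periodic sequences of edit scripts.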

ABSTRACT

We study the task, for a given language L, of enumerating the (generally infinite) sequence of its words, without repetitions, while bounding the delay between two consecutive words. To allow for delay bounds that do not depend on the current word length, we assume a model where we produce each word by editing the preceding word with a small edit script, rather than writing out the word from scratch. In particular, this witnesses that the language is orderable, i.e., we can write its words as an infinite sequence such that the Levenshtein edit distance between any two consecutive words is bounded by a value that depends only on the language. For instance, (a+b)^* is orderable (with a variant of the Gray code), but a^* + b^* is not. We characterize which regular languages are enumerable in this sense, and show that this can be decided in PTIME in an input deterministic finite automaton (DFA) for the language. In fact, we show that, given a DFA A, we can compute in PTIME automata A₁, …, A_t such that L(A) is partitioned as L(A₁) ⊔ … ⊔ L(A_t) and every L(A_i) is orderable in this sense. Further, we show that the value of t obtained is optimal, i.e., we cannot partition L(A) into less than t orderable languages. In the case where L(A) is orderable (i.e., t = 1), we show that the ordering can be produced by a bounded-delay algorithm: specifically, the algorithm runs in a suitable pointer machine model, and produces a sequence of bounded-length edit scripts to visit the words of L(A) without repetitions, with bounded delay - exponential in |A| - between each script. In fact, we show that we can achieve this while only allowing the edit operations push and pop at the beginning and end of the word, which implies that the word can in fact be maintained in a double-ended queue. 
By contrast, when fixing the distance bound d between consecutive words and the number of classes of the partition, it is NP-hard in the input DFA A to decide if L(A) is orderable in this sense, already for finite languages. Last, we study the model where push-pop edits are only allowed at the end of the word, corresponding to a case where the word is maintained on a stack. We show that these operations are strictly weaker and that the slender languages are precisely those that can be partitioned into finitely many languages that are orderable in this sense. For the slender languages, we can again characterize the minimal number of languages in the partition, and achieve bounded-delay enumeration.
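
To make the notion of orderability concrete, the following is a minimal sketch that measures the largest Levenshtein distance between consecutive words in a finite prefix of an ordering. The helper names `levenshtein` and `max_gap` are illustrative and not taken from the paper. It compares a^* listed by length with one natural interleaving of a^* + b^*. The growing gap in the second case only illustrates that this particular ordering fails; that no ordering works is the paper's claim, not something this check proves.

```python
def levenshtein(u: str, v: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(v) + 1))
    for i, cu in enumerate(u, start=1):
        cur = [i]
        for j, cv in enumerate(v, start=1):
            cur.append(min(prev[j] + 1,                # delete cu
                           cur[j - 1] + 1,             # insert cv
                           prev[j - 1] + (cu != cv)))  # substitute or match
        prev = cur
    return prev[-1]


def max_gap(words):
    """Largest edit distance between two consecutive words of the list."""
    return max(levenshtein(a, b) for a, b in zip(words, words[1:]))


# a^* listed by increasing length: "", a, aa, ...
a_star = ["a" * n for n in range(8)]

# a^* + b^* interleaved by length: "", a, b, aa, bb, ...
mixed = [""] + [c * n for n in range(1, 8) for c in "ab"]

print(max_gap(a_star))  # stays at 1 however long the prefix is
print(max_gap(mixed))   # grows with the length of the words in the prefix
```

Extending the range in `mixed` makes the second number grow, because switching between a block of a's and a block of b's rewrites the whole word.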

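Motivation and goals

Tackle the problem of enumerating infinite regular languages with bounded delay, independent of the word length.
Define and characterize "orderable" regular languages, whose words can be arranged in a sequence where consecutive words are within a bounded Levenshtein edit distance.
Determine whether every regular language can be partitioned into finitely many orderable sublanguages, and what the minimal number of parts is.
Design a bounded-delay enumeration algorithm that uses only push/pop operations at the beginning and end of the word, so that the word can be maintained in a double-ended queue.
Establish complexity bounds for deciding orderability when the edit distance bound and the number of partition classes are fixed; this problem is NP-hard.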
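
The double-ended queue model mentioned above can be sketched as follows. This is an illustrative toy, not the paper's algorithm: the operation names (`pushL`, `pushR`, `popL`, `popR`), the helper functions, and the example language a^* b are all assumptions chosen for the sketch. The point is that each word is produced by a short edit script applied to the previous word, so the work per word does not depend on the word's length.

```python
from collections import deque
from itertools import chain, cycle, islice


def apply_script(word: deque, script) -> None:
    """Apply a list of push/pop operations to the word in place."""
    for op, *arg in script:
        if op == "pushL":
            word.appendleft(arg[0])
        elif op == "pushR":
            word.append(arg[0])
        elif op == "popL":
            word.popleft()
        elif op == "popR":
            word.pop()
        else:
            raise ValueError(f"unknown operation: {op}")


def enumerate_words(prefix, period, count):
    """Run `prefix` once, then repeat `period`, collecting `count` words."""
    word = deque()
    out = []
    for script in islice(chain(prefix, cycle(period)), count):
        apply_script(word, script)
        out.append("".join(word))
    return out


# Toy language a^* b: start with "b", then push an "a" on the left each step.
prefix = [[("pushR", "b")]]
period = [[("pushL", "a")]]
words = enumerate_words(prefix, period, 4)
print(words)  # ['b', 'ab', 'aab', 'aaab']
```

With edits restricted to the right end only, the same step would need three operations (pop `b`, push `a`, push `b`), which is still bounded for this toy language.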

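Proposed method

Model the enumeration as a sequence of bounded-length edit scripts that transform each word into the next, using only push/pop operations at the two ends of the word.
Carry out the enumeration in a pointer machine model, so that the delay depends on the size of the automaton rather than on the word length.
To enumerate the words on acyclic paths, apply depth-first search (DFS) to the deterministic finite automaton (DFA), marking states to avoid cycles.
During the DFS, track the recursion stack to identify, in linear time, the unique simple cycle and the path from the initial state to that cycle.
Construct an ultimately periodic sequence of edit scripts: first enumerate the words on acyclic paths, then move onto the cycle and traverse it periodically.
Ensure that each state appears at most twice in any edit script, which bounds the edit distance between consecutive words by 2k, where k is the number of states.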
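
The cycle-detection step described above can be sketched as a standard DFS that records which states are on the recursion stack. The DFA encoding (a dict from state to a dict from symbol to state) and the function name `find_cycle` are assumptions made for this sketch. It assumes, as in the review's description, that at most one simple cycle is reachable from the initial state, and it returns the first cycle the search encounters.

```python
def find_cycle(dfa, start):
    """Return (prefix, cycle) as lists of states, or None if no cycle is reachable.

    `prefix` is the DFS path from `start` up to (excluding) the cycle entry,
    and `cycle` lists the states on the cycle, starting at the entry state.
    """
    on_stack = {}  # state -> its index in `stack` while it is being explored
    done = set()   # states whose exploration has finished
    stack = []

    def visit(q):
        on_stack[q] = len(stack)
        stack.append(q)
        for _symbol, r in sorted(dfa.get(q, {}).items()):
            if r in on_stack:      # back edge: r is on the recursion stack
                i = on_stack[r]
                return stack[:i], stack[i:]
            if r not in done:
                found = visit(r)
                if found:
                    return found
        stack.pop()
        del on_stack[q]
        done.add(q)
        return None

    return visit(start)


# Hypothetical DFA for a b^* c: 0 -a-> 1, 1 -b-> 1, 1 -c-> 2
self_loop = {0: {"a": 1}, 1: {"b": 1, "c": 2}}
# Hypothetical DFA with a two-state cycle: 0 -a-> 1 -b-> 2 -c-> 1, 2 -d-> 3
two_cycle = {0: {"a": 1}, 1: {"b": 2}, 2: {"c": 1, "d": 3}}

print(find_cycle(self_loop, 0))  # ([0], [1])
print(find_cycle(two_cycle, 0))  # ([0], [1, 2])
```

Each state is explored once and each transition is inspected once, so the search is linear in the size of the automaton. The recursive form is kept for readability; a very deep automaton would call for an explicit stack.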

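Results

Research questions

RQ1: Which regular languages can be written as an infinite sequence of words in which the Levenshtein edit distance between consecutive words is bounded?
RQ2: Can every regular language be partitioned into finitely many orderable sublanguages, and what is the minimal number of such components?
RQ3: Is bounded-delay enumeration of regular languages possible when edit operations are allowed only at the end of the word, and under what conditions?
RQ4: What is the computational complexity of deciding orderability of a regular language when the edit distance d and the number of classes t are fixed?
RQ5: How can a minimal partition be computed efficiently, and which structural properties (e.g., slenderness) characterize these languages?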

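Key results

The paper characterizes which regular languages are orderable, i.e., can be written as an infinite sequence with bounded edit distance between consecutive words, and shows that this can be decided in PTIME given a DFA. For instance, (a+b)^* is orderable (with a variant of the Gray code), but a^* + b^* is not.
Given a DFA A, automata A₁, …, A_t can be computed in PTIME such that L(A) is partitioned as L(A₁) ⊔ … ⊔ L(A_t) and every L(A_i) is orderable; the value of t is optimal, i.e., L(A) cannot be partitioned into fewer than t orderable languages.
When L(A) is orderable (t = 1), the ordering can be produced by a bounded-delay algorithm in a pointer machine model, which emits bounded-length edit scripts with delay exponential in |A|. For the single-cycle construction described in the method, the scripts form an ultimately periodic sequence with edit distance at most 2k between consecutive words.
The bounded-delay enumeration algorithm maintains the current word in a double-ended queue, using only push and pop operations at the beginning and end of the word.
With the edit distance d and the number of classes t fixed, deciding whether L(A) can be partitioned into orderable languages is NP-hard in the input DFA, already for finite languages.
When edits are restricted to the end of the word (the stack model), the operations are strictly weaker: the slender languages are precisely those that can be partitioned into finitely many orderable languages. For slender languages, the minimal number of languages in the partition can again be characterized, and bounded-delay enumeration is achievable.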
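
The stack-model result refers to slender languages. Under the standard definition (not restated in this review), a language is slender if the number of its words of each length is bounded by a constant. The following minimal sketch counts accepted words per length in a DFA, which gives empirical evidence for or against slenderness up to a chosen length. The DFA encoding and the name `words_per_length` are assumptions made for this sketch.

```python
def words_per_length(dfa, start, accepting, max_len):
    """Count the accepted words of each length 0..max_len.

    In a DFA every word has exactly one run, so counting the runs that
    end in an accepting state counts the accepted words.
    """
    counts = []
    runs = {start: 1}  # state -> number of words of the current length reaching it
    for _ in range(max_len + 1):
        counts.append(sum(n for q, n in runs.items() if q in accepting))
        nxt = {}
        for q, n in runs.items():
            for r in dfa.get(q, {}).values():
                nxt[r] = nxt.get(r, 0) + n
        runs = nxt
    return counts


# Hypothetical DFA for a^* + b^*: at most two words of each length.
union = {0: {"a": "A", "b": "B"}, "A": {"a": "A"}, "B": {"b": "B"}}
# Hypothetical one-state DFA for (a+b)^*: 2^n words of length n.
full = {0: {"a": 0, "b": 0}}

print(words_per_length(union, 0, {0, "A", "B"}, 4))  # [1, 2, 2, 2, 2]
print(words_per_length(full, 0, {0}, 4))             # [1, 2, 4, 8, 16]
```

The two examples match the abstract's contrast: a^* + b^* has bounded counts per length and is slender, while (a+b)^* is orderable in the double-ended model without being slender.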


This review was generated by AI and reviewed by a human editor.