QUICK REVIEW

[Paper Review] $k$-Universality of Regular Languages

Duncan Adamson, Pamela Fleischmann | arXiv (Cornell University) | Jan 1, 2023

Semigroups and automata theory | Cited by 1

One-Sentence Summary

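This paper introduces and studies two notions of k-universality for regular languages: k-∃-subsequence universality (some word of the language is k-subsequence universal) and k-∀-subsequence universality (every word of the language is). It gives algorithms for deciding both properties that are FPT with respect to the alphabet size σ, whose running time does not depend on k, and which run in polynomial time in the number n of automaton states when σ = O(log n); for large alphabets the ∃-variant is shown to be NP-complete. It also provides algorithms for counting and ranking the k-subsequence universal words (for deterministic automata) or accepting paths (for nondeterministic automata) of a finite automaton.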

ABSTRACT

A subsequence of a word w is a word u such that u = w[i₁] w[i₂] … w[i_k], for some set of indices 1 ≤ i₁ < i₂ < … < i_k ≤ |w|. A word w is k-subsequence universal over an alphabet Σ if every word in Σ^k appears in w as a subsequence. In this paper, we study the intersection between the set of k-subsequence universal words over some alphabet Σ and regular languages over Σ. We call a regular language L k-∃-subsequence universal if there exists a k-subsequence universal word in L, and k-∀-subsequence universal if every word of L is k-subsequence universal. We give algorithms solving the problems of deciding if a given regular language, represented by a finite automaton recognising it, is k-∃-subsequence universal and, respectively, if it is k-∀-subsequence universal, for a given k. The algorithms are FPT w.r.t. the size of the input alphabet, and their run-time does not depend on k; they run in polynomial time in the number n of states of the input automaton when the size of the input alphabet is O(log n). Moreover, we show that the problem of deciding if a given regular language is k-∃-subsequence universal is NP-complete, when the language is over a large alphabet. Further, we provide algorithms for counting the number of k-subsequence universal words (paths) accepted by a given deterministic (respectively, nondeterministic) finite automaton, and ranking an input word (path) within the set of k-subsequence universal words accepted by a given finite automaton.
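The definition of k-subsequence universality can be checked without enumerating all of Σ^k. The minimal Python sketch below assumes the known greedy arch factorization: an arch is a shortest factor containing every letter of Σ, and a word is k-subsequence universal exactly when it has at least k arches. This is a standard characterization, not necessarily how the paper computes it; the function names are hypothetical, and w is assumed to be a word over the given alphabet.

```python
def universality_index(w, alphabet):
    """Number of arches in the greedy arch factorization of w over `alphabet`."""
    arches = 0
    seen = set()
    for letter in w:
        seen.add(letter)
        if seen == alphabet:  # the current arch now contains every letter
            arches += 1
            seen = set()      # start the next arch
    return arches


def is_k_universal(w, alphabet, k):
    """w contains every word of length k over `alphabet` as a subsequence."""
    return universality_index(w, alphabet) >= k


def is_subsequence(u, w):
    """Greedy test that u = w[i1] w[i2] ... w[ik] for increasing indices."""
    it = iter(w)
    return all(letter in it for letter in u)  # `in` consumes the iterator
```

For example, `universality_index("abba", {"a", "b"})` returns 2 (arches `ab` and `ba`), so `abba` contains all four words of length 2 but not `aaa`.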

Motivation and Objectives

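Formalize and analyze two new notions of k-universality for regular languages: k-∃-subsequence universality (existential) and k-∀-subsequence universality (universal).
Develop efficient algorithms that decide whether a regular language, given by a finite automaton, is k-∃- or k-∀-subsequence universal.
Provide a computational toolbox for counting and ranking the k-subsequence universal words accepted by a deterministic finite automaton, or the accepting paths of a nondeterministic one.
Study the complexity of the k-∃-subsequence universality problem, showing that it is NP-complete when the alphabet is large.
Design algorithms whose running time does not depend on k and is polynomial in the number n of states when the input alphabet has size O(log n).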
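Why an alphabet of size O(log n) yields polynomial time follows from a short calculation. It assumes, as an illustration rather than the paper's stated bound, that the running time has the form 2^{O(σ)}·poly(n), for instance because the algorithm explores pairs of an automaton state and a subset of Σ (at most n·2^σ of them):

```latex
\sigma \le c\,\log_2 n
\;\Longrightarrow\;
2^{\sigma} \le 2^{c\log_2 n} = n^{c}
\;\Longrightarrow\;
n \cdot 2^{\sigma} \le n^{1+c},
\qquad\text{hence}\qquad
2^{O(\sigma)} \cdot \mathrm{poly}(n) = n^{O(1)}.
```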

Proposed Method

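Define a k-subsequence universal word as one that contains every word of length k over the alphabet Σ as a subsequence.
Introduce two language-level notions: k-∃-subsequence universality (some word in L is k-universal) and k-∀-subsequence universality (every word in L is k-universal).
Build dynamic-programming tables T(PR) and U(PR) that track reachability and subsequence coverage along path prefixes.
Encode subsequence universality with a state-based dynamic program whose parameters are the current state, the path length, the number of distinct symbols seen so far, and the set of symbols seen.
Work in an FPT framework parameterized by the alphabet size σ, so that the running time of the decision algorithms depends on σ and n (the number of states) but not on k.
Obtain lexicographic ranks from prefix tables and sums over accepting states, counting all k-universal words that are lexicographically smaller than the given word.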
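The state-based dynamic programming described above can be illustrated with one plausible construction; it is a sketch under our own assumptions, not the paper's tables T(PR) and U(PR). Each automaton state is paired with the set of letters read in the current, unfinished arch, and an edge gets weight 1 exactly when it completes an arch. The fewest (for ∀) and most (for ∃) arches over accepting paths then decide both properties on a graph of at most n·2^σ vertices, independently of k. The automaton encoding (a dict from `(state, letter)` to a set of successor states) and all function names are hypothetical, and treating the empty language as vacuously ∀-universal is also an assumption.

```python
import math
from collections import deque


def _explore(alphabet, delta, starts):
    """Reachable product graph: vertex (state, S), S = letters of the open arch."""
    full = frozenset(alphabet)
    sources = [(q, frozenset()) for q in starts]
    edges = {}                                  # vertex -> [(successor, weight)]
    seen = set(sources)
    queue = deque(sources)
    while queue:
        q, s = queue.popleft()
        out = []
        for a in sorted(alphabet):
            grown = s | {a}
            # weight 1 exactly when reading `a` completes an arch
            weight, nxt_s = (1, frozenset()) if grown == full else (0, grown)
            for q2 in delta.get((q, a), ()):
                v = (q2, nxt_s)
                out.append((v, weight))
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        edges[(q, s)] = out
    return sources, edges


def min_universality_index(alphabet, delta, starts, finals):
    """Fewest arches over the words of L (None if L is empty), via 0-1 BFS."""
    sources, edges = _explore(alphabet, delta, starts)
    dist = {v: 0 for v in sources}
    dq = deque(sources)
    while dq:
        u = dq.popleft()
        for v, w in edges[u]:
            if dist[u] + w < dist.get(v, math.inf):
                dist[v] = dist[u] + w
                if w == 0:
                    dq.appendleft(v)
                else:
                    dq.append(v)
    arches = [d for (q, _), d in dist.items() if q in finals]
    return min(arches) if arches else None


def max_universality_index(alphabet, delta, starts, finals):
    """Most arches over the words of L: None if L is empty, math.inf if unbounded."""
    sources, edges = _explore(alphabet, delta, starts)
    preds = {u: [] for u in edges}
    for u, out in edges.items():
        for v, _ in out:
            preds[v].append(u)
    # keep only vertices that can still reach an accepting vertex
    useful = {u for u in edges if u[0] in finals}
    stack = list(useful)
    while stack:
        for u in preds[stack.pop()]:
            if u not in useful:
                useful.add(u)
                stack.append(u)
    best = {s: 0 for s in sources if s in useful}
    if not best:
        return None
    # Bellman-Ford for longest paths: without an arch-completing cycle,
    # |V| - 1 rounds suffice, so a change in round |V| means "unbounded".
    for _ in range(len(useful)):
        changed = False
        for u in list(best):
            for v, w in edges[u]:
                if v in useful and best[u] + w > best.get(v, -1):
                    best[v] = best[u] + w
                    changed = True
        if not changed:
            break
    else:
        return math.inf
    return max(d for (q, _), d in best.items() if q in finals)


def is_k_exists_universal(k, alphabet, delta, starts, finals):
    best = max_universality_index(alphabet, delta, starts, finals)
    return best is not None and best >= k


def is_k_forall_universal(k, alphabet, delta, starts, finals):
    worst = min_universality_index(alphabet, delta, starts, finals)
    return worst is None or worst >= k   # empty L: vacuously true (assumption)
```

The ∀ case is a shortest-path problem with 0/1 weights; the ∃ case is a longest-path problem in which a cycle that completes an arch and can still reach an accepting state makes the index unbounded, as for (ab)* over {a, b}.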

Results

Research Questions

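RQ1 Can we decide in FPT time whether a regular language, given by a finite automaton, is k-∃-subsequence universal?
RQ2 What is the complexity of deciding whether a regular language is k-∀-subsequence universal, and can it be decided efficiently?
RQ3 Is the k-∃-subsequence universality problem NP-complete when the alphabet is large?
RQ4 Can we efficiently count the k-subsequence universal words or paths accepted by a given finite automaton?
RQ5 How can we compute the lexicographic rank of a given k-universal word among all k-universal words of length m accepted by the automaton?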
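RQ5 can be made concrete with the usual count-the-smaller-words scheme. The Python sketch below assumes a (possibly partial) DFA given as a dict from `(state, letter)` to a single successor. At each position of w it adds, for every smaller letter, the number of accepted k-universal completions of the remaining length, computed by a memoized recursion over (state, arches capped at k, letters of the current arch, remaining length). It is an illustration rather than the paper's algorithm and is not tuned to the bounds the paper reports; `lex_rank` and `_step` are hypothetical names.

```python
from functools import lru_cache


def _step(c, s, a, full, k):
    """Read letter a: return (arches capped at k, letters of the open arch)."""
    grown = s | {a}
    if grown == full:
        return min(c + 1, k), frozenset()
    return c, grown


def lex_rank(w, alphabet, delta, start, finals, k):
    """Number of accepted k-universal words of length len(w) that are
    lexicographically smaller than w (its 0-based rank if w qualifies)."""
    full = frozenset(alphabet)
    letters = sorted(alphabet)

    @lru_cache(maxsize=None)
    def completions(q, c, s, r):
        # accepted words of length r read from q that end with >= k arches
        if r == 0:
            return int(q in finals and c >= k)
        total = 0
        for a in letters:
            q2 = delta.get((q, a))
            if q2 is not None:
                total += completions(q2, *_step(c, s, a, full, k), r - 1)
        return total

    rank = 0
    q, c, s = start, 0, frozenset()
    for i, letter in enumerate(w):
        remaining = len(w) - i - 1
        for b in letters:
            if b >= letter:
                break
            q2 = delta.get((q, b))
            if q2 is not None:  # words w[:i] + b + (any valid completion)
                rank += completions(q2, *_step(c, s, b, full, k), remaining)
        q = delta.get((q, letter))
        if q is None:           # no accepted word extends this prefix
            break
        c, s = _step(c, s, letter, full, k)
    return rank
```

For the one-state automaton accepting every word over {a, b} and k = 1, the 1-universal words of length 2 are `ab` and `ba`, so `lex_rank("ba", ...)` is 1.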

Key Findings

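When the input alphabet is large, deciding whether a regular language given by a finite automaton is k-∃-subsequence universal is NP-complete.
The algorithms for deciding k-∃- and k-∀-subsequence universality are FPT in the alphabet size σ and run in time polynomial in n (the number of states) when σ = O(log n).
For a given length m, the number of k-subsequence universal words accepted by a deterministic finite automaton (respectively, accepting paths of a nondeterministic one) can be computed in O*(m²n²k²σ) time.
Given a k-universal word w of length m, its lexicographic rank among all k-universal words of length m accepted by the automaton can be computed in O*(m²n²k²σ) time.
Ranking among words of length at most m also takes O*(m²n²k²σ) time; ranking among all words of the language takes O*(n⁴k³σ) time.
The framework covers both deterministic and nondeterministic finite automata, with the counting results holding for words and for paths, respectively.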
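The counting result can be illustrated with a layer-by-layer dynamic program over (state, completed arches capped at k, letters of the current arch). This Python sketch is an illustration under our own assumptions, not the paper's algorithm, and is not tuned to the stated O*(m²n²k²σ) bound. With a DFA (one start state, one successor per transition) it counts accepted k-universal words of length m; with an NFA it counts accepting paths, so an ambiguous word is counted once per path, matching the words/paths distinction above. `count_k_universal` and the dict encoding of the transition function are hypothetical.

```python
from collections import defaultdict


def count_k_universal(alphabet, delta, starts, finals, k, m):
    """Accepting paths of length m whose label is k-subsequence universal."""
    full = frozenset(alphabet)
    # (state, arches capped at k, letters of the open arch) -> number of paths
    layer = defaultdict(int)
    for q in starts:
        layer[(q, 0, frozenset())] += 1
    for _ in range(m):
        nxt = defaultdict(int)
        for (q, c, s), ways in layer.items():
            for a in alphabet:
                grown = s | {a}
                if grown == full:              # reading `a` closes an arch
                    c2, s2 = min(c + 1, k), frozenset()
                else:
                    c2, s2 = c, grown
                for q2 in delta.get((q, a), ()):  # each successor is a new path
                    nxt[(q2, c2, s2)] += ways
        layer = nxt
    return sum(ways for (q, c, _), ways in layer.items()
               if q in finals and c >= k)
```

Capping the arch counter at k keeps the table size independent of m, since any word that has already reached k arches stays k-universal.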

This summary was generated by AI and reviewed by human editors.