QUICK REVIEW

[Paper Review] A Simple Algorithm for Consistent Query Answering Under Primary Keys

Diego Figueira, Anantha Padmanabha|arXiv (Cornell University)|Jan 1, 2023

Advanced Database Systems and Queries | Cited by 2

One-Sentence Summary

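This paper proposes a simple inflationary fixpoint algorithm for consistent query answering (CQA) under primary key constraints. By iteratively growing a collection of fact sets, each of size at most k (the size of the query), the algorithm decides in polynomial time whether a Boolean conjunctive query is certain over an inconsistent database. For self-join-free queries and path queries in the polynomial-time cases of the known dichotomies, it identifies certainty correctly, and for these queries the fixpoint is bounded if and only if certainty is first-order definable.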

ABSTRACT

We consider the dichotomy conjecture for consistent query answering under primary key constraints. It states that, for every fixed Boolean conjunctive query $q$, testing whether $q$ is certain (i.e. whether it evaluates to true over all repairs of a given inconsistent database) is either polynomial time or coNP-complete. This conjecture has been verified for self-join-free and path queries. We propose a simple inflationary fixpoint algorithm for consistent query answering which, for a given database, naively computes a set $\Delta$ of subsets of facts of the database of size at most $k$, where $k$ is the size of the query $q$. The algorithm runs in polynomial time and can be formally defined as: (1) Initialize $\Delta$ with all sets $S$ of at most $k$ facts such that $S \models q$. (2) Add any set $S$ of at most $k$ facts to $\Delta$ if there exists a block $B$ (i.e., a maximal set of facts sharing the same key) such that for every fact $a \in B$ there is a set $S' \subseteq S \cup \{a\}$ such that $S' \in \Delta$. For an input database $D$, the algorithm answers "$q$ is certain" iff $\Delta$ eventually contains the empty set. The algorithm correctly computes certainty when the query $q$ falls in the polynomial time cases of the known dichotomies for self-join-free queries and path queries. For arbitrary Boolean conjunctive queries, the algorithm is an under-approximation: the query is guaranteed to be certain if the algorithm claims so. However, there are polynomial time certain queries (with self-joins) which are not identified as such by the algorithm.
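To make "certain" concrete, the sketch below checks it by brute force: under primary keys a repair keeps exactly one fact from each block, and q is certain iff it is true in every repair. It assumes a toy encoding that is not from the paper: a fact is a `(relation, args)` tuple whose primary key is its first argument, and a query is a list of atoms whose arguments are all variables. The names `blocks`, `repairs`, `satisfies` and `certain_bruteforce` are hypothetical. The number of repairs is the product of the block sizes, so this check is exponential in general, which is exactly what a polynomial-time algorithm must avoid.

```python
from collections import defaultdict
from itertools import product


def blocks(db):
    """Group facts by (relation, key); the key is assumed to be args[0]."""
    groups = defaultdict(list)
    for rel, args in db:
        groups[(rel, args[0])].append((rel, args))
    return list(groups.values())


def repairs(db):
    """Yield every repair: exactly one fact chosen from each block."""
    for choice in product(*blocks(db)):
        yield set(choice)


def satisfies(facts, query):
    """True if some homomorphism maps every atom of the query into facts."""
    def extend(i, binding):
        if i == len(query):
            return True
        rel, variables = query[i]
        for frel, args in facts:
            if frel != rel or len(args) != len(variables):
                continue
            new = dict(binding)
            # setdefault returns the existing value if the variable is already bound
            if all(new.setdefault(v, c) == c for v, c in zip(variables, args)):
                if extend(i + 1, new):
                    return True
        return False
    return extend(0, {})


def certain_bruteforce(db, query):
    """q is certain iff it holds in every repair (exponentially many in general)."""
    return all(satisfies(r, query) for r in repairs(db))


# q = exists x, y, z. R(x, y) and S(y, z)
q = [("R", ("x", "y")), ("S", ("y", "z"))]
db1 = {("R", (1, 2)), ("R", (1, 3)), ("S", (2, 5)), ("S", (3, 6))}
db2 = db1 - {("S", (3, 6))}
print(certain_bruteforce(db1, q))  # True: both choices in block R(1, _) join with an S fact
print(certain_bruteforce(db2, q))  # False: the repair keeping R(1, 3) has no S(3, _)
```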

Motivation and Goals

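Make progress on the open dichotomy conjecture for consistent query answering under primary key constraints, which states that for every fixed query the problem is either in P or coNP-complete.
Design a simple, efficient algorithm that correctly computes certain answers for the tractable classes of conjunctive queries.
Characterize when the algorithm's under-approximation becomes exact, in particular its relationship to first-order definability.
Extend the algorithm to non-Boolean queries and to queries with constants while preserving polynomial-time complexity.
Provide a unified framework that generalizes to other constraint types via conflict hypergraphs.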

Proposed Method

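Initialize Δ with every set S of at most k facts such that S ⊨ q, where k is the size of the query q.
Grow Δ iteratively: add any set S of at most k facts if there is a block B (a maximal set of facts sharing the same primary key) such that, for every fact a ∈ B, some set S′ ⊆ S ∪ {a} already belongs to Δ.
The algorithm stops when no new set can be added; q is certain if and only if the empty set ∅ ∈ Δ.
The algorithm runs in polynomial time and is formalized as an inflationary fixpoint process.
For non-Boolean queries, the algorithm checks, for each candidate answer ā, whether Δ(q, D, ā) contains ∅.
Queries with constants are handled by fixing the interpretation of the constants, which reduces the process to a single run once no free variables remain.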
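The method steps above translate almost line by line into a naive Python sketch. It assumes the same illustrative encoding as a toy model (facts as `(relation, args)` tuples keyed on their first argument, variable-only queries) and takes k to be the number of atoms; the function names are hypothetical. The loop simply re-scans all candidate sets until nothing new is added: for fixed k there are O(n^k) candidates, so it is polynomial, but no attempt is made at efficiency.

```python
from collections import defaultdict
from itertools import combinations


def blocks(db):
    """Blocks: maximal sets of facts sharing relation and key (args[0] here)."""
    groups = defaultdict(set)
    for rel, args in db:
        groups[(rel, args[0])].add((rel, args))
    return [frozenset(g) for g in groups.values()]


def satisfies(facts, query):
    """True if some homomorphism maps every atom of the query into facts."""
    def extend(i, binding):
        if i == len(query):
            return True
        rel, variables = query[i]
        for frel, args in facts:
            if frel == rel and len(args) == len(variables):
                new = dict(binding)
                if all(new.setdefault(v, c) == c for v, c in zip(variables, args)):
                    if extend(i + 1, new):
                        return True
        return False
    return extend(0, {})


def fixpoint_certain(db, query):
    k = len(query)  # assumption: query size = number of atoms
    facts = list(db)
    # Every candidate set S of at most k facts.
    candidates = [frozenset(c) for r in range(k + 1) for c in combinations(facts, r)]
    # Step (1): the small sets that already satisfy q.
    delta = {s for s in candidates if satisfies(s, query)}
    bs = blocks(db)

    def has_subset_in_delta(t):
        # Is some S' ⊆ t in delta? Members of delta have size <= k.
        return any(frozenset(c) in delta
                   for r in range(min(k, len(t)) + 1)
                   for c in combinations(t, r))

    # Step (2): inflate delta until no new set can be added.
    changed = True
    while changed:
        changed = False
        for s in candidates:
            if s not in delta and any(
                all(has_subset_in_delta(s | {a}) for a in b) for b in bs
            ):
                delta.add(s)
                changed = True
    return frozenset() in delta


q = [("R", ("x", "y")), ("S", ("y", "z"))]
db1 = {("R", (1, 2)), ("R", (1, 3)), ("S", (2, 5)), ("S", (3, 6))}
db2 = db1 - {("S", (3, 6))}
print(fixpoint_certain(db1, q))  # True
print(fixpoint_certain(db2, q))  # False
```

On this small example the answers agree with checking every repair. In general the result is only an under-approximation: a True answer is sound, while a False answer can be wrong for some polynomial-time certain queries with self-joins.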

Results

Research Questions

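RQ1: For which classes of conjunctive queries does the proposed fixpoint algorithm correctly compute all certain answers?
RQ2: Under what conditions is the algorithm's under-approximation exact, i.e., when does it correctly identify every certain query?
RQ3: When is certainty of a conjunctive query first-order definable, and how does this relate to boundedness of the fixpoint algorithm?
RQ4: Can the algorithm be generalized to integrity constraints other than primary keys?
RQ5: Can a simplified variant of the algorithm achieve lower complexity (e.g., LogSpace) for self-join-free queries?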

Key Findings

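The algorithm correctly computes certainty for every self-join-free conjunctive query that is tractable under the known dichotomy, i.e., every query satisfying the PCond condition.
For path queries, the algorithm correctly identifies certainty whenever the query is tractable, and boundedness of the algorithm corresponds exactly to first-order definability of certainty.
For general conjunctive queries, the algorithm is an under-approximation: if it outputs "certain", the query is indeed certain, but it may miss some polynomial-time certain queries with self-joins.
The fixpoint algorithm is bounded (i.e., it terminates within a number of steps independent of the database size) if and only if certainty of the query is first-order definable.
The algorithm extends to non-Boolean queries and to queries with constants while remaining polynomial time in data complexity.
Generalizing to conflict hypergraphs shows that the algorithm's structure extends naturally to other constraint types, such as denial constraints or key constraints.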

This review was generated by AI and reviewed by human editors.