[Paper Review] Query2box: Reasoning over Knowledge Graphs in Vector Space using Box Embeddings
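Query2box embeds queries as boxes in a vector space to answer existential positive first-order (EPFO) queries over large, incomplete knowledge graphs. It handles disjunction by rewriting queries into DNF and achieves up to 25% relative improvement over baselines.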
Answering complex logical queries on large-scale incomplete knowledge graphs (KGs) is a fundamental yet challenging task. Recently, a promising approach to this problem has been to embed KG entities as well as the query into a vector space such that entities that answer the query are embedded close to the query. However, prior work models queries as single points in the vector space, which is problematic because a complex query represents a potentially large set of its answer entities, but it is unclear how such a set can be represented as a single point. Furthermore, prior work can only handle queries that use conjunctions ($\wedge$) and existential quantifiers ($\exists$). Handling queries with logical disjunctions ($\vee$) remains an open problem. Here we propose query2box, an embedding-based framework for reasoning over arbitrary queries with $\wedge$, $\vee$, and $\exists$ operators in massive and incomplete KGs. Our main insight is that queries can be embedded as boxes (i.e., hyper-rectangles), where a set of points inside the box corresponds to a set of answer entities of the query. We show that conjunctions can be naturally represented as intersections of boxes and also prove a negative result that handling disjunctions would require embedding with dimension proportional to the number of KG entities. However, we show that by transforming queries into a Disjunctive Normal Form, query2box is capable of handling arbitrary logical queries with $\wedge$, $\vee$, $\exists$ in a scalable manner. We demonstrate the effectiveness of query2box on three large KGs and show that query2box achieves up to 25% relative improvement over the state of the art.
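Motivation and Goals
- Motivation: complex queries over incomplete knowledge graphs need to be answered, including queries that go beyond pure conjunctions.
- Introduce box embeddings, which represent a query as a region that encloses its answer entities.
- Develop a DNF-based method that handles disjunction in EPFO queries tractably.
- Define geometric operators on boxes (projection and intersection) and a distance-based training objective.
- Demonstrate scalability and accuracy gains over state-of-the-art baselines on large knowledge graph benchmarks.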
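Proposed Method
- Embed knowledge graph entities as vectors and queries as axis-aligned boxes in R^d.
- Define projection as translating and enlarging a box: for a relation r, which has its own center and offset, the input box p becomes p + r (centers and offsets are each summed).
- Define intersection as a learnable, attention-guided shrinking of the input boxes that models set intersection.
- Rank entities by dist_box(v; q), a weighted combination of the distance outside the box and the distance inside it.
- Train with negative sampling and a margin-based loss that encourages correct answers to lie closer to the query than negative samples.
- Convert each EPFO query to DNF, so that a union is handled by answering several conjunctive queries and aggregating their results with a minimum-distance rule.
- Aggregate distances across the conjunctive queries: dist_agg(v; q) = min_i dist_box(v; q^(i)), taken over all conjunctive components q^(i) of the DNF.
- Achieve scalability through constant-time box operations and parallelizable evaluation, with answers retrieved by nearest-neighbor search.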
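The geometric pieces listed above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Box` class, the function names, and the value `alpha=0.2` are hypothetical choices made for this example, the embeddings are hand-written rather than learned, and the learned attention-based intersection is omitted.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Box:
    """Axis-aligned box: all points v with center - offset <= v <= center + offset."""
    center: List[float]
    offset: List[float]  # non-negative half-widths, one per dimension


def project(box: Box, rel: Box) -> Box:
    """Projection p + r: sum the centers (translate) and the offsets (enlarge)."""
    return Box(
        [c + rc for c, rc in zip(box.center, rel.center)],
        [o + ro for o, ro in zip(box.offset, rel.offset)],
    )


def dist_box(v: Sequence[float], box: Box, alpha: float = 0.2) -> float:
    """L1 distance outside the box plus alpha times the L1 distance inside it.

    alpha is an illustrative down-weighting factor for the inside term (assumption).
    """
    outside = 0.0
    inside = 0.0
    for x, c, o in zip(v, box.center, box.offset):
        lo, hi = c - o, c + o
        # How far x lies beyond the nearest face of the box (0 if inside).
        outside += max(x - hi, 0.0) + max(lo - x, 0.0)
        # Distance from the center to x clipped onto the box.
        inside += abs(c - min(hi, max(lo, x)))
    return outside + alpha * inside


def dist_agg(v: Sequence[float], boxes: Sequence[Box], alpha: float = 0.2) -> float:
    """Distance to a DNF query: minimum over the boxes of its conjunctive components."""
    return min(dist_box(v, b, alpha) for b in boxes)


# Usage: project a point-like anchor entity through one relation, then score entities.
anchor = Box([0.0, 0.0], [0.0, 0.0])      # an entity is a box with zero offset
relation = Box([1.0, 1.0], [0.5, 0.5])    # hypothetical relation embedding
query = project(anchor, relation)
print(dist_box([1.0, 1.0], query), dist_box([2.0, 1.0], query))
```

Because the inside term is scaled by `alpha`, entities inside the box are all treated as close to the query, while entities outside are penalized by their full distance to the box surface.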
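Experimental Results
Research Questions
- RQ1: Can complex EPFO queries be represented and reasoned over with boxes in a low-dimensional vector space?
- RQ2: How can disjunction be handled tractably with box embeddings without a blow-up in dimensionality?
- RQ3: Does converting EPFO queries to DNF enable accurate and scalable query answering on large, incomplete KGs?
- RQ4: Compared with state-of-the-art baselines, what accuracy gains and generalization ability does query2box provide on standard KG benchmarks?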
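Main Findings
- Query2box achieves up to 25% relative improvement over strong baselines on EPFO query answering.
- Box embeddings naturally model answer sets and, together with the DNF transformation, compose under both conjunction and disjunction.
- Transforming EPFO queries to DNF enables tractable reasoning in a low-dimensional space while preserving expressive power.
- Query2box generalizes well to unseen query structures and implicitly handles missing relations.
- Experiments on FB15k, FB15k-237, and NELL995 show better performance than the baselines on complex query structures (such as 2p, 3p, 2i, 3i, ip, pi, 2u, and up).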
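To make the DNF transformation concrete, the sketch below rewrites a query containing unions into a list of union-free (conjunctive) queries. It is an illustration only: the nested-tuple query encoding (`"e"` for an anchor entity, `"p"` for projection, `"i"` for intersection, `"u"` for union), the function name `to_dnf`, and the relation names are assumptions made for this example and do not come from the paper's code.

```python
from itertools import product


def to_dnf(query):
    """Return the list of union-free queries whose answers, unioned, equal the input's.

    Query encoding (hypothetical):
      ("e", name)            anchor entity
      ("p", relation, sub)   projection of sub through relation
      ("i", [subs])          intersection of sub-queries
      ("u", [subs])          union of sub-queries
    """
    kind = query[0]
    if kind == "e":
        return [query]
    if kind == "p":
        _, relation, sub = query
        # Projection distributes over union: project each disjunct separately.
        return [("p", relation, s) for s in to_dnf(sub)]
    if kind == "u":
        # A union contributes the disjuncts of all of its branches.
        return [s for sub in query[1] for s in to_dnf(sub)]
    if kind == "i":
        # Intersection distributes over union: take one disjunct from each branch.
        branches = [to_dnf(sub) for sub in query[1]]
        return [("i", list(combo)) for combo in product(*branches)]
    raise ValueError(f"unknown query node: {kind!r}")


# An "up"-style query: union of two one-hop queries, followed by a projection.
up_query = ("p", "r3", ("u", [("p", "r1", ("e", "a")), ("p", "r2", ("e", "b"))]))
for conjunctive in to_dnf(up_query):
    print(conjunctive)
```

Each resulting conjunctive query can be embedded as its own box, and an entity's score for the original query is then the minimum of its distances to those boxes.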
This review was generated by AI and checked by a human editor.