QUICK REVIEW

[Paper Review] QUERY2BOX: REASONING OVER KNOWLEDGE GRAPHS IN VECTOR SPACE USING BOX EMBEDDINGS

Hongyu Ren|arXiv (Cornell University)|Feb 14, 2020
Advanced Graph Neural Networks|31 references|79 citations
TL;DR

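Query2box embeds queries as boxes (hyper-rectangles) in vector space so that it can answer arbitrary existential positive first-order (EPFO) queries over large, incomplete KGs. It handles disjunctions by rewriting each query into a disjunctive normal form (DNF) of conjunctive queries and aggregating their results, and it achieves up to 25% relative improvement over the state of the art.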

ABSTRACT

Answering complex logical queries on large-scale incomplete knowledge graphs (KGs) is a fundamental yet challenging task. Recently, a promising approach to this problem has been to embed KG entities as well as the query into a vector space such that entities that answer the query are embedded close to the query. However, prior work models queries as single points in the vector space, which is problematic because a complex query represents a potentially large set of its answer entities, but it is unclear how such a set can be represented as a single point. Furthermore, prior work can only handle queries that use conjunctions ($\wedge$) and existential quantifiers ($\exists$). Handling queries with logical disjunctions ($\vee$) remains an open problem. Here we propose query2box, an embedding-based framework for reasoning over arbitrary queries with $\wedge$, $\vee$, and $\exists$ operators in massive and incomplete KGs. Our main insight is that queries can be embedded as boxes (i.e., hyper-rectangles), where a set of points inside the box corresponds to a set of answer entities of the query. We show that conjunctions can be naturally represented as intersections of boxes and also prove a negative result that handling disjunctions would require embedding with dimension proportional to the number of KG entities. However, we show that by transforming queries into a Disjunctive Normal Form, query2box is capable of handling arbitrary logical queries with $\wedge$, $\vee$, $\exists$ in a scalable manner. We demonstrate the effectiveness of query2box on three large KGs and show that query2box achieves up to 25% relative improvement over the state of the art.

Motivation & Objective

  • Support scalable reasoning over large, incomplete KGs with expressive EPFO queries.
  • Model sets of answer entities with box embeddings instead of single points.
  • Enable disjunction handling by transforming EPFO queries into DNFs of conjunctive queries.
  • Train learnable geometric operators for projection and intersection over boxes.
  • Demonstrate generalization to unseen query structures and the ability to implicitly impute missing relations.
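
The DNF rewrite named in the objectives can be illustrated at the propositional level. The following is a minimal sketch, not the paper's implementation: it treats each atom as an opaque string and ignores projections and existential variables, and the tuple encoding and the name `to_dnf` are assumptions made for this example.

```python
from itertools import product
from typing import List

# Assumed encoding: a query is either an atom (a string) or a tuple
# (operator, children) with operator "and" or "or".


def to_dnf(query) -> List[List[str]]:
    """Return the query as a list of conjunctions, each a list of atoms."""
    if isinstance(query, str):
        return [[query]]
    op, children = query
    child_dnfs = [to_dnf(child) for child in children]
    if op == "or":
        # A disjunction simply collects the conjunctions of all children.
        return [conj for dnf in child_dnfs for conj in dnf]
    if op == "and":
        # Distribute AND over OR: pick one conjunction from every child.
        return [sum(choice, []) for choice in product(*child_dnfs)]
    raise ValueError(f"unknown operator: {op}")


# a AND (b OR c) becomes (a AND b) OR (a AND c).
example = to_dnf(("and", ["a", ("or", ["b", "c"])]))
```

Each inner list is a conjunctive query that can be answered on its own; the disjunction is then resolved by combining the per-conjunction results.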

Proposed method

  • Embed KG entities as points in R^d and represent queries as axis-aligned boxes in R^d.
  • Associate each relation with a box embedding; projection adds the relation's center and offset to the query box, which translates and enlarges it.
  • Define a geometric intersection operator that places the new center by attention over the input box centers and shrinks the box size.
  • Model the distance between an entity and a query as dist_box = dist_outside + alpha * dist_inside, where the weight alpha (0 < alpha < 1) down-weights the distance travelled inside the box.
  • Train via a negative sampling objective that pushes answer entities inside the final box and negatives outside.
  • Handle disjunction by transforming EPFO queries to Disjunctive Normal Form and aggregating results from conjunctive subqueries.
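
The operators listed above can be sketched in a few lines of plain Python. This is a hedged illustration under stated assumptions, not the authors' code: the names `Box`, `project`, `intersect`, `dist_box`, `dist_dnf`, and `negative_sampling_loss` are hypothetical; distances are assumed to be L1; `scores` and `shrink` stand in for learned attention and shrinking networks; one scalar weight per box is a simplification; and the default `alpha` and the margin `gamma` are arbitrary values chosen for the example.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

Vector = List[float]


@dataclass
class Box:
    """Axis-aligned box: a center and a non-negative offset per dimension."""
    center: Vector
    offset: Vector

    @property
    def lower(self) -> Vector:
        return [c - o for c, o in zip(self.center, self.offset)]

    @property
    def upper(self) -> Vector:
        return [c + o for c, o in zip(self.center, self.offset)]


def project(box: Box, relation: Box) -> Box:
    """Projection: translate the center and enlarge the offset."""
    return Box(
        [c + rc for c, rc in zip(box.center, relation.center)],
        [o + ro for o, ro in zip(box.offset, relation.offset)],
    )


def intersect(boxes: Sequence[Box], scores: Sequence[float], shrink: float = 1.0) -> Box:
    """Intersection: softmax-weighted center, offset capped by the smallest input.

    `scores` (one per box) and `shrink` (in (0, 1]) are assumed stand-ins
    for the outputs of learned networks.
    """
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    weights = [e / sum(exps) for e in exps]
    dim = len(boxes[0].center)
    center = [sum(w * b.center[i] for w, b in zip(weights, boxes)) for i in range(dim)]
    offset = [shrink * min(b.offset[i] for b in boxes) for i in range(dim)]
    return Box(center, offset)


def dist_box(entity: Vector, box: Box, alpha: float = 0.5) -> float:
    """Distance outside the box plus alpha times the distance inside it."""
    outside = 0.0
    inside = 0.0
    for v, c, lo, hi in zip(entity, box.center, box.lower, box.upper):
        outside += max(v - hi, 0.0) + max(lo - v, 0.0)
        inside += abs(c - min(hi, max(lo, v)))  # clamp v into the box first
    return outside + alpha * inside


def dist_dnf(entity: Vector, conjunctive_boxes: Sequence[Box], alpha: float = 0.5) -> float:
    """Disjunction: an entity is as close as its nearest conjunctive box."""
    return min(dist_box(entity, box, alpha) for box in conjunctive_boxes)


def _log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def negative_sampling_loss(pos_dist: float, neg_dists: Sequence[float], gamma: float) -> float:
    """Pull the answer within margin gamma and push negatives beyond it."""
    neg_term = sum(_log_sigmoid(d - gamma) for d in neg_dists) / len(neg_dists)
    return -_log_sigmoid(gamma - pos_dist) - neg_term


# Usage: an anchor entity is a zero-size box; projecting it yields a query box.
anchor = Box([0.0, 0.0], [0.0, 0.0])
relation = Box([1.0, 2.0], [0.5, 0.5])
query_box = project(anchor, relation)
```

With this decomposition, a point inside the box pays only the down-weighted inside term, while a point outside pays the full outside distance plus the inside distance measured from the nearest point on the box boundary.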

Experimental results

Research questions

  • RQ1: Can EPFO queries be effectively answered in a low-dimensional box-embedding space?
  • RQ2: How can conjunction, disjunction, and existential operators be represented as geometric box operations?
  • RQ3: Does transforming EPFO queries into DNFs enable scalable reasoning over large, incomplete KGs?
  • RQ4: How does query2box perform relative to state-of-the-art baselines on complex query structures?
  • RQ5: Can the model generalize to unseen query structures and impute missing relations?

Key findings

Dataset    Method  Avg    1p     2p     3p     2i     3i     ip     pi     2u     up
FB15k      q2b     0.484  0.786  0.413  0.303  0.593  0.712  0.211  0.397  0.608  0.330
FB15k-237  q2b     0.268  0.467  0.240  0.186  0.324  0.453  0.108  0.205  0.239  0.193
NELL995    q2b     0.306  0.555  0.266  0.233  0.343  0.480  0.132  0.212  0.369  0.163

  • Query2box achieves up to 25% relative improvement over baselines on EPFO query answering tasks.
  • Box embeddings model the answer set of a query naturally, and projection and intersection both return boxes, so the operations compose.
  • The DNF transformation handles disjunction scalably, without requiring an embedding dimension proportional to the number of KG entities.
  • The model generalizes well to unseen query structures and implicitly imputes missing relations.
  • Experiments on FB15k, FB15k-237, and NELL995 show better performance than the GQE baselines on complex query structures.
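
As a quick sanity check on the results table, the Avg column can be recomputed from the nine per-structure scores (1p through up). The snippet below is a small sketch that copies the q2b rows from the table; the variable names are assumptions made for this example.

```python
# q2b rows from the results table: (reported Avg, scores for 1p..up).
reported = {
    "FB15k": (0.484, [0.786, 0.413, 0.303, 0.593, 0.712, 0.211, 0.397, 0.608, 0.330]),
    "FB15k-237": (0.268, [0.467, 0.240, 0.186, 0.324, 0.453, 0.108, 0.205, 0.239, 0.193]),
    "NELL995": (0.306, [0.555, 0.266, 0.233, 0.343, 0.480, 0.132, 0.212, 0.369, 0.163]),
}

# Recompute the unweighted mean over the nine query structures.
recomputed = {
    name: sum(scores) / len(scores) for name, (_, scores) in reported.items()
}

for name, (avg, _) in reported.items():
    print(f"{name}: reported {avg:.3f}, recomputed {recomputed[name]:.3f}")
```

The recomputed means agree with the reported Avg values to three decimal places, which suggests that Avg is the unweighted mean over the nine query structures.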

This review was created by AI and reviewed by human editors.