QUICK REVIEW

[Paper Review] Embedding Logical Queries on Knowledge Graphs

William L. Hamilton, Payal Bajaj|arXiv (Cornell University)|Jun 5, 2018

Advanced Graph Neural Networks | Cited by 62

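One-Line Summary

This paper proposes Graph Query Embedding (GQE), an embedding-based framework that learns geometric projection and intersection operations in a low-dimensional space so that conjunctive logical queries on incomplete knowledge graphs can be evaluated efficiently.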

ABSTRACT

Learning low-dimensional embeddings of knowledge graphs is a powerful approach used to predict unobserved or missing edges between entities. However, an open challenge in this area is developing techniques that can go beyond simple edge prediction and handle more complex logical queries, which might involve multiple unobserved edges, entities, and variables. For instance, given an incomplete biological knowledge graph, we might want to predict "what drugs are likely to target proteins involved with both diseases X and Y?" -- a query that requires reasoning about all possible proteins that might interact with diseases X and Y. Here we introduce a framework to efficiently make predictions about conjunctive logical queries -- a flexible but tractable subset of first-order logic -- on incomplete knowledge graphs. In our approach, we embed graph nodes in a low-dimensional space and represent logical operators as learned geometric operations (e.g., translation, rotation) in this embedding space. By performing logical operations within a low-dimensional embedding space, our approach achieves a time complexity that is linear in the number of query variables, compared to the exponential complexity required by a naive enumeration-based approach. We demonstrate the utility of this framework in two application studies on real-world datasets with millions of relations: predicting logical relationships in a network of drug-gene-disease interactions and in a graph-based representation of social interactions derived from a popular web forum.

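Motivation and Objectives

Motivate answering complex conjunctive queries that go beyond single-edge prediction on incomplete knowledge graphs (KGs).
Develop a scalable embedding-based framework that maps conjunctive queries to low-dimensional embeddings.
Show that geometric operations can approximate a query's denotation and support efficient inference.
Evaluate on large real-world datasets (a drug-gene-disease network and Reddit interactions).
Show that training on complex queries improves performance over training on edges alone.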

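Proposed Method

Embed graph nodes in a d-dimensional space using learnable node embeddings.
Represent logical edges with a geometric projection operator P that maps a query embedding q to P(q, tau) = R_tau q, where R_tau is a trainable matrix for edge type tau.
Introduce a geometric intersection operator I that aggregates multiple query embeddings into a single intersection embedding.
Compute the query embedding with Algorithm 1, which propagates P and I along the query's DAG structure from the anchor nodes through the variables.
Score a node v against a query q by the cosine similarity score(q, z_v).
Train P, I, and the node embeddings with a max-margin loss over positive and negative examples, using hard negatives for intersection queries.
Run efficient inference with nearest-neighbor search in the embedding space (for example, locality-sensitive hashing).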
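
The operators listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the parameters are random stand-ins for learned values, the edge-type names `assoc` and `targets` are hypothetical, the intersection operator uses an assumed simplified form (one ReLU layer, an elementwise minimum, then a linear map), and the margin of 1.0 is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension (illustrative)

# Hypothetical toy parameters; in the method these are learned.
Z = rng.normal(size=(5, d))                  # node embeddings z_v, one row per node
R = {"assoc": rng.normal(size=(d, d)),       # one projection matrix R_tau per edge type
     "targets": rng.normal(size=(d, d))}
B = rng.normal(size=(d, d))                  # assumed inner layer of the intersection operator
W = rng.normal(size=(d, d))                  # assumed output map of the intersection operator


def project(q, tau):
    """Geometric projection P(q, tau) = R_tau q."""
    return R[tau] @ q


def intersect(qs):
    """Assumed intersection: ReLU layer, elementwise min (order-invariant), linear map."""
    hidden = np.maximum(0.0, np.stack(qs) @ B.T)
    return W @ hidden.min(axis=0)


def score(q, z, eps=1e-12):
    """Cosine similarity between a query embedding and a node embedding."""
    return float(q @ z / (np.linalg.norm(q) * np.linalg.norm(z) + eps))


def margin_loss(q, z_pos, z_neg, margin=1.0):
    """Max-margin loss for one (positive, negative) pair; the margin value is an assumption."""
    return max(0.0, margin - score(q, z_pos) + score(q, z_neg))


# Toy query: follow "assoc" from anchor nodes 0 and 1, intersect the two
# branches, then follow "targets" from the intersected variable.
branch_a = project(Z[0], "assoc")
branch_b = project(Z[1], "assoc")
q = project(intersect([branch_a, branch_b]), "targets")

scores = [score(q, z) for z in Z]   # one score per candidate node
loss = margin_loss(q, Z[2], Z[3])   # node 2 as positive, node 3 as negative
```

Each query edge costs one matrix-vector product and each intersection one aggregation, so the work to embed a query grows with the number of edges in the query rather than with the size of the graph. The elementwise minimum makes the result independent of the order in which the branches are passed in.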

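Experimental Results

Research Questions

RQ1: Can conjunctive graph queries on incomplete knowledge graphs be answered efficiently via embeddings?
RQ2: Can a small set of learned geometric operators faithfully represent the denotations of existentially quantified queries?
RQ3: How does GQE compare with an enumeration-based edge-prediction baseline on complex queries?
RQ4: Does training on complex queries improve downstream performance beyond edge-level training?
RQ5: How well does GQE scale to real-world datasets with millions of edges?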

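Key Findings

GQE achieves strong predictive performance on the Bio and Reddit datasets, with the Bilinear variant performing best (Bio AUC 91.0; Reddit AUC 76.4).
In the setting with unbound variables, GQE outperforms the enumeration-based baseline.
Training on complex queries substantially improves AUC over edge-only training (about 13% on average across datasets, p<0.001).
Computing a query embedding takes time linear in the number of query edges, and inference via nearest-neighbor search runs in sublinear time.
The framework also supports variants with DistMult and TransE projections; among the tested configurations, Bilinear often gives the best results.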
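
To make the inference step concrete, the sketch below ranks candidate nodes for a query embedding by cosine similarity. It is an exact brute-force scan, which is linear in the number of nodes; it is not locality-sensitive hashing. An approximate nearest-neighbor index would replace the scan to obtain sublinear lookups. The function name `top_k_by_cosine` and the toy vectors are hypothetical.

```python
import numpy as np


def top_k_by_cosine(q, Z, k):
    """Return indices of the k rows of Z most cosine-similar to q (exact search)."""
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    qn = q / np.linalg.norm(q)
    sims = Zn @ qn
    # argpartition picks the k best without a full sort; then order just those k.
    idx = np.argpartition(-sims, k - 1)[:k]
    return idx[np.argsort(-sims[idx])]


# Toy node embeddings (rows) and a toy query embedding.
Z = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
q = np.array([2.0, 0.1])
top2 = top_k_by_cosine(q, Z, 2)  # indices of the two best-matching nodes
```

Because scoring uses cosine similarity, normalizing the node embeddings once ahead of time turns each query into a single matrix-vector product followed by a top-k selection.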

This review was generated by AI and checked by a human editor.