QUICK REVIEW

[Paper Review] DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning

Wenhan Xiong, Thien Hoang | arXiv (Cornell University) | Jul 20, 2017
Advanced Graph Neural Networks | 23 references | 108 citations
TL;DR

Introduces DeepPath, a policy-based reinforcement learning framework that learns multi-hop relational paths in large knowledge graphs, guided by a reward function that balances accuracy, diversity, and efficiency. It outperforms the Path Ranking Algorithm (PRA) and KG embedding methods on the Freebase (FB15K-237) and NELL (NELL-995) datasets.

ABSTRACT

We study the problem of learning to reason in large scale knowledge graphs (KGs). More specifically, we describe a novel reinforcement learning framework for learning multi-hop relational paths: we use a policy-based agent with continuous states based on knowledge graph embeddings, which reasons in a KG vector space by sampling the most promising relation to extend its path. In contrast to prior work, our approach includes a reward function that takes the accuracy, diversity, and efficiency into consideration. Experimentally, we show that our proposed method outperforms a path-ranking based algorithm and knowledge graph embedding methods on Freebase and Never-Ending Language Learning datasets.

Motivation & Objective

  • Motivate multi-hop reasoning in large knowledge graphs and address limitations of discrete-path methods like PRA.
  • Propose a policy-based RL agent operating in a continuous embedding space to discover informative relational paths.
  • Design a reward function that jointly optimizes accuracy, diversity, and efficiency of discovered paths.
  • Demonstrate scalability and empirical superiority over PRA and embedding methods on benchmark KG datasets.

Proposed method

  • Model the KG reasoning task as an MDP with continuous state representations derived from TransE-style embeddings.
  • Use a policy network that, at each step, outputs a probability distribution over all relations; the relations form the action space.
  • Train the policy with REINFORCE and a supervised pre-training phase inspired by imitation learning (randomized BFS paths).
  • Incorporate a reward function with three terms: global accuracy (+1 if the target entity is reached, -1 otherwise), efficiency (1/length of the path), and diversity (the negative average cosine similarity between the new path and previously found paths).
  • Employ a bi-directional path-constrained search to verify learned reasoning formulas efficiently during evaluation.
  • Apply Adam optimization with L2 regularization for policy updates.
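
The three reward terms listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each path is already represented as a fixed-length vector, and the function names and the `weights` used to combine the terms are hypothetical placeholders, since this review does not state how the terms are weighted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def global_reward(reached_target):
    """+1 if the episode ended at the target entity, -1 otherwise."""
    return 1.0 if reached_target else -1.0


def efficiency_reward(path_length):
    """1/length, so shorter paths earn a larger reward."""
    if path_length <= 0:
        raise ValueError("path_length must be positive")
    return 1.0 / path_length


def diversity_reward(path_vec, found_path_vecs):
    """Negative average cosine similarity to previously found paths."""
    if not found_path_vecs:
        return 0.0  # nothing to compare against yet
    sims = [cosine(path_vec, p) for p in found_path_vecs]
    return -sum(sims) / len(sims)


def total_reward(reached_target, path_length, path_vec, found_path_vecs,
                 weights=(1.0, 1.0, 1.0)):
    """Linear combination of the three terms; the weights are an assumption."""
    w_global, w_eff, w_div = weights
    return (w_global * global_reward(reached_target)
            + w_eff * efficiency_reward(path_length)
            + w_div * diversity_reward(path_vec, found_path_vecs))
```

For example, `diversity_reward([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` penalizes the new path for duplicating the first stored path but not for the orthogonal second one.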

Experimental results

Research questions

  • RQ1: Can reinforcement learning over a KG embedding space learn reliable multi-hop reasoning paths?
  • RQ2: Does a reward function balancing accuracy, diversity, and efficiency improve path quality and learning efficiency compared to prior path-based methods?
  • RQ3: How does the RL-based DeepPath compare to PRA and KG embedding methods on standard KG datasets in link and fact prediction tasks?
  • RQ4: Do supervised pre-training and path verification via bi-directional search aid scalability and performance on large KGs?
  • RQ5: Are the discovered RL paths shorter and more diverse than those produced by traditional path-ranking or embedding approaches?

Key findings

  • The RL-based DeepPath outperforms PRA and embedding methods on FB15K-237 and NELL-995 for link prediction, as measured by MAP.
  • DeepPath relies on far fewer reasoning paths per task than PRA, and the paths it keeps are more predictive.
  • Combining global accuracy, efficiency, and diversity terms in the reward improves path quality both qualitatively and quantitatively.
  • Bi-directional path verification reduces search complexity and improves robustness when evaluating learned paths.
  • Supervised pre-training substantially aids RL convergence in large action spaces and improves early success rates (succ_10) during training.
  • On fact prediction tasks, DeepPath generally outperforms embedding baselines across most relations/datasets.
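
Since the link prediction results above are reported as MAP (mean average precision), a short sketch of the standard definition may help. It assumes each query is a list of booleans in ranked order (True for a correct candidate); the function names are illustrative, and the paper's exact evaluation protocol (candidate generation, tie handling) is not described in this review.

```python
def average_precision(ranked_labels):
    """Average of precision@k taken at each rank k where a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_labels, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries):
    """Mean of the per-query average precision; 0.0 for an empty query list."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)
```

A ranking that places correct candidates near the top scores close to 1, so a higher MAP means the learned paths rank true tail entities ahead of false ones more consistently.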

This review was created by AI and reviewed by human editors.