[Paper Review] DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning
Introduces a policy-based reinforcement learning framework (DeepPath) to learn multi-hop relational paths in large knowledge graphs, guided by a reward function balancing accuracy, diversity, and efficiency. It outperforms PRA and KG embedding methods on Freebase (FB15K-237) and NELL datasets.
We study the problem of learning to reason in large scale knowledge graphs (KGs). More specifically, we describe a novel reinforcement learning framework for learning multi-hop relational paths: we use a policy-based agent with continuous states based on knowledge graph embeddings, which reasons in a KG vector space by sampling the most promising relation to extend its path. In contrast to prior work, our approach includes a reward function that takes the accuracy, diversity, and efficiency into consideration. Experimentally, we show that our proposed method outperforms a path-ranking based algorithm and knowledge graph embedding methods on Freebase and Never-Ending Language Learning datasets.
Motivation & Objective
- Motivate multi-hop reasoning in large knowledge graphs and address limitations of discrete-path methods like PRA.
- Propose a policy-based RL agent operating in a continuous embedding space to discover informative relational paths.
- Design a reward function that jointly optimizes accuracy, diversity, and efficiency of discovered paths.
- Demonstrate scalability and empirical superiority over PRA and embedding methods on benchmark KG datasets.
Proposed method
- Model the KG reasoning task as an MDP with continuous state representations derived from TransE-style embeddings.
- Use a policy network that outputs, at each step, a probability distribution over all relations, which serve as the actions.
- Train the policy with REINFORCE, preceded by a supervised pre-training phase inspired by imitation learning that uses paths found by randomized BFS.
- Incorporate a reward function combining global accuracy (+1 if the target is reached, -1 otherwise), efficiency (the inverse of path length, 1/length), and diversity (the negative average cosine similarity between the new path and previously found paths).
- Employ a bi-directional path-constrained search to verify learned reasoning formulas efficiently during evaluation.
- Apply Adam optimization with L2 regularization for policy updates.
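To make the state and reward definitions concrete, here is a minimal NumPy sketch. It assumes the continuous state is the concatenation of the current entity embedding and its offset to the target embedding, and that a path is represented by the sum of its relation embeddings when computing the diversity term. The function names and the `weights` used to combine the three reward terms are hypothetical choices for illustration, not values taken from the paper.

```python
import numpy as np


def state_vector(e_current, e_target):
    """Assumed state: [e_t ; e_target - e_t] from (e.g. TransE) entity embeddings."""
    return np.concatenate([e_current, e_target - e_current])


def path_embedding(relation_vecs):
    """Assumption: a path is represented by the sum of its relation embeddings."""
    return np.sum(relation_vecs, axis=0)


def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def total_reward(reached, path_len, path_vec, found_path_vecs, weights=(1.0, 0.1, 0.1)):
    """Weighted sum of the three reward terms; `weights` are hypothetical."""
    r_global = 1.0 if reached else -1.0      # +1 if target reached, -1 otherwise
    r_efficiency = 1.0 / path_len             # shorter paths score higher
    if found_path_vecs:                       # penalize similarity to earlier paths
        r_diversity = -float(np.mean([cosine(path_vec, p) for p in found_path_vecs]))
    else:
        r_diversity = 0.0
    w_g, w_e, w_d = weights
    return w_g * r_global + w_e * r_efficiency + w_d * r_diversity


# Example: a 2-hop path that reached the target but duplicates an earlier path.
p = path_embedding([np.array([1.0, 0.0]), np.array([0.0, 1.0])])
print(total_reward(True, 2, p, [p.copy()]))  # roughly 1.0 + 0.05 - 0.1
```

Because the diversity term is a mean cosine similarity, an agent that keeps rediscovering the same relation pattern is pushed toward lower total reward even when every episode succeeds.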
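The policy update described in the method list can be sketched with REINFORCE on a softmax policy over relations. For brevity, this sketch swaps the paper's policy network for a hypothetical linear policy (logits = W·s) and uses plain gradient ascent with an L2 penalty in place of Adam; `LinearPolicy`, `lr`, and `l2` are illustrative names and values, not the authors' implementation.

```python
import numpy as np


def softmax(z):
    z = z - np.max(z)          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()


class LinearPolicy:
    """Hypothetical stand-in for the policy network: pi(r | s) = softmax(W s)."""

    def __init__(self, state_dim, num_relations, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((num_relations, state_dim))

    def probs(self, state):
        return softmax(self.W @ state)

    def sample(self, state, rng):
        """Sample a relation index (the action) from the current policy."""
        p = self.probs(state)
        return int(rng.choice(len(p), p=p))

    def reinforce_update(self, states, actions, reward, lr=0.1, l2=1e-4):
        """REINFORCE: W += lr * (R * sum_t grad log pi(a_t | s_t) - l2 * W)."""
        grad = np.zeros_like(self.W)
        for s, a in zip(states, actions):
            p = self.probs(s)
            onehot = np.zeros_like(p)
            onehot[a] = 1.0
            grad += np.outer(onehot - p, s)   # gradient of log-softmax w.r.t. W
        self.W += lr * (reward * grad - l2 * self.W)


# Example: a rewarded episode makes its chosen relation more likely.
policy = LinearPolicy(state_dim=4, num_relations=3)
s0 = np.array([1.0, 0.5, -0.2, 0.3])
before = policy.probs(s0)[2]
policy.reinforce_update([s0], [2], reward=1.0)
print(before, policy.probs(s0)[2])
```

Under this framing, one simple way to treat the supervised pre-training phase is to run the same update along a teacher path with a fixed positive reward, then switch to episode rewards once RL training begins.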
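The randomized BFS that supplies paths for supervised pre-training can be approximated as follows. This sketch assumes the randomization comes from picking a random intermediate entity and concatenating two shortest paths (head to intermediate, then intermediate to tail); `build_adjacency`, `bfs_path`, and `randomized_teacher_path` are hypothetical names, and the concatenated path may revisit entities.

```python
import random
from collections import defaultdict, deque


def build_adjacency(triples):
    """entity -> list of (relation, neighbor) for outgoing edges."""
    adj = defaultdict(list)
    for h, r, t in triples:
        adj[h].append((r, t))
    return adj


def bfs_path(adj, start, goal):
    """Shortest relation sequence from start to goal, or None if unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for rel, nxt in adj.get(node, ()):
            if nxt not in parent:
                parent[nxt] = (node, rel)
                queue.append(nxt)
    if goal not in parent:
        return None
    relations, node = [], goal
    while parent[node] is not None:      # walk back from goal to start
        node, rel = parent[node]
        relations.append(rel)
    return relations[::-1]


def randomized_teacher_path(adj, head, tail, entities, rng):
    """Assumed scheme: head -> random intermediate -> tail via two BFS calls."""
    inter = rng.choice(entities)
    first, second = bfs_path(adj, head, inter), bfs_path(adj, inter, tail)
    if first is None or second is None:
        return None
    return first + second


adj = build_adjacency([("a", "bornIn", "city"), ("city", "locatedIn", "country")])
print(randomized_teacher_path(adj, "a", "country", ["city"], random.Random(0)))
# ['bornIn', 'locatedIn']
```

Randomizing the intermediate entity yields varied teacher paths for the same query pair, rather than always the single shortest path.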
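Verifying a learned relation path with bi-directional search amounts to asking whether the path connects a given head and tail. The sketch below is a generic meet-in-the-middle version of that idea, not the authors' code: it expands a frontier from each end (always growing the smaller one) until all relations are consumed, then checks whether the frontiers overlap. The index layout and function names are assumptions.

```python
from collections import defaultdict


def build_index(triples):
    """Forward (h, r) -> {t} and backward (t, r) -> {h} lookups."""
    fwd, bwd = defaultdict(set), defaultdict(set)
    for h, r, t in triples:
        fwd[(h, r)].add(t)
        bwd[(t, r)].add(h)
    return fwd, bwd


def path_connects(fwd, bwd, head, tail, relation_path):
    """True if following relation_path from head can reach tail."""
    left, right = {head}, {tail}
    lo, hi = 0, len(relation_path)   # relation_path[lo:hi] is still unconsumed
    while lo < hi:
        if not left or not right:
            return False
        if len(left) <= len(right):  # expand the smaller frontier
            r = relation_path[lo]
            left = {t for e in left for t in fwd.get((e, r), ())}
            lo += 1
        else:
            r = relation_path[hi - 1]
            right = {h for e in right for h in bwd.get((e, r), ())}
            hi -= 1
    return bool(left & right)


fwd, bwd = build_index([("a", "bornIn", "city"), ("city", "locatedIn", "country")])
print(path_connects(fwd, bwd, "a", "country", ["bornIn", "locatedIn"]))  # True
```

Expanding the smaller frontier keeps the intermediate entity sets small when some relations have high fan-out, which is where one-directional search becomes expensive on large KGs.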
Experimental results
Research questions
- RQ1: Can reinforcement learning over a KG embedding space learn reliable multi-hop reasoning paths?
- RQ2: Does a reward function balancing accuracy, diversity, and efficiency improve path quality and learning efficiency compared to prior path-based methods?
- RQ3: How does the RL-based DeepPath compare to PRA and KG embedding methods on standard KG datasets in link and fact prediction tasks?
- RQ4: Do supervised pre-training and path verification via bi-directional search aid scalability and performance on large KGs?
- RQ5: Are the discovered RL paths shorter and more diverse than those produced by traditional path-ranking or embedding approaches?
Key findings
- The RL-based DeepPath outperforms PRA and embedding methods on FB15K-237 and NELL-995 for link prediction, as measured by MAP.
- DeepPath relies on substantially fewer reasoning paths per task than PRA, yet the paths it finds are more predictive.
- Combining global accuracy, efficiency, and diversity in the reward yields higher-quality paths, judged both qualitatively and quantitatively.
- Bi-directional path verification reduces search complexity and improves robustness when evaluating learned paths.
- Supervised pre-training substantially aids RL convergence in large action spaces and raises the success rate early in training (reported as succ_10).
- On fact prediction, DeepPath outperforms the embedding baselines on most relations and datasets.
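MAP, the metric behind the link-prediction comparison above, averages a per-query average precision. A minimal sketch, assuming each query's candidates are already ranked by model score and labeled 1 for true facts and 0 for corrupted ones; this variant divides by the number of positives present in the ranked list, which matches the standard definition when every positive is ranked.

```python
def average_precision(ranked_labels):
    """AP for one query: ranked_labels[i] is 1 if the i-th ranked candidate is true."""
    hits, precisions = 0, []
    for k, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precisions.append(hits / k)   # precision at each true candidate's rank
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(queries):
    """Mean of per-query AP over all queries."""
    return sum(average_precision(q) for q in queries) / len(queries)


# Query 1: true facts at ranks 1 and 3 -> AP = (1/1 + 2/3) / 2 = 5/6
# Query 2: true fact at rank 2        -> AP = 1/2
print(mean_average_precision([[1, 0, 1], [0, 1]]))  # (5/6 + 1/2) / 2 = 2/3
```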
This review was created by AI and reviewed by human editors.