QUICK REVIEW

[Paper Review] Knowledge Graph Embedding for Link Prediction: A Comparative Analysis

Andrea Rossi, Donatella Firmani|arXiv (Cornell University)|Feb 3, 2020

Advanced Graph Neural Networks|70 references|141 citations

TL;DR

The paper conducts a comprehensive, from-scratch comparison of 16 KG embedding-based link prediction models across popular benchmarks, highlighting methodological differences and evaluation practices.

ABSTRACT

Knowledge Graphs (KGs) have found many applications in industry and academic settings, which, in turn, have motivated considerable research efforts towards large-scale information extraction from a variety of sources. Despite such efforts, it is well known that even state-of-the-art KGs suffer from incompleteness. Link Prediction (LP), the task of predicting missing facts among entities already in a KG, is a promising and widely studied task aimed at addressing KG incompleteness. Among the recent LP techniques, those based on KG embeddings have achieved very promising performances in some benchmarks. Despite the fast-growing literature on the subject, insufficient attention has been paid to the effect of the various design choices in those methods. Moreover, the standard practice in this area is to report accuracy by aggregating over a large number of test facts in which some entities are over-represented; this allows LP methods to exhibit good performance by just attending to structural properties that include such entities, while ignoring the remaining majority of the KG. This analysis provides a comprehensive comparison of embedding-based LP methods, extending the dimensions of analysis beyond what is commonly available in the literature. We experimentally compare effectiveness and efficiency of 16 state-of-the-art methods, consider a rule-based baseline, and report detailed analysis over the most popular benchmarks in the literature.

Motivation & Objective

Address knowledge graph incompleteness by evaluating embedding-based link prediction.
Provide a large-scale, fair comparative analysis beyond aggregated test accuracy.
Detail how design choices across architectures affect LP performance on common benchmarks.
Propose informative evaluation practices and share publicly available datasets, code, and results.

Proposed method

Train and tune 16 embedding-based LP models plus a rule-based baseline from scratch.
Compare diverse architectures: tensor decomposition, geometric, and deep learning models.
Evaluate on the 5 most commonly used LP datasets with standard metrics.
Define structural training data features and measure their impact on predictive performance.
Provide per-prediction ranks and CSV outputs for deeper analysis.
Share code and resources via a public GitHub repository.
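
The review does not spell out which "standard metrics" are meant. In the LP literature these are typically mean rank (MR), mean reciprocal rank (MRR), and Hits@K, all computed from the rank each test prediction receives. The sketch below shows how they could be derived from a list of per-prediction ranks; the function names and the sample ranks are illustrative assumptions, not taken from the paper's repository.

```python
# Minimal sketch: aggregate LP metrics from per-prediction ranks.
# A rank of 1 means the correct entity was scored highest.
# Function names and sample data are hypothetical.

def mean_rank(ranks):
    """Average rank of the correct answer (lower is better)."""
    return sum(ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Average of 1/rank (higher is better, at most 1.0)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    """Fraction of predictions whose rank is k or better."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

if __name__ == "__main__":
    sample_ranks = [1, 2, 10, 50]  # made-up ranks for four test predictions
    print("MR     ", mean_rank(sample_ranks))
    print("MRR    ", mean_reciprocal_rank(sample_ranks))
    print("Hits@1 ", hits_at_k(sample_ranks, 1))
    print("Hits@10", hits_at_k(sample_ranks, 10))
```

Because MR is an unbounded average, a single very poor rank can dominate it, which is one reason MRR and Hits@K are usually reported alongside it.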

Experimental results

Research questions

RQ1: Which KG embedding models provide the best trade-off between effectiveness and efficiency on standard LP benchmarks?
RQ2: How do design choices (tensor vs. geometric vs. deep learning, bilinear vs. non-bilinear, translational vs. rotational) impact LP performance?
RQ3: How do dataset characteristics influence model performance, and what factors predict easy vs. hard predictions?
RQ4: Do current evaluation practices accurately reflect model capabilities across the KG?
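
To make the design-choice contrast in RQ2 concrete, the sketch below implements two well-known scoring functions as representatives of two families: TransE (geometric, translational) and DistMult (tensor decomposition, bilinear). It uses plain Python lists as toy embeddings. The vectors are made up for illustration, and the code is not the implementation evaluated in the paper.

```python
import math

# Minimal sketch of two scoring functions over toy embeddings.
# h, r, t are the head, relation, and tail vectors of a candidate fact.
# Higher scores mean the fact is considered more plausible.

def transe_score(h, r, t):
    """Geometric model: negative L2 distance between h + r and t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def distmult_score(h, r, t):
    """Bilinear model with a diagonal relation matrix: sum of h * r * t."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))

if __name__ == "__main__":
    h, r, t = [1.0, 2.0], [1.0, 1.0], [2.0, 3.0]  # hypothetical embeddings
    print("TransE  ", transe_score(h, r, t))    # h + r equals t here
    print("DistMult", distmult_score(h, r, t))
    # DistMult cannot tell (h, r, t) from (t, r, h):
    print("DistMult swapped", distmult_score(t, r, h))
```

One consequence of the design choice is visible directly in the code: the DistMult product is symmetric in head and tail, so it assigns the same score to a fact and its inverse, whereas the TransE translation is direction-sensitive.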

Key findings

16 state-of-the-art models were experimentally compared across 5 datasets.
The study provides detailed results beyond original papers, including efficiency and effectiveness per model and dataset.
A set of structural features in training data was defined to assess their effect on model performance.
Results include per-prediction ranks and complete prediction lists for transparency.
Datasets, code, and resources are publicly available on GitHub.
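
The abstract's concern about aggregated accuracy can be illustrated by splitting per-prediction ranks by a structural feature of the training data. The sketch below uses a deliberately simple feature, the number of training triples an entity appears in; this is an assumption chosen for illustration, since the review does not list the features the paper defines. All names and data are hypothetical.

```python
from collections import Counter, defaultdict

# Minimal sketch: compare MRR on frequent vs. rare entities.
# "Degree" here means how many training triples mention the entity,
# an assumed stand-in for the paper's structural features.

def entity_degrees(train_triples):
    """Count occurrences of each entity as head or tail in training."""
    degrees = Counter()
    for head, _relation, tail in train_triples:
        degrees[head] += 1
        degrees[tail] += 1
    return degrees

def mrr_by_degree_bucket(predictions, degrees, threshold):
    """predictions: (target_entity, rank) pairs. Returns MRR per bucket."""
    buckets = defaultdict(list)
    for entity, rank in predictions:
        key = "frequent" if degrees[entity] >= threshold else "rare"
        buckets[key].append(1.0 / rank)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}

if __name__ == "__main__":
    train = [("a", "r", "b"), ("a", "r", "c"), ("a", "r", "d"), ("e", "r", "b")]
    preds = [("a", 1), ("a", 1), ("a", 2), ("c", 10), ("e", 20)]
    print(mrr_by_degree_bucket(preds, entity_degrees(train), threshold=3))
```

In this toy data the overall MRR is 0.53, driven by the over-represented entity "a", while the MRR on the rare entities is well below 0.1. A single aggregate number would hide that gap.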

This review was created by AI and reviewed by human editors.