QUICK REVIEW

[Paper Review] KBGAN: Adversarial Learning for Knowledge Graph Embeddings

Liwei Cai, William Yang Wang|arXiv (Cornell University)|Nov 11, 2017

Advanced Graph Neural Networks | References: 22 | Cited by: 44

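One-Sentence Summary

KBGAN proposes a novel adversarial learning framework that uses a probability-based model as the generator to produce high-quality negative training samples and a distance-based model as the discriminator. Without relying on an external ontology, the method significantly improves link prediction performance on multiple datasets.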

ABSTRACT

We introduce KBGAN, an adversarial learning framework to improve the performances of a wide range of existing knowledge graph embedding models. Because knowledge graphs typically only contain positive facts, sampling useful negative training examples is a non-trivial task. Replacing the head or tail entity of a fact with a uniformly randomly selected entity is a conventional method for generating negative facts, but the majority of the generated negative facts can be easily discriminated from positive facts, and will contribute little towards the training. Inspired by generative adversarial networks (GANs), we use one knowledge graph embedding model as a negative sample generator to assist the training of our desired model, which acts as the discriminator in GANs. This framework is independent of the concrete form of generator and discriminator, and therefore can utilize a wide variety of knowledge graph embedding models as its building blocks. In experiments, we adversarially train two translation-based models, TransE and TransD, each with assistance from one of the two probability-based models, DistMult and ComplEx. We evaluate the performances of KBGAN on the link prediction task, using three knowledge base completion datasets: FB15k-237, WN18 and WN18RR. Experimental results show that adversarial training substantially improves the performances of target embedding models under various settings.

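Motivation and Objectives

Address the poor quality of negative samples in knowledge graph embedding: negatives drawn by uniform random sampling are usually trivially easy to distinguish from positive facts.
Improve the generalization and performance of existing knowledge graph embedding models through adversarial training on better negative samples.
Design a general, model-agnostic framework that can use any knowledge graph embedding model as the generator or discriminator, with no external constraints.
Make gradient-based training of the generator possible despite its discrete sampling step, by applying one-step reinforcement learning with variance reduction.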
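
The conventional baseline that motivates this work, uniform random corruption of the head or tail entity, can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `corrupt_uniform`, the integer entity ids, and the step that skips known positive facts are all assumptions made for the example.

```python
import random

def corrupt_uniform(triple, num_entities, known_facts, rng):
    """Replace the head or tail of `triple` with a uniformly sampled entity id."""
    head, relation, tail = triple
    while True:
        new_entity = rng.randrange(num_entities)
        if rng.random() < 0.5:
            candidate = (new_entity, relation, tail)   # corrupt the head
        else:
            candidate = (head, relation, new_entity)   # corrupt the tail
        # Assumption for this sketch: skip candidates that are known positive facts.
        if candidate not in known_facts:
            return candidate

# Toy graph with 5 entities (ids 0..4) and a single relation (id 0).
facts = {(0, 0, 1), (1, 0, 2), (2, 0, 3)}
rng = random.Random(0)
negative = corrupt_uniform((0, 0, 1), num_entities=5, known_facts=facts, rng=rng)
```

Because every entity is equally likely to be chosen, most negatives produced this way are implausible and easy for the model to reject, which is the weakness the review describes.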

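Proposed Method

KBGAN adopts a framework inspired by generative adversarial networks (GANs): one KGE model serves as the generator, which produces negative facts, and another serves as the discriminator, which learns to distinguish positive facts from the generated negatives.
The generator is a probability-based model (e.g., DistMult or ComplEx) that scores candidate negative triples using its learned embeddings.
The discriminator is a translation-based model trained with a margin loss (e.g., TransE or TransD) that learns to separate true facts from generated negatives.
A one-step REINFORCE algorithm with variance reduction supplies gradient estimates for the generator across its discrete sampling step, which ordinary back-propagation cannot pass through.
The framework is trained end to end; as training proceeds, the generator produces increasingly realistic negatives that challenge the discriminator.
Hyperparameters, such as the number of negative candidates Ns drawn for each positive sample, are tuned to balance training stability and performance.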
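
The interplay of generator, discriminator, and REINFORCE described above can be sketched for a single positive triple. This is a hedged illustration, not the authors' implementation: the helper names, the toy embeddings, the fixed `baseline` value, and the choice of the negated TransE distance as the reward are assumptions for this example, and the sketch returns the gradient with respect to the generator's candidate scores instead of updating any embeddings.

```python
import math
import random

def distmult_score(h, r, t):
    """DistMult score: sum_i h_i * r_i * t_i (higher means more plausible)."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))

def transe_distance(h, r, t):
    """TransE distance: L1 norm of h + r - t (lower means more plausible)."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kbgan_step(positive, candidates, gen, dis, margin, baseline, rng):
    """One illustrative adversarial step for a single positive triple.

    gen / dis are dicts with "ent" and "rel" maps from id to vector.
    """
    def g_score(triple):
        h, r, t = triple
        return distmult_score(gen["ent"][h], gen["rel"][r], gen["ent"][t])

    def d_dist(triple):
        h, r, t = triple
        return transe_distance(dis["ent"][h], dis["rel"][r], dis["ent"][t])

    # Generator: softmax over candidate scores, then sample one negative.
    probs = softmax([g_score(c) for c in candidates])
    idx = rng.choices(range(len(candidates)), weights=probs, k=1)[0]
    negative = candidates[idx]

    # Discriminator: margin ranking loss between the positive and the sampled negative.
    d_loss = max(0.0, margin + d_dist(positive) - d_dist(negative))

    # Assumed reward: negatives the discriminator finds plausible (small distance) score higher.
    reward = -d_dist(negative)
    advantage = reward - baseline  # subtracting a baseline reduces variance

    # REINFORCE: d log p_idx / d logit_j = 1[j == idx] - p_j, scaled by the advantage.
    logit_grads = [advantage * ((1.0 if j == idx else 0.0) - p)
                   for j, p in enumerate(probs)]
    return negative, d_loss, reward, logit_grads

# Toy 2-dimensional embeddings (hypothetical values).
gen = {"ent": {0: [0.1, 0.2], 1: [0.3, -0.1], 2: [0.5, 0.4], 3: [-0.2, 0.6]},
       "rel": {0: [1.0, 0.5]}}
dis = {"ent": {0: [0.0, 0.1], 1: [0.4, 0.3], 2: [0.9, -0.2], 3: [0.3, 0.5]},
       "rel": {0: [0.4, 0.2]}}
positive = (0, 0, 1)
candidates = [(0, 0, 2), (0, 0, 3), (2, 0, 1), (3, 0, 1)]  # Ns = 4 candidates
negative, d_loss, reward, logit_grads = kbgan_step(
    positive, candidates, gen, dis, margin=1.0, baseline=-0.5, rng=random.Random(0))
```

In a full training loop, `d_loss` would be minimized with respect to the discriminator's embeddings, while `logit_grads` would be propagated into the generator's embeddings by gradient ascent so that high-reward negatives become more likely to be sampled.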

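Experimental Results

Research Questions

RQ1: Compared with uniform random sampling, can adversarial training with a learned generator produce higher-quality negative samples?
RQ2: Does the proposed KBGAN framework consistently improve the performance of multiple knowledge graph embedding models across different datasets?
RQ3: How does the quality of the generated negatives affect the discriminator's generalization on the link prediction task?
RQ4: Can the framework be applied to a variety of KGE models without architectural changes or reliance on an external ontology?

Key Findings

KBGAN consistently improves the performance of TransE and TransD on all three benchmark datasets: FB15k-237, WN18, and WN18RR.
Adversarial training significantly improves mean reciprocal rank (MRR) and hits@10 across all settings, with gains observed even on challenging datasets such as WN18RR.
The generated negatives are more semantically relevant than uniform random negatives; qualitative case studies show weak but plausible semantic relations between the entities involved.
Validation performance improves steadily and monotonically during training, indicating that the model converges despite the inherent instability of GANs.
The framework remains effective even with relatively simple models (such as TransE and TransD) as the discriminator, indicating broad compatibility with existing KGE architectures.
The one-step REINFORCE method provides effective gradient estimates for the discrete generator, making end-to-end training possible.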
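
For readers unfamiliar with the two link prediction metrics reported above, the following sketch shows how they are computed from the rank of the correct entity for each test query. The rank values are made up for illustration and are not results from the paper.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1 / rank over all test queries (rank 1 is the best)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k=10):
    """Fraction of test queries whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical ranks of the correct entity for five test queries.
ranks = [1, 3, 12, 2, 50]
mrr = mean_reciprocal_rank(ranks)   # approximately 0.3873
h10 = hits_at_k(ranks, k=10)        # 0.6
```

MRR rewards placing the correct entity near the top of the ranking, while hits@10 only checks whether it appears among the first ten candidates, so the two metrics can move differently for the same model.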

This review was generated by AI and reviewed by human editors.