QUICK REVIEW

[Paper Review] Structure Guided Multi-modal Pre-trained Transformer for Knowledge Graph Reasoning

Ke Liang, Sihang Zhou|arXiv (Cornell University)|Jul 6, 2023

Advanced Graph Neural Networks | Cited by 14

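ONE-SENTENCE SUMMARY

SGMPT is the first multimodal pretrained Transformer for knowledge graph reasoning that explicitly exploits graph structure, using a structure encoder and a structure-guided fusion module, and it improves multimodal KGR performance on FB15k-237-IMG and WN18-IMG.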

ABSTRACT

Multimodal knowledge graphs (MKGs), which intuitively organize information in various modalities, can benefit multiple practical downstream tasks, such as recommendation systems, and visual question answering. However, most MKGs are still far from complete, which motivates the flourishing of MKG reasoning models. Recently, with the development of general artificial architectures, the pretrained transformer models have drawn increasing attention, especially for multimodal scenarios. However, the research of multimodal pretrained transformer (MPT) for knowledge graph reasoning (KGR) is still at an early stage. As the biggest difference between MKG and other multimodal data, the rich structural information underlying the MKG still cannot be fully leveraged in existing MPT models. Most of them only utilize the graph structure as a retrieval map for matching images and texts connected with the same entity. This manner hinders their reasoning performances. To this end, we propose the graph Structure Guided Multimodal Pretrained Transformer for knowledge graph reasoning, termed SGMPT. Specifically, the graph structure encoder is adopted for structural feature encoding. Then, a structure-guided fusion module with two different strategies, i.e., weighted summation and alignment constraint, is first designed to inject the structural information into both the textual and visual features. To the best of our knowledge, SGMPT is the first MPT model for multimodal KGR, which mines the structural information underlying the knowledge graph. Extensive experiments on FB15k-237-IMG and WN18-IMG, demonstrate that our SGMPT outperforms existing state-of-the-art models, and prove the effectiveness of the designed strategies.

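MOTIVATION AND OBJECTIVES

Address the incompleteness of multimodal knowledge graphs (MKGs) by exploiting the underlying graph structure for multimodal KGR.
Design a plug-and-play structure-guided module that injects structural information into existing multimodal pretrained Transformers.
Show that incorporating graph structure improves reasoning performance on benchmark MKGR datasets.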

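PROPOSED METHOD

Uses a graph structure encoder to generate structural embeddings for entities.
Introduces a structure-guided fusion module with two strategies, weighted summation and alignment constraint, to fuse structure with the textual and visual modalities.
Uses MKGformer as the MPT backbone and HAKE (and its variants) as the structure encoder to produce H^s, which is aligned with H^t and H^v through the losses L_ts, L_vs, and L_a.
Trains with MLM-based pretraining and fine-tuning objectives, combining a cross-entropy loss with the alignment loss.
Evaluates on the FB15k-237-IMG and WN18-IMG datasets using Hits@k and MR.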
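The two fusion strategies listed above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the convex-combination form of the weighted summation, the mean-squared-distance form of the alignment loss, and the assumption that L_a is the sum of L_ts and L_vs are all hypothetical choices made for illustration, and the paper's exact formulation may differ.

```python
# Illustrative sketch only: the function names, the convex-combination form,
# and the mean-squared-distance loss are assumptions, not the paper's code.

def weighted_sum(h_modal, h_struct, alpha):
    """Blend a textual or visual feature vector with the structural one."""
    return [(1.0 - alpha) * m + alpha * s for m, s in zip(h_modal, h_struct)]

def alignment_loss(h_modal, h_struct):
    """Mean squared distance between a modality feature and H^s."""
    return sum((m - s) ** 2 for m, s in zip(h_modal, h_struct)) / len(h_modal)

def total_loss(ce_loss, l_ts, l_vs, lam):
    """Cross-entropy plus a weighted alignment term (assumes L_a = L_ts + L_vs)."""
    return ce_loss + lam * (l_ts + l_vs)

h_t = [1.0, 2.0]  # toy textual feature H^t
h_v = [0.0, 4.0]  # toy visual feature H^v
h_s = [3.0, 4.0]  # toy structural feature H^s

fused_t = weighted_sum(h_t, h_s, alpha=0.5)   # strategy 1: weighted summation
l_ts = alignment_loss(h_t, h_s)               # strategy 2: text-structure alignment
l_vs = alignment_loss(h_v, h_s)               # strategy 2: visual-structure alignment
loss = total_loss(1.0, l_ts, l_vs, lam=0.5)   # toy cross-entropy value of 1.0
```

In this sketch, weighted summation changes the features directly, while the alignment constraint only adds a penalty that pulls H^t and H^v toward H^s during training; alpha and lam are hypothetical hyperparameters.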

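EXPERIMENTAL RESULTS

Research Questions

RQ1: Does SGMPT outperform existing state-of-the-art multimodal KGR models, including Transformer-based ones?
RQ2: Are the structure encoder and the structure-guided fusion module effective at exploiting graph structure for MKGR?
RQ3: How do different structure encoders affect MKGR performance?
RQ4: What are the efficiency and sensitivity characteristics of the proposed method?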

Key Findings

Model	FB15k-237-IMG MR	FB15k-237-IMG Hits@1	FB15k-237-IMG Hits@3	FB15k-237-IMG Hits@10	WN18-IMG MR	WN18-IMG Hits@1	WN18-IMG Hits@3	WN18-IMG Hits@10
TransE	323	19.8	37.6	44.1	357	4.0	74.5	92.3
DistMult	512	19.9	30.1	44.6	665	33.5	87.6	94.0
ComplEx	546	19.4	29.7	45.0	-	93.6	94.5	94.7
ConvE	249	22.5	34.1	49.7	-	41.9	47.0	53.1
RGCN	600	10.0	18.1	30.0	-	8.0	13.7	20.7
IKRL(UNION)	298	19.4	28.4	45.8	596	12.7	79.6	92.8
TransAE	431	19.9	31.7	46.3	352	32.3	83.5	93.4
RSME(ViT-B/32+Forget)	417	24.2	34.4	46.7	-	94.3	95.1	-
KG-BERT	153	-	-	-	58	11.7	68.9	92.6
VisualBERT	592	21.7	32.4	43.9	122	17.9	43.7	65.4
ViLBERT	483	23.3	33.5	45.7	131	22.3	55.2	76.1
MKGformer	252	24.3	36.0	49.9	25	93.5	95.8	97.0
SGMPT	238	25.2	37.0	51.0	29	94.3	96.6	97.8

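SGMPT outperforms the non-Transformer KGR models on nearly all metrics on both benchmarks; the exceptions in the table are TransE's Hits@3 of 37.6 on FB15k-237-IMG and RSME's tied Hits@1 of 94.3 on WN18-IMG.
SGMPT achieves higher Hits@1, Hits@3, and Hits@10 than the Transformer-based KGR models, with the largest margins on FB15k-237-IMG.
Ablation studies show that both weighted summation and the alignment constraint contribute to performance, and that both text-structure and visual-structure fusion bring gains.
HAKE as the structure encoder gives strong results; other encoders such as HousE and CompGCN are also evaluated.
On FB15k-237-IMG, SGMPT reaches MR=238, Hits@1=25.2, Hits@3=37.0, and Hits@10=51.0; on WN18-IMG, it reaches MR=29, Hits@1=94.3, Hits@3=96.6, and Hits@10=97.8.
Compared with MKGformer, SGMPT improves Hits@1/3/10 by 0.8 to 1.1 points on both datasets; its MR is better on FB15k-237-IMG (238 vs. 252) and slightly worse on WN18-IMG (29 vs. 25).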
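For readers unfamiliar with the metrics, here is a minimal sketch of how MR (mean rank, lower is better) and Hits@k (higher is better) are conventionally computed from the rank a model assigns to the correct entity. The helper names and toy data are hypothetical, and the sketch assumes Hits@k is reported as a percentage, which is how the table values appear to be scaled; the paper's evaluation protocol may include details not shown here.

```python
# Illustrative sketch of standard link-prediction metrics; helper names
# and toy data are hypothetical, not taken from the paper.

def rank_of(scores, target):
    """1-based rank of the target: one plus the number of candidates scored strictly higher."""
    return 1 + sum(1 for s in scores.values() if s > scores[target])

def mean_rank(ranks):
    """Average rank of the correct entity over all test queries (MR)."""
    return sum(ranks) / len(ranks)

def hits_at_k(ranks, k):
    """Percentage of test queries whose correct entity is ranked in the top k."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)

# Toy candidate scores for one query whose correct answer is "Rome".
scores = {"Paris": 0.9, "Lyon": 0.4, "Rome": 0.7}
rome_rank = rank_of(scores, "Rome")

# Toy ranks of the correct entity across four test queries.
ranks = [1, 3, 12, 2]
mr = mean_rank(ranks)
h1 = hits_at_k(ranks, 1)
h3 = hits_at_k(ranks, 3)
h10 = hits_at_k(ranks, 10)
```

Because MR averages raw ranks, a few badly ranked queries can move it substantially, which is one reason a model can lead on Hits@k while trailing slightly on MR.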


This review was generated by AI and reviewed by human editors.