QUICK REVIEW

[Paper Review] Structure Guided Multi-modal Pre-trained Transformer for Knowledge Graph Reasoning

Ke Liang, Sihang Zhou|arXiv (Cornell University)|Jul 6, 2023

Advanced Graph Neural Networks | Cited by 14

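One-Sentence Summary

SGMPT is the first multimodal pre-trained Transformer model for knowledge graph reasoning that explicitly exploits graph structure information, using a structure encoder and a structure-guided fusion module. It improves multimodal KGR performance on FB15k-237-IMG and WN18-IMG.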

ABSTRACT

Multimodal knowledge graphs (MKGs), which intuitively organize information in various modalities, can benefit multiple practical downstream tasks, such as recommendation systems, and visual question answering. However, most MKGs are still far from complete, which motivates the flourishing of MKG reasoning models. Recently, with the development of general artificial architectures, the pretrained transformer models have drawn increasing attention, especially for multimodal scenarios. However, the research of multimodal pretrained transformer (MPT) for knowledge graph reasoning (KGR) is still at an early stage. As the biggest difference between MKG and other multimodal data, the rich structural information underlying the MKG still cannot be fully leveraged in existing MPT models. Most of them only utilize the graph structure as a retrieval map for matching images and texts connected with the same entity. This manner hinders their reasoning performances. To this end, we propose the graph Structure Guided Multimodal Pretrained Transformer for knowledge graph reasoning, termed SGMPT. Specifically, the graph structure encoder is adopted for structural feature encoding. Then, a structure-guided fusion module with two different strategies, i.e., weighted summation and alignment constraint, is first designed to inject the structural information into both the textual and visual features. To the best of our knowledge, SGMPT is the first MPT model for multimodal KGR, which mines the structural information underlying the knowledge graph. Extensive experiments on FB15k-237-IMG and WN18-IMG, demonstrate that our SGMPT outperforms existing state-of-the-art models, and prove the effectiveness of the designed strategies.

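Motivation and Objectives

Address the incompleteness of multimodal knowledge graphs (MKGs) through multimodal KGR that exploits the underlying graph structure.
Design a plug-and-play structure-guided module that injects structural information into existing multimodal pre-trained Transformers.
Demonstrate on benchmark MKGR datasets that incorporating graph structure improves reasoning performance.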

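Proposed Method

Adopt a graph structure encoder to generate structural embeddings for entities.
Introduce a structure-guided fusion module that fuses structure with the textual and visual features using two strategies: weighted summation and an alignment constraint.
Adopt MKGformer as the MPT backbone and HAKE (and its variants) as the structure encoder, which produces H^s; H^s is then aligned with H^t and H^v through the L_ts, L_vs, and L_a losses.
Train with an MLM-based pre-training and fine-tuning objective that combines a cross-entropy loss with the alignment loss.
Evaluate on the FB15k-237-IMG and WN18-IMG datasets using Hits@k and Mean Rank (MR).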
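The two fusion strategies named above can be illustrated with a minimal sketch. This is not the authors' implementation: the blending weight `alpha`, the mean-squared form of the alignment terms, the choice L_a = L_ts + L_vs, and the trade-off weight `lam` are all assumptions made for illustration, since the review does not state the exact formulas. Features are plain Python lists standing in for H^t, H^v, and H^s.

```python
from typing import List

Vector = List[float]


def weighted_sum(h_modal: Vector, h_struct: Vector, alpha: float) -> Vector:
    """Blend a textual or visual feature with the structural feature.

    alpha is a hypothetical mixing weight in [0, 1]; alpha = 0 ignores structure.
    """
    if len(h_modal) != len(h_struct):
        raise ValueError("feature dimensions must match")
    return [(1.0 - alpha) * m + alpha * s for m, s in zip(h_modal, h_struct)]


def alignment_loss(h_modal: Vector, h_struct: Vector) -> float:
    """Mean squared distance between a modal feature and the structural one.

    The squared-error form is an assumption; the paper may use another distance.
    """
    if len(h_modal) != len(h_struct):
        raise ValueError("feature dimensions must match")
    return sum((m - s) ** 2 for m, s in zip(h_modal, h_struct)) / len(h_modal)


def total_loss(ce_loss: float, h_t: Vector, h_v: Vector, h_s: Vector, lam: float) -> float:
    """Combine the task loss with the alignment terms (assumed additive form)."""
    l_ts = alignment_loss(h_t, h_s)  # text-structure alignment
    l_vs = alignment_loss(h_v, h_s)  # vision-structure alignment
    l_a = l_ts + l_vs                # assumed: L_a is the sum of both terms
    return ce_loss + lam * l_a


# Toy features for a single entity (made-up numbers, not from the paper).
h_s = [1.0, 1.0]
h_t = [0.0, 2.0]
h_v = [1.0, 3.0]

fused_text = weighted_sum(h_t, h_s, alpha=0.5)
loss = total_loss(ce_loss=0.5, h_t=h_t, h_v=h_v, h_s=h_s, lam=0.5)
```

In this sketch, weighted summation changes the features that flow into the backbone, while the alignment constraint only adds a penalty to the training loss, so the two strategies can be switched on or off independently in an ablation.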

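Experimental Results

Research Questions

RQ1: Does SGMPT outperform prior state-of-the-art multimodal KGR models, including Transformer-based ones?
RQ2: Are the structure encoder and the structure-guided fusion module effective at exploiting graph structure in MKGR?
RQ3: How do different structure encoders affect MKGR performance?
RQ4: What are the efficiency and sensitivity characteristics of the proposed method?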

Key Findings

Model	FB15k-237-IMG MR	FB15k-237-IMG Hits@1	FB15k-237-IMG Hits@3	FB15k-237-IMG Hits@10	WN18-IMG MR	WN18-IMG Hits@1	WN18-IMG Hits@3	WN18-IMG Hits@10
TransE	323	19.8	37.6	44.1	357	4.0	74.5	92.3
DistMult	512	19.9	30.1	44.6	665	33.5	87.6	94.0
ComplEx	546	19.4	29.7	45.0	-	93.6	94.5	94.7
ConvE	249	22.5	34.1	49.7	-	41.9	47.0	53.1
RGCN	600	10.0	18.1	30.0	-	8.0	13.7	20.7
IKRL(UNION)	298	19.4	28.4	45.8	596	12.7	79.6	92.8
TransAE	431	19.9	31.7	46.3	352	32.3	83.5	93.4
RSME(ViT-B/32+Forget)	417	24.2	34.4	46.7	-	94.3	95.1	-
KG-BERT	153	-	-	-	58	11.7	68.9	92.6
VisualBERT	592	21.7	32.4	43.9	122	17.9	43.7	65.4
ViLBERT	483	23.3	33.5	45.7	131	22.3	55.2	76.1
MKGformer	252	24.3	36.0	49.9	25	93.5	95.8	97.0
SGMPT	238	25.2	37.0	51.0	29	94.3	96.6	97.8

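SGMPT outperforms the non-Transformer KGR models on the benchmark datasets on nearly every reported metric; in the table, the exceptions are TransE's Hits@3 on FB15k-237-IMG (37.6 vs. 37.0) and a tie with RSME on WN18-IMG Hits@1 (94.3).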
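SGMPT achieves higher Hits@1, Hits@3, and Hits@10 than most Transformer-based KGR models, most notably on FB15k-237-IMG.
Ablations show that both weighted summation and the alignment constraint contribute to the gains, and that both text-structure and vision-structure fusion are effective.
Using HAKE as the structure encoder yields strong results; HousE and CompGCN were also evaluated as alternative encoders.
On FB15k-237-IMG, SGMPT reaches MR = 238, Hits@1 = 25.2, Hits@3 = 37.0, and Hits@10 = 51.0; on WN18-IMG, it reaches MR = 29, Hits@1 = 94.3, Hits@3 = 96.6, and Hits@10 = 97.8.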
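Compared with MKGformer, SGMPT improves Hits@1/3/10 by 0.8 to 1.1 points on both datasets while keeping MR competitive (238 vs. 252 on FB15k-237-IMG, 29 vs. 25 on WN18-IMG).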
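For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how Hits@k and Mean Rank are commonly computed for link prediction. It is not taken from the paper's evaluation code: the helper names and the toy scores and ranks are hypothetical, and details such as filtered ranking or tie handling are simplified.

```python
from typing import List, Sequence


def rank_of_gold(scores: Sequence[float], gold_index: int) -> int:
    """Rank of the correct entity: 1 plus the number of candidates scored strictly higher."""
    gold_score = scores[gold_index]
    return 1 + sum(1 for s in scores if s > gold_score)


def hits_at_k(ranks: List[int], k: int) -> float:
    """Percentage of test queries whose correct entity is ranked in the top k."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)


def mean_rank(ranks: List[int]) -> float:
    """Average rank of the correct entity over all test queries."""
    return sum(ranks) / len(ranks)


# Toy ranks for four test queries (made-up values, not from the paper).
ranks = [1, 3, 12, 2]

h1 = hits_at_k(ranks, 1)    # 25.0
h3 = hits_at_k(ranks, 3)    # 75.0
h10 = hits_at_k(ranks, 10)  # 75.0
mr = mean_rank(ranks)       # 4.5
```

Higher is better for Hits@k and lower is better for MR. Because MR averages raw ranks, a few badly ranked queries can dominate it, which is one reason a model can lead on Hits@k while trailing slightly on MR.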

This review was generated by AI and checked by a human editor.