QUICK REVIEW

[Paper Review] Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

Zhuo Chen, Yichi Zhang|arXiv (Cornell University)|Feb 8, 2024

Advanced Graph Neural Networks|Cited by 28

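One-Sentence Summary

A comprehensive survey of KG-driven multi-modal learning (KG4MM) and multi-modal knowledge graphs (MMKGs) that reviews more than 300 works and outlines construction, tasks, benchmarks, challenges, and trends, including large language models (LLMs) and multi-modal pre-training.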

ABSTRACT

Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the semantic web community's exploration into multi-modal dimensions unlocking new avenues for innovation. In this survey, we carefully review over 300 articles, focusing on KG-aware research in two principal aspects: KG-driven Multi-Modal (KG4MM) learning, where KGs support multi-modal tasks, and Multi-Modal Knowledge Graph (MM4KG), which extends KG studies into the MMKG realm. We begin by defining KGs and MMKGs, then explore their construction progress. Our review includes two primary task categories: KG-aware multi-modal learning tasks, such as Image Classification and Visual Question Answering, and intrinsic MMKG tasks like Multi-modal Knowledge Graph Completion and Entity Alignment, highlighting specific research trajectories. For most of these tasks, we provide definitions, evaluation benchmarks, and additionally outline essential insights for conducting relevant research. Finally, we discuss current challenges and identify emerging trends, such as progress in Large Language Modeling and Multi-modal Pre-training strategies. This survey aims to serve as a comprehensive reference for researchers already involved in or considering delving into KG and multi-modal learning research, offering insights into the evolving landscape of MMKG research and supporting future work.

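Motivation and Objectives

Define the foundational concepts of Knowledge Graphs (KGs) and Multi-modal Knowledge Graphs (MMKGs).
Survey progress in KG construction and the paradigms for MMKG construction.
Categorize and analyze KG-driven multi-modal learning tasks and intrinsic MMKG tasks, with benchmarks and insights for each.
Discuss KG4MM and MM4KG tasks, benchmarks, challenges, and opportunities, including LLMs and multi-modal pre-training.
Provide a roadmap for future research on KG-enhanced multi-modal reasoning and cross-modal knowledge representation.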

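Proposed Method

A systematic literature review of more than 300 articles on KGs and multi-modal learning published between 2020 and 2023.
A structured taxonomy of KG4MM tasks: understanding and reasoning, classification, content generation, retrieval, and multi-modal pre-training.
A structured taxonomy of MMKG tasks: acquisition, fusion, inference, and MMKG-driven tasks.
Definitions of the two MMKG representations (A-MMKG and N-MMKG) and of their construction paradigms.
A critical analysis of challenges, opportunities, and emerging trends, including the interaction between LLMs and MMKGs.
A synthesis of benchmarks, evaluation metrics, and practical guidelines for conducting KG-aware multi-modal research.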
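
The summary above does not name specific evaluation metrics. As an illustration only, rank-based metrics such as Mean Reciprocal Rank (MRR) and Hits@k are widely used for link-prediction-style tasks like knowledge graph completion. The sketch below assumes each test query has already been reduced to the 1-based rank of its correct answer; the function names and the rank list are hypothetical and are not taken from the survey.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries (ranks are 1-based)."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose correct answer is ranked in the top k."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct tail entity for four test triples.
example_ranks = [1, 2, 4, 10]

mrr = mean_reciprocal_rank(example_ranks)
hits3 = hits_at_k(example_ranks, 3)
hits10 = hits_at_k(example_ranks, 10)
print(f"MRR={mrr:.4f} Hits@3={hits3:.2f} Hits@10={hits10:.2f}")
```

Both metrics depend only on the rank of the correct answer, so they can be computed the same way whether the ranking model uses structure alone or also uses images and text.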

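Experimental Results

Research Questions

RQ1: What are the core concepts and definitions of KGs and MMKGs, and how do they relate to multi-modal learning?
RQ2: How are KGs constructed, and what are the paradigms for MMKG construction?
RQ3: What are the main KG4MM and MM4KG tasks, datasets, benchmarks, and evaluation practices?
RQ4: What are the current challenges and opportunities in integrating KGs with multi-modal learning, and how do LLMs affect this landscape?
RQ5: Which future directions could unify KGs and multi-modal learning and lead to more robust reasoning systems?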

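Key Findings

KGs provide symbolic, structured knowledge that complements multi-modal perception and helps with long-tail knowledge and reasoning tasks.
MMKGs extend KGs into the multi-modal domain by representing images and other modalities as entities or attributes, which enables cross-modal reasoning and retrieval.
The tasks covered span understanding and reasoning, classification, content generation, retrieval, and pre-training, with growing attention to MMKG-aware pre-training and integration with LLMs.
MMKG construction paradigms include labeling images with KG symbols and grounding KG symbols to images; grounding is the dominant approach for large-scale MMKGs.
The survey highlights emerging trends, such as the influence of large language models and multi-modal pre-training strategies on KG4MM and MM4KG research.
The article provides a roadmap and identifies open challenges for sustaining the growth of KG4MM and MM4KG in an era of rapid LLM progress.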
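
The distinction between representing images as attributes and as entities can be made concrete with a small sketch. It assumes the usual reading of the two representations named in this review: in an A-MMKG an image is an attribute value attached to an entity, while in an N-MMKG an image is itself a node that can take part in relations. The entity names, relation names, and file names below are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str


# A-MMKG style (illustrative): the image is an attribute value of an entity.
a_mmkg: List[Triple] = [
    Triple("Eiffel_Tower", "locatedIn", "Paris"),
    Triple("Eiffel_Tower", "hasImage", "eiffel_01.jpg"),
]

# N-MMKG style (illustrative): images are nodes and can relate to each other.
n_mmkg: List[Triple] = [
    Triple("Eiffel_Tower", "locatedIn", "Paris"),
    Triple("img:eiffel_01", "imageOf", "Eiffel_Tower"),
    Triple("img:eiffel_02", "imageOf", "Eiffel_Tower"),
    Triple("img:eiffel_01", "similarTo", "img:eiffel_02"),
]


def is_image(node: str) -> bool:
    """Hypothetical naming convention used only in this sketch."""
    return node.startswith("img:") or node.endswith(".jpg")


def image_to_image_edges(
    triples: List[Triple], image_test: Callable[[str], bool]
) -> List[Triple]:
    """Return triples whose head and tail are both images."""
    return [t for t in triples if image_test(t.head) and image_test(t.tail)]


print(len(image_to_image_edges(a_mmkg, is_image)))  # images are only attribute values here
print(len(image_to_image_edges(n_mmkg, is_image)))  # images can be linked directly here
```

In the attribute-style graph an image only ever appears as the tail of an attribute triple, so there are no image-to-image edges to traverse; in the node-style graph such edges exist and can support the cross-modal reasoning and retrieval described above.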


This review was generated by AI and checked by a human editor.