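[Paper Summary] A Comprehensive Survey on Automatic Knowledge Graph Construction
A comprehensive survey of automatic knowledge graph construction that details the acquisition, refinement, and evolution processes and covers resources, tools, and future directions.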
Automatic knowledge graph construction aims to manufacture structured human knowledge. To this end, much effort has historically been spent extracting informative fact patterns from different data sources. More recently, however, research interest has shifted to acquiring conceptualized structured knowledge beyond informative data. Researchers have also been exploring new ways of handling sophisticated construction tasks in diversified scenarios. There is therefore a demand for a systematic review of paradigms that organize knowledge structures beyond data-level mentions. To meet this demand, we comprehensively survey more than 300 methods to summarize the latest developments in knowledge graph construction. A knowledge graph is built in three steps: knowledge acquisition, knowledge refinement, and knowledge evolution. The processes of knowledge acquisition are reviewed in detail, including obtaining entities with fine-grained types and their conceptual linkages to knowledge graphs, resolving coreferences, and extracting entity relationships in complex scenarios. The survey covers models for knowledge refinement, including knowledge graph completion and knowledge fusion. Methods to handle knowledge evolution are also systematically presented, including conditional knowledge acquisition, conditional knowledge graph completion, and knowledge dynamics. We present paradigms that compare these methods along the axes of data environment, motivation, and architecture. We also briefly describe accessible resources that can help readers develop practical knowledge graph systems. The survey concludes with a discussion of the challenges and possible directions for future exploration.
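The three steps named in the abstract (acquisition, refinement, evolution) can be illustrated on a toy triple store. This is a minimal sketch under our own assumptions, not a method from the survey. The sentence pattern, the `capital_of`/`has_capital` relations, the alias table, and the single inverse-relation rule are all hypothetical. Real systems replace each step with far richer models, such as learned entity recognition, entity linking, and embedding-based completion. Only acquisition and two refinement tasks (fusion and completion) are shown; evolution is left out.

```python
import re

# A fact is a (head, relation, tail) triple; a KG here is just a set of them.
Triple = tuple[str, str, str]

# Hypothetical surface pattern for one relation: "<X> is the capital of <Y>."
CAPITAL_PATTERN = re.compile(r"^(\w[\w ]*?) is the capital of (\w[\w ]*?)\.$")


def acquire(sentences: list[str]) -> set[Triple]:
    """Knowledge acquisition (toy): pattern-based relation extraction."""
    triples: set[Triple] = set()
    for sentence in sentences:
        match = CAPITAL_PATTERN.match(sentence.strip())
        if match:
            triples.add((match.group(1), "capital_of", match.group(2)))
    return triples


def fuse(triples: set[Triple], aliases: dict[str, str]) -> set[Triple]:
    """Knowledge fusion (toy): rewrite alias mentions to one canonical entity name."""
    return {(aliases.get(h, h), r, aliases.get(t, t)) for h, r, t in triples}


def complete(triples: set[Triple]) -> set[Triple]:
    """KG completion (toy): one hand-written rule, capital_of(x, y) => has_capital(y, x)."""
    inferred = {(t, "has_capital", h) for h, r, t in triples if r == "capital_of"}
    return triples | inferred


if __name__ == "__main__":
    text = [
        "Paris is the capital of France.",
        "Berlin is the capital of Deutschland.",
        "The sky is blue.",  # does not match the pattern, so nothing is extracted
    ]
    kg = complete(fuse(acquire(text), {"Deutschland": "Germany"}))
    for fact in sorted(kg):
        print(fact)
```

Running it prints four triples: the two extracted `capital_of` facts, with "Deutschland" fused into "Germany", plus their two inferred `has_capital` inverses.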
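Research Motivation and Goals
- Define a formal foundation and taxonomy for knowledge graphs (KGs) and their construction.
- Systematically review methods for knowledge acquisition, refinement, and evolution across diverse data environments.
- Summarize the practical resources, datasets, and tools that support KG construction.
- Analyze the challenges of the HACE (heterogeneous, autonomous, complex, and evolving) big-data environment, and discuss KG interpretability and evolution.
- Highlight future directions and open challenges in KG construction.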
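Proposed Approach
- Classify knowledge graphs and their construction process using formal definitions and background knowledge.
- Survey more than 300 methods across the acquisition, refinement, and evolution stages to compare paradigms.
- Organize the discussion around data environment, motivation, and architecture.
- Briefly introduce practical KG projects, datasets, and construction tools.
- Examine methods for handling noisy data and low-resource settings, and for model interpretability.
- Propose future research directions for KG construction.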
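The formal definitions mentioned above are not reproduced in this summary. For orientation only, the block below gives a notation commonly used in the KG literature; it is an assumption, and the survey's own symbols and definitions may differ. A KG is a set of entities, a set of relations, and a set of fact triples. Completion is posed as ranking candidate tails for an incomplete query under some scoring function $f$. Conditional or temporal facts add a condition component such as a time interval.

```latex
\begin{aligned}
&\mathcal{G} = (\mathcal{E}, \mathcal{R}, \mathcal{F}), \quad
  \mathcal{F} \subseteq \mathcal{E} \times \mathcal{R} \times \mathcal{E}
  && \text{each fact is a triple } (h, r, t) \\
&\text{completion: for a query } (h, r, ?), \text{ rank } t' \in \mathcal{E} \text{ by } f(h, r, t')
  && \\
&\mathcal{F}_{\mathcal{C}} \subseteq \mathcal{E} \times \mathcal{R} \times \mathcal{E} \times \mathcal{C}
  && \text{conditional facts, e.g. } \mathcal{C} = \text{time intervals}
\end{aligned}
```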
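Results
Research Questions
- RQ1: What are the formal definitions and components of knowledge graphs and their construction process?
- RQ2: How do methods for knowledge acquisition, refinement, and evolution differ across data environments and tasks?
- RQ3: Which practical resources (datasets, tools, projects) support automatic KG construction, and what are their characteristics?
- RQ4: What are the main challenges of KG construction in the HACE big-data environment, and how can they be addressed?
- RQ5: What future directions and open problems have emerged in automatic KG construction?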
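Key Findings
- The survey covers more than 300 methods and frames KG construction as a three-stage process: acquisition, refinement, and evolution.
- It maps paradigms along the dimensions of data environment, motivation, and architecture to compare methods.
- It catalogs practical KG projects, datasets, and tools spanning encyclopedic, linguistic, commonsense, enterprise, domain-specific, and federated KGs.
- It discusses the handling of noisy, document-level, and low-resource data, as well as interpretability and conditional/temporal knowledge graphs.
- It outlines challenges and future directions for KG construction in terms of data, models, and architecture.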
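Conditional/temporal knowledge graphs attach a validity condition to each fact. The sketch below is minimal and rests on our own assumptions. The condition is a half-open interval of years. A relation such as the hypothetical `works_for` holds for only one tail at a time. The entity names are made up. `snapshot` reads out the static KG valid at a given year. `update` is a toy stand-in for knowledge dynamics: it closes the old fact when a newer one arrives.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TemporalFact:
    """A triple plus a validity interval [start, end) in years (hypothetical granularity)."""
    head: str
    relation: str
    tail: str
    start: int                 # first year the fact holds (inclusive)
    end: Optional[int] = None  # first year it no longer holds (exclusive); None = still valid

    def holds_at(self, year: int) -> bool:
        return self.start <= year and (self.end is None or year < self.end)


def snapshot(facts: list[TemporalFact], year: int) -> set[tuple[str, str, str]]:
    """Project the temporal KG onto the static triples valid in `year`."""
    return {(f.head, f.relation, f.tail) for f in facts if f.holds_at(year)}


def update(facts: list[TemporalFact], new: TemporalFact) -> list[TemporalFact]:
    """Toy knowledge-dynamics step for a relation assumed to be functional:
    close any still-open fact with the same head and relation, then append the new fact."""
    updated = []
    for f in facts:
        if f.head == new.head and f.relation == new.relation and f.end is None:
            f = replace(f, end=new.start)  # frozen dataclass, so build a modified copy
        updated.append(f)
    updated.append(new)
    return updated


if __name__ == "__main__":
    kg = [TemporalFact("Alice", "works_for", "AcmeCorp", 2015)]
    kg = update(kg, TemporalFact("Alice", "works_for", "Globex", 2019))
    print(snapshot(kg, 2017))  # {('Alice', 'works_for', 'AcmeCorp')}
    print(snapshot(kg, 2020))  # {('Alice', 'works_for', 'Globex')}
```

Closing the previous interval is only reasonable for relations assumed to be functional. A relation that can hold for several tails at once would need a different update policy.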
This summary was generated by AI and reviewed by human editors.