QUICK REVIEW

[Paper Review] Reconciliation of RDF* and Property Graphs

Olaf Hartig | arXiv (Cornell University) | Sep 11, 2014

Graph Theory and Algorithms | References: 1 | Cited by: 33

One-Sentence Summary

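The paper formally reconciles Property Graphs and RDF* by defining a rigorous, system-independent transformation framework that supports bidirectional, lossless conversion between the two models. Its core contributions are a formal definition of Property Graphs and explicit mappings between the models, parameterized by user-specified identity, label, and key mappings. Together these enable seamless interoperability between graph databases and RDF systems through standard query languages such as SPARQL and Gremlin.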

ABSTRACT

Both the notion of Property Graphs (PG) and the Resource Description Framework (RDF) are commonly used models for representing graph-shaped data. While there exist some system-specific solutions to convert data from one model to the other, these solutions are not entirely compatible with one another and none of them appears to be based on a formal foundation. In fact, for the PG model, there does not even exist a commonly agreed-upon formal definition. The aim of this document is to reconcile both models formally. To this end, the document proposes a formalization of the PG model and introduces well-defined transformations between PGs and RDF. As a result, the document provides a basis for the following two innovations: On one hand, by implementing the RDF-to-PG transformations defined in this document, PG-based systems can enable their users to load RDF data and make it accessible in a compatible, system-independent manner using, e.g., the graph traversal language Gremlin or the declarative graph query language Cypher. On the other hand, the PG-to-RDF transformation in this document enables RDF data management systems to support compatible, system-independent queries over the content of Property Graphs by using the standard RDF query language SPARQL. Additionally, this document represents a foundation for systematic research on relationships between the two models and between their query languages.

Research Motivation and Objectives

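Establish a formal, standardized foundation for the Property Graph model, which currently lacks a commonly agreed-upon definition.
Enable system-independent data exchange between Property Graph systems (e.g., Neo4j) and RDF systems (e.g., Virtuoso, Bigdata).
Support standard query languages over data converted from one model to the other: SPARQL for RDF, and Gremlin or Cypher for Property Graphs.
Provide a formal basis for systematic research on the relationships between the two models and between their query languages.
Use the RDF* extension to overcome, in a user-friendly way, RDF's limitations in expressing statement-level metadata (e.g., certainty).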

Proposed Approach

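Formalize a Property Graph as a tuple (V, E, src, tgt, lbl, P), where V is the set of vertices, E the set of edges, src and tgt assign each edge its source and target vertex, lbl assigns each edge a label, and P is a partial function that assigns properties to vertices and edges.
Introduce three user-specified mappings: a vertex identity mapping (id), which maps each vertex to an IRI or a blank node, an edge label mapping (lm), and a property key mapping (km). Because RDF predicates must be IRIs, lm and km map edge labels and property keys to IRIs.
Define the RDF* representation of a Property Graph as the union of three mutually disjoint sets of triples: vertex-property triples (G_vp*), edge-property triples (G_ep*), and edge triples (G_en*). RDF* allows each edge triple to appear as the subject of that edge's property triples.
Use a value-to-literal mapping (vm) to convert arbitrary property values into RDF literals, so that every value is represented as a typed literal in a uniform way.
Establish a formal, injective transformation from property-unique and edge-unique Property Graphs to RDF* graphs that preserves all structural and semantic information.
Provide the inverse transformation from RDF* graphs to Property Graphs, so that conversion is bidirectional and lossless under the same formal constraints.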
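The pipeline described above can be illustrated with a small Python sketch of the forward and inverse transformations. This is a minimal illustration under stated assumptions, not the paper's formal definition. The `IRI` and `Literal` classes, the `http://example.org/` namespace, and the concrete `id_map`, `lm`, `km`, `vm`, and `vm_inv` functions are hypothetical stand-ins for the user-specified mappings. Properties are stored as one value per key, which is our reading of property-uniqueness. Every vertex is also assumed to have at least one property or incident edge, because an isolated vertex with no properties yields no triples here. Edges are recovered only up to renaming of their identifiers.

```python
from dataclasses import dataclass, field

# Hypothetical RDF* term types for this sketch (not an existing RDF library).
@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str

XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/"  # assumed namespace used by the mappings below

@dataclass
class PropertyGraph:
    V: set     # vertex identifiers (strings in this sketch)
    E: set     # edge identifiers
    src: dict  # edge -> source vertex
    tgt: dict  # edge -> target vertex
    lbl: dict  # edge -> edge label
    P: dict = field(default_factory=dict)  # vertex or edge -> {key: value}

# Hypothetical user-specified mappings (named id_map to avoid shadowing id()).
def id_map(v): return IRI(EX + "v/" + v)
def lm(label): return IRI(EX + "label/" + label)
def km(key): return IRI(EX + "key/" + key)

def vm(x):
    # Value-to-literal mapping: choose an XSD datatype from the Python type.
    if isinstance(x, bool):  # check bool first: bool is a subclass of int
        return Literal(str(x).lower(), XSD + "boolean")
    if isinstance(x, int):
        return Literal(str(x), XSD + "integer")
    if isinstance(x, float):
        return Literal(repr(x), XSD + "double")
    return Literal(str(x), XSD + "string")

def vm_inv(lit):
    dt = lit.datatype[len(XSD):]
    if dt == "boolean":
        return lit.lexical == "true"
    if dt == "integer":
        return int(lit.lexical)
    if dt == "double":
        return float(lit.lexical)
    return lit.lexical

def pg_to_rdfstar(pg):
    # G_en*: one asserted triple per edge.
    triple_of = {e: (id_map(pg.src[e]), lm(pg.lbl[e]), id_map(pg.tgt[e])) for e in pg.E}
    g_en = set(triple_of.values())
    # G_vp*: one triple per vertex property.
    g_vp = {(id_map(v), km(k), vm(x)) for v in pg.V for k, x in pg.P.get(v, {}).items()}
    # G_ep*: the edge's triple, used as a quoted subject, carries each edge property.
    g_ep = {(triple_of[e], km(k), vm(x)) for e in pg.E for k, x in pg.P.get(e, {}).items()}
    return g_vp, g_ep, g_en

def _local(iri, kind):
    # Inverse of id_map / lm / km: strip the assumed namespace prefix.
    return iri.value[len(EX + kind + "/"):]

def rdfstar_to_pg(g_vp, g_ep, g_en):
    pg = PropertyGraph(set(), set(), {}, {}, {}, {})
    edge_of = {}
    # Edges get fresh ids: RDF* identifies an edge only by its triple.
    for i, t in enumerate(sorted(g_en, key=repr)):
        e = "e" + str(i)
        edge_of[t] = e
        pg.E.add(e)
        pg.src[e], pg.lbl[e], pg.tgt[e] = _local(t[0], "v"), _local(t[1], "label"), _local(t[2], "v")
        pg.V.update({pg.src[e], pg.tgt[e]})
    for s, p, o in g_vp:
        pg.V.add(_local(s, "v"))
        pg.P.setdefault(_local(s, "v"), {})[_local(p, "key")] = vm_inv(o)
    for t, p, o in g_ep:
        pg.P.setdefault(edge_of[t], {})[_local(p, "key")] = vm_inv(o)
    return pg

def canonical(pg):
    # Compare graphs up to renaming of edge identifiers.
    def props(x):
        return tuple(sorted(pg.P.get(x, {}).items()))
    verts = {(v, props(v)) for v in pg.V}
    edges = {(pg.src[e], pg.lbl[e], pg.tgt[e], props(e)) for e in pg.E}
    return verts, edges

pg = PropertyGraph(
    V={"alice", "bob"}, E={"e1"},
    src={"e1": "alice"}, tgt={"e1": "bob"}, lbl={"e1": "knows"},
    P={"alice": {"name": "Alice", "age": 30}, "e1": {"certainty": 0.8}},
)
g_vp, g_ep, g_en = pg_to_rdfstar(pg)
assert len(g_en) == len(pg.E), "edge-uniqueness violated"
assert canonical(rdfstar_to_pg(g_vp, g_ep, g_en)) == canonical(pg)
```

In this sketch, the check `len(g_en) == len(pg.E)` is what fails when two edges collapse onto the same triple. The comparison through `canonical` reflects that the round trip restores vertices, labels, and properties, but not the original edge names.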

Results

Research Questions

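RQ1: How can a formal, unambiguous definition of the Property Graph model be established to address its current lack of standardization?
RQ2: Which formal mappings are required to transform Property Graphs into RDF* graphs losslessly, preserving both semantics and structure?
RQ3: How can the resulting RDF* graphs be queried with standard SPARQL, and how can the original Property Graph be reconstructed from an RDF* graph?
RQ4: When the transformations are applied to the same data, what is the formal relationship between Property Graph query languages (e.g., Cypher, Gremlin) and SPARQL?
RQ5: How can statement-level metadata in Property Graphs (e.g., certainty, provenance) be formally expressed and queried in RDF*?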

Key Findings

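The paper formalizes the Property Graph model with a precise, unambiguous definition, resolving the ambiguity with which the model had previously been used across different systems.
Under the conditions of property uniqueness and edge uniqueness, the proposed transformation from Property Graphs to RDF* is lossless and injective, so the original graph can be recovered from its RDF* representation with full data fidelity.
The transformations enable full interoperability: RDF data can be loaded into Property Graph systems and queried with Gremlin or Cypher, and Property Graph data can be queried with SPARQL after transformation.
RDF* allows statement-level metadata (e.g., the certainty of an edge) to be expressed natively, in a semantically rich and queryable form, overcoming a major limitation of standard RDF.
The formal framework supports systematic study of the two models and their query languages with respect to equivalence, expressive power, and query semantics.
The approach is extensible and reusable: the user-defined mappings (id, lm, km) allow semantic alignment across heterogeneous data sources.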
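The effect of the edge-uniqueness condition on the lossless claim can be seen in a small sketch that serializes Property Graph edges in the Turtle*-style `<< s p o >>` quoted-triple syntax. It assumes that edge-uniqueness means no two edges share the same source, label, and target. The `ex:` prefix, the edge list, and the helpers `is_edge_unique` and `to_turtle_star` are hypothetical. The `certainty` annotations also illustrate the edge-level metadata discussed above.

```python
from collections import Counter

# Hypothetical Property Graph edges: (edge id, source, label, target, properties).
edges = [
    ("e1", "alice", "knows", "bob", {"certainty": 0.8}),
    ("e2", "alice", "knows", "bob", {"certainty": 0.3}),  # parallel edge, same label
    ("e3", "bob", "likes", "carol", {}),
]

def is_edge_unique(edges):
    # Assumed reading of edge-uniqueness: no two edges share (source, label, target).
    counts = Counter((s, lab, t) for _, s, lab, t, _ in edges)
    return all(n == 1 for n in counts.values())

def to_turtle_star(edges, prefix="ex:"):
    # One asserted triple per edge, plus one annotation per edge property that
    # uses the edge's triple as a quoted subject (<< s p o >>, Turtle* style).
    lines = set()  # a set, like an RDF graph: duplicate triples collapse
    for _, s, lab, t, props in edges:
        triple = f"{prefix}{s} {prefix}{lab} {prefix}{t}"
        lines.add(f"{triple} .")
        for key, value in props.items():
            lines.add(f"<< {triple} >> {prefix}{key} {value!r} .")
    return sorted(lines)

for line in to_turtle_star(edges):
    print(line)
print("edge-unique:", is_edge_unique(edges))
```

Because `e1` and `e2` map to the same asserted triple, three edges produce only two asserted triples. Both `certainty` values are attached to a single quoted triple, so a reverse transformation can no longer tell which value belonged to which edge.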

This review was generated by AI and checked by human editors.