QUICK REVIEW

[Paper Review] ForkBase: An Efficient Storage Engine for Blockchain and Forkable Applications

Sheng Wang, Tien Tuan Anh Dinh | arXiv (Cornell University) | Feb 14, 2018

Advanced Data Storage Technologies | References: 29 | Cited by: 24

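One-Sentence Summary

ForkBase is a storage engine with native support for data versioning, fork semantics, and tamper evidence, aimed at blockchain and collaborative applications. Using a content-based versioned index (the POS-Tree) and fine-grained deduplication, it outperforms state-of-the-art systems in query efficiency, storage overhead, and ease of development across blockchain, wiki, and analytics workloads.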

ABSTRACT

Existing data storage systems offer a wide range of functionalities to accommodate an equally diverse range of applications. However, new classes of applications have emerged, e.g., blockchain and collaborative analytics, featuring data versioning, fork semantics, tamper-evidence or any combination thereof. They present new opportunities for storage systems to efficiently support such applications by embedding the above requirements into the storage. In this paper, we present ForkBase, a storage engine specifically designed to provide efficient support for blockchain and forkable applications. By integrating the core application properties into the storage, ForkBase not only delivers high performance but also reduces development effort. Data in ForkBase is multi-versioned, and each version uniquely identifies the data content and its history. Two variants of fork semantics are supported in ForkBase to facilitate any collaboration workflows. A novel index structure is introduced to efficiently identify and eliminate duplicate content across data objects. Consequently, ForkBase is not only efficient in performance, but also in space requirement. We demonstrate the performance of ForkBase using three applications: a blockchain platform, a wiki engine and a collaborative analytics application. We conduct extensive experimental evaluation of these applications against respective state-of-the-art system. The results show that ForkBase achieves superior performance while significantly lowering the development cost.

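Motivation and Objectives

To meet the growing need, in modern applications such as blockchain and collaborative analytics, for storage systems that natively support data versioning, fork semantics, and tamper evidence.
To reduce application development complexity by pushing core blockchain and fork semantics down into the storage layer.
To improve performance and storage efficiency over implementations built at the application layer or on top of key-value stores.
To enable efficient historical queries and incremental updates through a new index structure and content-based chunking.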

Proposed Method

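ForkBase gives every version an identifier that uniquely determines both the data content and its history, which supports fast integrity verification and version retrieval.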
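It splits large objects into chunks and organizes them in a novel index, the POS-Tree, which combines content-based chunking, Merkle-tree hashing, and B+-tree indexing for efficient deduplication and lookup.
The POS-Tree supports copy-on-write during forking, so both implicit and explicit forks are created without redundant copies of the data.
ForkBase exposes a simple, high-level API and supports multiple physical storage layouts (row-oriented and column-oriented) to optimize query performance and storage efficiency.
Chunks are hashed and indexed by content, giving fine-grained deduplication across versions and forks and substantially reducing storage overhead.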
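
The ideas above (content-based chunking, content-addressed deduplication, history-dependent version identifiers, and forks that copy no data) can be illustrated with a minimal Python sketch. It is not the POS-Tree and not ForkBase's API: the chunk list is flat rather than a tree, and `ToyStore`, `split_chunks`, `WINDOW`, `MASK`, the branch names, and the way the version identifier is derived from the parent identifier are all assumptions made for illustration.

```python
import hashlib

WINDOW = 16   # bytes hashed when testing for a chunk boundary (assumed value)
MASK = 0x3F   # cut when the low 6 bits of the window hash are zero (assumed value)


def split_chunks(data: bytes) -> list:
    """Cut data wherever the hash of the trailing WINDOW bytes matches a pattern.

    Boundaries depend only on nearby content, so a local edit leaves
    most chunks elsewhere byte-identical.
    """
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        digest = hashlib.blake2b(data[end - WINDOW:end], digest_size=4).digest()
        if int.from_bytes(digest, "big") & MASK == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class ToyStore:
    """Hypothetical in-memory store: chunks keyed by SHA-256, branches as pointers."""

    def __init__(self):
        self.chunks = {}     # chunk id -> bytes, each distinct chunk stored once
        self.versions = {}   # version uid -> (parent uid, [chunk ids])
        self.branches = {}   # branch name -> version uid

    def commit(self, branch: str, data: bytes) -> str:
        parent = self.branches.get(branch, "")
        ids = []
        for chunk in split_chunks(data):
            cid = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(cid, chunk)  # duplicate chunks are not stored again
            ids.append(cid)
        # Assumed scheme: the uid covers both the content and the parent uid.
        uid = hashlib.sha256((parent + "|" + ",".join(ids)).encode()).hexdigest()
        self.versions[uid] = (parent, ids)
        self.branches[branch] = uid
        return uid

    def fork(self, source: str, target: str) -> None:
        self.branches[target] = self.branches[source]  # copies a pointer, not data

    def read(self, branch: str) -> bytes:
        _, ids = self.versions[self.branches[branch]]
        return b"".join(self.chunks[cid] for cid in ids)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


# Usage: commit a base object, fork it, then commit a small edit on the fork.
base = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(200))
edited = base[:3000] + b"EDIT" + base[3000:]

store = ToyStore()
store.commit("master", base)
store.fork("master", "feature")
store.commit("feature", edited)
```

In this toy setup, `store.stored_bytes()` should come out well below `len(base) + len(edited)`, because chunks away from the four inserted bytes are shared between the two versions. A fixed-size chunker would lose that property: the insertion would shift every later boundary.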

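Experimental Results

Research Questions

RQ1: Can a storage engine natively support data versioning, fork semantics, and tamper evidence so as to reduce application-layer complexity and improve performance?
RQ2: How does a content-based versioned index structure (the POS-Tree) improve deduplication and query efficiency for versioned and forked data workloads?
RQ3: How large is the performance advantage of a unified storage engine over purpose-built applications or key-value-store-based implementations in blockchain and collaborative analytics workloads?
RQ4: How does the choice of physical storage layout (row-oriented vs. column-oriented) affect analytical query performance in a versioned storage system?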

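Key Findings

Because the client caches frequently accessed chunks, ForkBase outperforms Redis when reading historical versions, and the advantage grows as more versions are accessed.
With two-layer partitioning (2LP), ForkBase balances storage across nodes, avoiding the skew that one-layer partitioning (1LP) exhibits under the wiki workload.
In the dataset-modification workload, fine-grained chunk-level deduplication lets ForkBase use up to 3x less storage than OrpheusDB.
For analytical queries, ForkBase with a column-oriented layout is 10x faster than both row-oriented ForkBase and OrpheusDB, showing the benefit of layout-aware optimization.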
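Thanks to lazy loading and to transferring only the minimal data at checkout and commit time, ForkBase's update latency is two orders of magnitude lower than OrpheusDB's.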
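The cost of comparing versions in ForkBase scales with the size of the difference, because the comparison traverses the POS-Tree; it is especially cheap when the difference is small. OrpheusDB must compare full vectors, so its cost is constant and high.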
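
The last finding rests on a general property of Merkle-style trees: two subtrees with equal digests hold identical content and need not be visited. The sketch below shows that pruning on a plain binary Merkle tree, which is a simplification and not the POS-Tree itself. `Node`, `build`, `diff`, and the row data are hypothetical, and both trees are assumed to have the same shape (an in-place edit, with no insertions or deletions).

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    digest: str
    children: tuple = ()
    value: Optional[bytes] = None  # set on leaves only


def build(leaves):
    """Build a binary Merkle tree bottom-up over a non-empty list of byte strings."""
    level = [Node(hashlib.sha256(v).hexdigest(), value=v) for v in leaves]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            kids = tuple(level[i:i + 2])
            joined = "".join(k.digest for k in kids).encode()
            parents.append(Node(hashlib.sha256(joined).hexdigest(), children=kids))
        level = parents
    return level[0]


def diff(a, b, stats):
    """Return changed (old, new) leaf pairs, skipping subtrees whose digests match."""
    stats["visited"] += 1
    if a.digest == b.digest:
        return []  # identical subtree: prune the walk here
    if not a.children:
        return [(a.value, b.value)]
    changes = []
    for child_a, child_b in zip(a.children, b.children):
        changes.extend(diff(child_a, child_b, stats))
    return changes


rows = [f"row-{i}".encode() for i in range(1024)]
edited_rows = list(rows)
edited_rows[500] = b"row-500-edited"

stats = {"visited": 0}
changes = diff(build(rows), build(edited_rows), stats)
# One changed leaf among 1024: the walk touches 21 of the 2047 nodes
# (the root plus two nodes on each of the 10 levels below it).
```

A comparison that scans every element would touch all 1024 rows regardless of how many changed, which matches the contrast drawn with full-vector comparison above.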

This review was generated by AI and reviewed by a human editor.