QUICK REVIEW

[Paper Review] Boosting XML Filtering with a Scalable FPGA-based Architecture

Abhishek Mitra, Marcos R. Vieira | ArXiv.org | Sep 9, 2009

Advanced Database Systems and Queries | References: 21 | Cited by: 39

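One-Sentence Summary

The paper proposes a pure-hardware FPGA architecture that accelerates XML filtering in publish-subscribe systems: XPath queries are translated into regular expressions, the expressions are implemented on the FPGA, and on-chip stacks enable scalable, high-throughput processing of path queries. Because the parsing and filtering logic sit on a single FPGA device, inter-processor communication overhead is eliminated, and throughput improves more than tenfold over traditional software or mixed hardware/software systems.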

ABSTRACT

The growing amount of XML encoded data exchanged over the Internet increases the importance of XML based publish-subscribe (pub-sub) and content based routing systems. The input in such systems typically consists of a stream of XML documents and a set of user subscriptions expressed as XML queries. The pub-sub system then filters the published documents and passes them to the subscribers. Pub-sub systems are characterized by very high input ratios, therefore the processing time is critical. In this paper we propose a "pure hardware" based solution, which utilizes XPath query blocks on FPGA to solve the filtering problem. By utilizing the high throughput that an FPGA provides for parallel processing, our approach achieves drastically better throughput than the existing software or mixed (hardware/software) architectures. The XPath queries (subscriptions) are translated to regular expressions which are then mapped to FPGA devices. By introducing stacks within the FPGA we are able to express and process a wide range of path queries very efficiently, on a scalable environment. Moreover, the fact that the parser and the filter processing are performed on the same FPGA chip, eliminates expensive communication costs (that a multi-core system would need) thus enabling very fast and efficient pipelining. Our experimental evaluation reveals more than one order of magnitude improvement compared to traditional pub/sub systems.

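Research Motivation and Goals

Address the performance bottlenecks that growing data volumes and complex filtering workloads cause in high-throughput XML publish-subscribe systems.
Eliminate communication latency and raise throughput by placing the XML parsing and filtering logic on a single FPGA device.
Achieve scalable, high-throughput XML filtering with a pure-hardware solution that supports a wide range of XPath queries.
Reach processing rates significantly higher than those of existing software or mixed hardware/software architectures.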

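Proposed Method

Translate user-specified XPath queries into equivalent regular expressions for efficient hardware mapping.
Implement the regular-expression engines on the FPGA with a scalable, stack-based architecture that handles hierarchical path expressions.
Embed the XML parser and the filtering logic in the same FPGA for low-latency, pipelined processing.
Exploit parallel processing and hardware-level optimizations for high-throughput processing of streaming XML documents.
Design a modular, scalable FPGA architecture that supports dynamic query updates and efficient resource utilization.
Minimize data movement by keeping all processing stages (parsing, matching, and filtering) within a single FPGA device.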
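The first step above, translating XPath into a regular expression, can be illustrated in software. The sketch below is not the paper's translation scheme, which this summary does not describe. It assumes a small XPath subset (child axis `/`, descendant axis `//`, wildcard `*`, no predicates) and assumes each element is identified by its root-to-element tag path written as a string such as `/a/x/b`. The function name `xpath_to_regex` is hypothetical.

```python
import re


def xpath_to_regex(xpath: str) -> str:
    """Translate a simple XPath (tag names, '/', '//', '*') into a regex
    that matches root-to-element tag paths written like '/a/b/c'."""
    pattern = ""
    i = 0
    while i < len(xpath):
        if xpath.startswith("//", i):
            # Descendant axis: zero or more intermediate elements, then a child step.
            pattern += r"(?:/[^/]+)*/"
            i += 2
        elif xpath[i] == "/":
            # Child axis: exactly one level down.
            pattern += "/"
            i += 1
        else:
            # Read one step (a tag name or the '*' wildcard) up to the next '/'.
            j = i
            while j < len(xpath) and xpath[j] != "/":
                j += 1
            name = xpath[i:j]
            pattern += r"[^/]+" if name == "*" else re.escape(name)
            i = j
    return "^" + pattern + "$"


if __name__ == "__main__":
    rx = xpath_to_regex("/a//b")
    print(rx)
    print(bool(re.match(rx, "/a/x/y/b")))  # descendant at depth 3
    print(bool(re.match(rx, "/a/bb")))     # different tag name
```

In hardware, each such expression would become a dedicated matching circuit so that all subscriptions are evaluated in parallel; in this sketch the regex library plays that role sequentially.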

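Experimental Results

Research Questions

RQ1: Can a pure-hardware FPGA architecture achieve significantly higher throughput than software or mixed systems for XML filtering in publish-subscribe systems?
RQ2: How can XPath queries be translated effectively into regular expressions suited to efficient FPGA implementation?
RQ3: Which architectural techniques enable scalable, high-throughput processing of hierarchical XML path queries on an FPGA?
RQ4: To what extent does placing the parsing and filtering logic on a single FPGA reduce latency and communication overhead?
RQ5: How large a gain in throughput and scalability can this native FPGA approach deliver?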

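Key Findings

The proposed FPGA architecture improves throughput more than tenfold over traditional software-driven publish-subscribe systems.
Integrating parsing and filtering on the same FPGA chip eliminates expensive inter-processor communication and enables efficient pipelining and low-latency operation.
On-chip stacks make it possible to express and process a wide range of path queries efficiently, including complex XPath patterns.
Mapping XPath to regular expressions yields high-throughput, scalable processing suited to real-time XML streaming workloads.
The system scales well and is suited to deployment in high-throughput, low-latency content-routing environments.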
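As a rough software analogy for stack-based path filtering on a stream, the sketch below keeps a stack of currently open tags while XML events arrive and tests every subscription against the current root-to-element path. It is an illustration under stated assumptions, not the paper's hardware design: the function name `filter_document`, the subscription names, and the path-string encoding (`/a/x/b`) are all hypothetical, and the subscriptions are assumed to be already expressed as regular expressions over such paths.

```python
import io
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Set


def filter_document(xml_text: str, subscriptions: Dict[str, str]) -> Set[str]:
    """Return the names of subscriptions whose path regex matches
    at least one element of the document."""
    compiled = {name: re.compile(rx) for name, rx in subscriptions.items()}
    stack: List[str] = []      # tags of the elements that are currently open
    matched: Set[str] = set()
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            path = "/" + "/".join(stack)
            # A hardware design could evaluate all subscriptions in parallel;
            # this loop checks them one after another.
            for name, rx in compiled.items():
                if name not in matched and rx.fullmatch(path):
                    matched.add(name)
        else:
            stack.pop()        # closing tag: leave the current element
    return matched


if __name__ == "__main__":
    doc = "<a><x><b/></x><c/></a>"
    subs = {
        "q1": r"/a(?:/[^/]+)*/b",  # /a//b
        "q2": r"/a/b",             # /a/b
        "q3": r"/a/[^/]+",         # /a/*
    }
    print(sorted(filter_document(doc, subs)))
```

For the sample document, the paths visited are `/a`, `/a/x`, `/a/x/b`, and `/a/c`, so `q1` and `q3` match while `q2` does not. The document is delivered to a subscriber as soon as its subscription matches once, which is why already matched subscriptions are skipped.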

This review was generated by AI and reviewed by human editors.