QUICK REVIEW

[Paper Review] Boosting Multi-Core Reachability Performance with Shared Hash Tables

Alfons Laarman, Jaco van de Pol|arXiv (Cornell University)|Apr 16, 2010

Software Testing and Debugging Techniques | References: 15 | Cited by: 42

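One-Sentence Summary

This paper presents a lockless, cache-optimized shared hash table for reachability analysis in multi-core model checking; it scales efficiently by minimizing false sharing and exploiting the CPU cache hierarchy. By tuning the data layout, probing sequence, and bucket size for modern multi-core architectures, the approach runs up to four times faster than SPIN and two times faster than DiVinE.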

ABSTRACT

This paper focuses on data structures for multi-core reachability, which is a key component in model checking algorithms and other verification methods. A cornerstone of an efficient solution is the storage of visited states. In related work, static partitioning of the state space was combined with thread-local storage and resulted in reasonable speedups, but left open whether improvements are possible. In this paper, we present a scaling solution for shared state storage which is based on a lockless hash table implementation. The solution is specifically designed for the cache architecture of modern CPUs. Because model checking algorithms impose loose requirements on the hash table operations, their design can be streamlined substantially compared to related work on lockless hash tables. Still, an implementation of the hash table presented here has dozens of sensitive performance parameters (bucket size, cache line size, data layout, probing sequence, etc.). We analyzed their impact and compared the resulting speedups with related tools. Our implementation outperforms two state-of-the-art multi-core model checkers (SPIN and DiVinE) by a substantial margin, while placing fewer constraints on the load balancing and search algorithms.

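Motivation and Objectives

Address the scalability limits of shared state storage in multi-core model checking.
Improve the performance of explicit-state reachability analysis by replacing static partitioning with a scalable concurrent data structure.
Design a hash table tailored to the memory hierarchy and to the loose operation semantics that model checking algorithms require.
Support flexible load balancing and multiple exploration strategies (for example, pseudo depth-first search) in a shared-memory model checker.
Provide a high-performance, lockless alternative to the existing state storage solutions in tools such as SPIN and DiVinE.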

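Proposed Method

Design a lockless hash table that exploits the weak consistency requirements typical of model checking to achieve low-latency, high-throughput operations.
Use a cache-line-aware data layout and bucket organization to reduce false sharing and improve spatial locality.
Use a custom probing sequence that minimizes cache misses and aligns with CPU prefetching behavior.
Tune key parameters such as bucket size, cache line size, and data structure layout through extensive performance analysis.
Implement the hash table using POSIX shared memory to support state sharing between threads in a multi-core environment.
Integrate the hash table into the LTSmin model checker to evaluate performance across multiple model types.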
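
The first three steps above can be sketched as a single find-or-put operation. The C11 code below is a minimal illustration under stated assumptions, not the authors' implementation: the names `find_or_put`, `hash_state`, `bucket`, and `data`, the table size, the fixed state length, the 64-byte cache line, the placeholder hash function, and the rehash limit are all hypothetical choices made for this example. Each bucket holds a memoized hash tag plus a write-in-progress bit. A thread claims an empty bucket with one compare-and-swap, copies the state, then publishes it. Probing stays inside one cache line before rehashing.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE   1024u        /* buckets; power of two (assumed) */
#define STATE_WORDS  4u           /* fixed state vector length (assumed) */
#define LINE_BUCKETS 16u          /* 64-byte line / 4-byte bucket (assumed) */
#define MAX_REHASH   8u           /* probe attempts before giving up (assumed) */
#define EMPTY        0u
#define WRITE_BIT    0x80000000u  /* set while the owner is still copying */

/* Bucket array: memoized hash tags only, aligned so that each group of
   LINE_BUCKETS entries shares exactly one cache line. */
static alignas(64) _Atomic uint32_t bucket[TABLE_SIZE];
/* State vectors live in a separate array, indexed like the buckets. */
static uint32_t data[TABLE_SIZE][STATE_WORDS];

/* Placeholder mixing hash; the seed yields a different value per rehash. */
static uint32_t hash_state(const uint32_t *s, uint32_t seed) {
    uint32_t h = 2166136261u ^ seed;
    for (uint32_t i = 0; i < STATE_WORDS; i++) {
        h ^= s[i];
        h *= 16777619u;
    }
    return h;
}

/* Returns 1 if the state was already stored, 0 if this call inserted it,
   and -1 if no bucket was found within MAX_REHASH cache lines. */
int find_or_put(const uint32_t *state) {
    uint32_t tag = hash_state(state, 0) & ~WRITE_BIT;
    if (tag == EMPTY) tag = 1;  /* a stored tag must never look EMPTY */

    for (uint32_t attempt = 0; attempt < MAX_REHASH; attempt++) {
        uint32_t start = hash_state(state, attempt) & (TABLE_SIZE - 1);
        uint32_t line  = start & ~(LINE_BUCKETS - 1);
        /* Visit every bucket of the cache line that contains `start`. */
        for (uint32_t k = 0; k < LINE_BUCKETS; k++) {
            uint32_t i = line | ((start + k) & (LINE_BUCKETS - 1));
            uint32_t seen = atomic_load(&bucket[i]);
            if (seen == EMPTY) {
                uint32_t expected = EMPTY;
                if (atomic_compare_exchange_strong(&bucket[i], &expected,
                                                   tag | WRITE_BIT)) {
                    memcpy(data[i], state, sizeof data[i]);
                    atomic_store(&bucket[i], tag);  /* publish: clear WRITE_BIT */
                    return 0;
                }
                seen = expected;  /* lost the race; inspect the winner's tag */
            }
            if ((seen & ~WRITE_BIT) == tag) {
                /* Same tag: wait until the writer has published its data. */
                while (atomic_load(&bucket[i]) & WRITE_BIT) { /* spin */ }
                if (memcmp(data[i], state, sizeof data[i]) == 0) return 1;
            }
        }
    }
    return -1;
}
```

A reachability worker would expand the successors of a state only when `find_or_put` returns 0 for it. The sketch never deletes or resizes, which is the kind of loose requirement that lets the design stay simple; it uses the default sequentially consistent atomics for clarity rather than tuned memory orderings.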

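Experimental Results

Research Questions

RQ1: Can a lockless shared hash table outperform static partitioning in multi-core reachability analysis?
RQ2: How do low-level memory layout and cache behavior affect the performance of concurrent state storage in model checking?
RQ3: To what extent can a custom-designed hash table outperform state-of-the-art tools such as SPIN and DiVinE?
RQ4: Compared with thread-local or partitioned storage, does a cache-optimized lockless design enable better load balancing and algorithmic flexibility?
RQ5: What are the key performance parameters of a concurrent hash table under model checking workloads?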

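Key Findings

The proposed shared hash table runs reachability up to four times faster than SPIN and two times faster than DiVinE.
By minimizing false sharing and optimizing cache line usage, the implementation scales well on modern multi-core CPUs.
Small models run slower because the pointer-free, allocation-free design raises their cache miss rate, but controlled use of pointers can mitigate this.
The design supports flexible load balancing, both dynamic and explicit, adding only a few percent of overhead compared with static balancing.
Profiling and parameter tuning confirm that the approach can outperform current state-of-the-art multi-core model checkers.
The study shows that architecture awareness, especially of the cache hierarchy, is essential for high performance in parallel model checking.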
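
The false-sharing finding above can be illustrated with a generic layout comparison. The C11 sketch below shows the general padding technique only; it is not taken from the paper's data layout, and the names `packed_visited`, `padded_counter`, `padded_visited`, and `line_of`, the 64-byte line size, and the worker count are assumptions made for this example. Per-worker counters that sit in one cache line force that line to bounce between cores on every update, while counters padded to a full line each do not share one.

```c
#include <stdalign.h>
#include <stdint.h>

#define CACHE_LINE  64   /* assumed cache line size in bytes */
#define NUM_WORKERS 4    /* assumed number of worker threads */

/* Packed layout: all four 8-byte counters fit in one 64-byte line, so a
   write by one worker invalidates the line for every other worker. */
static alignas(CACHE_LINE) uint64_t packed_visited[NUM_WORKERS];

/* Padded layout: alignas on the member raises the struct's alignment,
   which rounds its size up to one full cache line. */
struct padded_counter {
    alignas(CACHE_LINE) uint64_t visited;
};
static struct padded_counter padded_visited[NUM_WORKERS];

/* Index of the cache line that holds address p. */
uintptr_t line_of(const void *p) {
    return (uintptr_t)p / CACHE_LINE;
}
```

Comparing `line_of` for neighbouring elements shows the difference: adjacent entries of `packed_visited` map to the same line, while adjacent entries of `padded_visited` map to different lines, at the cost of 56 unused bytes per counter.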

This review was generated by AI and reviewed by a human editor.