QUICK REVIEW

[Paper Review] Another one flew over the cuckoo's nest

Ely Porat, Bar Shalem|arXiv (Cornell University)|Apr 28, 2011

Advanced Image and Video Retrieval Techniques | Cited by 1

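One-Sentence Summary

This paper proposes a non-contiguous variant of cuckoo hashing in which every bucket is stored entirely within a single memory page, exploiting fast access within a fetched page to raise memory utilization and reduce the number of insertion iterations. By allowing buckets within a fixed-size page to overlap and to be non-contiguous, the method reaches 97.46% memory utilization (two buckets of capacity two per item, page size 8) and, at 92% utilization, cuts the number of insertion iterations from 545 in the contiguous overlapping variant to 52.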

ABSTRACT

Cuckoo hashing [4] is a multiple choice hashing scheme in which each item can be placed in multiple locations, and collisions are resolved by moving items to their alternative locations. In the classical implementation of two-way cuckoo hashing, the memory is partitioned into contiguous disjoint fixed-size buckets. Each item is hashed to two buckets, and may be stored in any of the positions within those buckets. Ref. [2] analyzed a variation in which the buckets are contiguous and overlap. However, many systems retrieve data from secondary storage in same-size blocks called pages. Fetching a page is a relatively expensive process; but once a page is fetched, its contents can be accessed orders of magnitude faster. We utilize this property of memory retrieval, presenting a variant of cuckoo hashing incorporating the following constraint: each bucket must be fully contained in a single page, but buckets are not necessarily contiguous. Empirical results show that this modification increases memory utilization and decreases the number of iterations required to insert an item. If each item is hashed to two buckets of capacity two, the page size is 8, and each bucket is fully contained in a single page, the memory utilization equals 89.71% in the classical contiguous disjoint bucket variant, 93.78% in the contiguous overlapping bucket variant, and increases to 97.46% in our new non-contiguous bucket variant. When the memory utilization is 92% and we use breadth first search to look for a vacant position, the number of iterations required to insert a new item is dramatically reduced from 545 in the contiguous overlapping buckets variant to 52 in our new non-contiguous bucket variant. In addition to the empirical results, we present a theoretical lower bound on the memory utilization of our variation as a function of the page size.
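
To make the difference between the three variants in the abstract concrete, the following counts how many distinct buckets a single page can host. This is a sketch under assumptions not spelled out in the abstract: $p$ is the page size, $k$ the bucket capacity, $k$ divides $p$, and in the non-contiguous variant any $k$ positions of a page may form a bucket. The symbol $B$ is introduced here for illustration only.

```latex
\begin{align*}
B_{\text{disjoint}}(p,k)       &= \frac{p}{k},   & B_{\text{disjoint}}(8,2)       &= 4,\\
B_{\text{overlapping}}(p,k)    &= p-k+1,         & B_{\text{overlapping}}(8,2)    &= 7,\\
B_{\text{non-contiguous}}(p,k) &= \binom{p}{k},  & B_{\text{non-contiguous}}(8,2) &= 28.
\end{align*}
```

Under these assumptions, the non-contiguous variant gives each item far more candidate buckets per page at the same one-page access cost, which is consistent with the higher utilization the abstract reports.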

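Research Motivation and Goals

Address the inefficiency of classical cuckoo hashing on systems that retrieve data in fixed-size pages by aligning bucket placement with page boundaries.
Raise the memory utilization of cuckoo hashing by allowing buckets within the same page to be non-contiguous and to overlap.
Reduce insertion time by minimizing the number of iterations needed to find a vacant position during insertion.
Derive a theoretical lower bound on the memory utilization of the proposed variant as a function of the page size.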

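Proposed Method

Memory is divided into fixed-size pages, and every bucket must be fully contained in a single page, whether or not its positions are contiguous.
Buckets may overlap one another and need not be contiguous, as long as each bucket lies entirely within one page.
Each item is hashed to two buckets; on insertion, occupants are moved to their alternative positions, and breadth-first search is used to find a vacant position along the eviction chain.
The scheme exploits the fact that once a page has been fetched, its contents can be accessed much faster, which lowers latency.
A theoretical lower bound on memory utilization is derived as a function of the page size.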
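
The steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the class name `PagedCuckoo`, the string-seeded `random.Random` used as a stand-in hash function, and the `max_nodes` cutoff are all assumptions made for the example. A bucket is modelled as any `bucket_size` positions of one page, and `insert` runs a breadth-first search over slots until it reaches a vacant one.

```python
import random
from collections import deque
from itertools import combinations


class PagedCuckoo:
    """Two-way cuckoo table whose buckets are k-subsets of a single page."""

    def __init__(self, num_pages, page_size=8, bucket_size=2, seed=0):
        self.num_pages = num_pages
        self.page_size = page_size
        self.seed = seed
        self.slots = [None] * (num_pages * page_size)
        # Every way to pick bucket_size in-page offsets (assumed bucket model).
        self.shapes = list(combinations(range(page_size), bucket_size))
        self._cache = {}

    def positions(self, key):
        """Slots of the key's two buckets; each bucket stays inside one page."""
        if key not in self._cache:
            found = []
            for which in (0, 1):
                # Hypothetical hash: a PRNG seeded by (seed, which, key).
                rng = random.Random(f"{self.seed}/{which}/{key}")
                base = rng.randrange(self.num_pages) * self.page_size
                for off in rng.choice(self.shapes):
                    if base + off not in found:
                        found.append(base + off)
            self._cache[key] = found
        return self._cache[key]

    def lookup(self, key):
        return any(self.slots[p] == key for p in self.positions(key))

    def insert(self, key, max_nodes=10_000):
        """BFS for a vacant slot; returns slots examined, or None on failure."""
        parent = {p: None for p in self.positions(key)}
        queue = deque(parent)
        examined = 0
        while queue and examined < max_nodes:
            pos = queue.popleft()
            examined += 1
            if self.slots[pos] is None:
                # Walk back to the root, shifting each occupant one step
                # along the path into a slot that belongs to its buckets.
                while parent[pos] is not None:
                    self.slots[pos] = self.slots[parent[pos]]
                    pos = parent[pos]
                self.slots[pos] = key
                return examined
            for nxt in self.positions(self.slots[pos]):
                if nxt not in parent:
                    parent[nxt] = pos
                    queue.append(nxt)
        return None  # no vacant slot reachable; the table is left unchanged


table = PagedCuckoo(num_pages=64)
for k in range(300):
    table.insert(k)
print(table.lookup(42), table.lookup("missing"))
```

The table is modified only after a vacant slot has been found, so a failed insertion leaves it untouched. The count returned by `insert` is the number of slots the search examined, which is only a rough proxy for the iteration counts quoted in the paper. The sketch assumes each key is inserted at most once.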

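Experimental Results

Research Questions

RQ1 How does non-contiguous bucket placement within a page affect the memory utilization of cuckoo hashing?
RQ2 How much does the new variant reduce the number of insertion iterations compared with the contiguous disjoint and contiguous overlapping bucket variants?
RQ3 What is the theoretical limit on the memory utilization of this page-aligned cuckoo hashing variant?
RQ4 How does the efficiency of page-level memory access affect the performance of the proposed scheme?

Key Findings

With each item hashed to two buckets of capacity two and a page size of 8, the proposed non-contiguous variant reaches 97.46% memory utilization, compared with 89.71% for the classical contiguous disjoint variant and 93.78% for the contiguous overlapping variant.
At 92% memory utilization, with breadth-first search used to look for a vacant position, the number of iterations needed to insert an item falls from 545 in the contiguous overlapping variant to 52 in the proposed non-contiguous variant.
Because every bucket is confined to a single page, the higher utilization is obtained without a bucket ever spanning more than one page.
The empirical results show that the non-contiguous variant needs far fewer insertion iterations than the earlier cuckoo hashing variants.
A theoretical lower bound on memory utilization is established as a function of the page size, giving a baseline for future designs.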
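
A comparison of this kind can be approximated with a small simulation. The sketch below is not the paper's experimental setup: the function names, the table size, and the use of fully random bucket choices in place of hash functions are assumptions made for illustration. It fills a table item by item under each of the three bucket models and reports the load reached when the first item cannot be placed.

```python
import random
from collections import deque
from itertools import combinations


def bucket_shapes(variant, page_size, k):
    """In-page offset sets that may serve as a bucket under each variant."""
    if variant == "disjoint":
        return [tuple(range(s, s + k)) for s in range(0, page_size - k + 1, k)]
    if variant == "overlapping":
        return [tuple(range(s, s + k)) for s in range(page_size - k + 1)]
    if variant == "non-contiguous":
        return list(combinations(range(page_size), k))
    raise ValueError(variant)


def fill_until_failure(variant, num_pages=256, page_size=8, k=2, seed=0):
    """Insert items until one cannot be placed; return the load reached."""
    rng = random.Random(seed)
    shapes = bucket_shapes(variant, page_size, k)
    slots = [None] * (num_pages * page_size)
    choices = []  # choices[item] = candidate slots of that item

    for item in range(len(slots) + 1):
        # Two buckets per item, each a random shape inside a random page.
        cand = []
        for _ in range(2):
            base = rng.randrange(num_pages) * page_size
            cand.extend(base + off for off in rng.choice(shapes))
        choices.append(cand)

        # Breadth-first search over slots for a vacant position.
        parent = {p: None for p in cand}
        queue = deque(parent)
        placed = False
        while queue:
            pos = queue.popleft()
            if slots[pos] is None:
                # Shift occupants along the path, then store the new item.
                while parent[pos] is not None:
                    slots[pos] = slots[parent[pos]]
                    pos = parent[pos]
                slots[pos] = item
                placed = True
                break
            for nxt in choices[slots[pos]]:
                if nxt not in parent:
                    parent[nxt] = pos
                    queue.append(nxt)
        if not placed:
            return item / len(slots)  # items 0..item-1 are stored
    return 1.0


for variant in ("disjoint", "overlapping", "non-contiguous"):
    print(variant, round(fill_until_failure(variant), 4))
```

On a table this small the measured loads vary with the seed and only approximate the figures reported in the paper. Increasing `num_pages` and averaging over several seeds should bring them closer.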

This review was generated by AI and reviewed by a human editor.