[Paper Explained] Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
Introduces Hierarchical NSW (HNSW), a fully graph-based approximate KNN index with multi-layer proximity graphs and scale-separated links, enabling fast, robust ANN search with logarithmic complexity.
We present a new approach for the approximate K-nearest neighbor search based on navigable small world graphs with controllable hierarchy (Hierarchical NSW, HNSW). The proposed solution is fully graph-based, without any need for additional search structures, which are typically used at the coarse search stage of most proximity graph techniques. Hierarchical NSW incrementally builds a multi-layer structure consisting of a hierarchical set of proximity graphs (layers) for nested subsets of the stored elements. The maximum layer in which an element is present is selected randomly with an exponentially decaying probability distribution. This allows producing graphs similar to the previously studied Navigable Small World (NSW) structures while additionally having the links separated by their characteristic distance scales. Starting the search from the upper layer, together with utilizing the scale separation, boosts the performance compared to NSW and allows a logarithmic complexity scaling. Additional employment of a heuristic for selecting proximity graph neighbors significantly increases performance at high recall and in the case of highly clustered data. Performance evaluation has demonstrated that the proposed general metric space search index is able to strongly outperform previous open-source state-of-the-art vector-only approaches. Similarity of the algorithm to the skip list structure allows a straightforward balanced distributed implementation.
Research Motivation and Objectives
- Develop a fully graph-based ANN index that avoids coarse-search structures used by other proximity-graph methods.
- Introduce a hierarchical, multi-layer graph in which each element's maximum layer is drawn at random from an exponentially decaying distribution, so that links are separated by their characteristic distance scales.
- Show that starting from upper layers and using scale-aware neighbor selection boosts performance and recall, especially on clustered data.
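The random layer assignment described above can be illustrated with a short Python sketch. It assumes the draw takes the form floor(-ln(u) * m_L) with u uniform on (0, 1], and it assumes the normalization m_L = 1/ln(M), where M is the per-node link budget; the function names `sample_max_layer` and `layer_histogram` are hypothetical and chosen only for this illustration.

```python
import math
import random
from collections import Counter


def sample_max_layer(M: int, rng: random.Random) -> int:
    """Draw the top layer of a new element from an exponentially decaying distribution.

    Assumption: level = floor(-ln(u) * m_L) with m_L = 1 / ln(M).
    """
    m_l = 1.0 / math.log(M)
    u = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so that log(u) is defined
    return int(-math.log(u) * m_l)  # int() floors here because the value is >= 0


def layer_histogram(n: int, M: int = 16, seed: int = 0) -> Counter:
    """Count how many of n simulated elements receive each top layer."""
    rng = random.Random(seed)
    return Counter(sample_max_layer(M, rng) for _ in range(n))


if __name__ == "__main__":
    hist = layer_histogram(100_000)
    for layer in sorted(hist):
        print(layer, hist[layer])
```

Under these assumptions the probability of reaching layer k or higher is M^(-k), so each layer holds roughly 1/M of the elements of the layer below it. The layers therefore form nested subsets, which is also what makes the structure resemble a skip list.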
Proposed Method
- Construct a multi-layer proximity-graph index where the maximum layer of an element is determined by an exponential decay distribution.
- Perform search by traversing from the top layer down to lower layers using greedy or heuristic navigation.
- Use a heuristic to select proximity graph neighbors to improve performance at high recall and for highly clustered data.
- Leverage the relationship to skip-list structures to enable scalable, potentially distributed implementations.
- Compare against NSW and other vector-only approaches to demonstrate performance gains on general metric spaces.
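The top-down traversal can be sketched as follows. This is a minimal illustration, not the paper's full procedure: it uses plain greedy descent on every layer (a real index would typically keep a wider candidate list on the bottom layer), and the data layout (`layers` as a list of adjacency dicts, `points` as a dict of coordinates) and the function names are assumptions made for this example.

```python
import math


def greedy_layer_search(query, entry, graph, points):
    """Walk one layer greedily: move to the closest neighbor until none is closer."""
    current = entry
    while True:
        best = min(
            graph.get(current, []),
            key=lambda n: math.dist(query, points[n]),
            default=current,
        )
        if math.dist(query, points[best]) >= math.dist(query, points[current]):
            return current  # local minimum on this layer
        current = best


def hierarchical_search(query, layers, points, entry):
    """Descend from the top layer to layer 0, reusing each result as the next entry point."""
    current = entry
    for graph in reversed(layers):  # layers[0] is the bottom (densest) layer
        current = greedy_layer_search(query, current, graph, points)
    return current


# Toy index: 8 points on a line, with sparser layers on top.
points = {i: (float(i),) for i in range(8)}
layer0 = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
layer1 = {0: [4], 4: [0]}
layer2 = {0: []}
layers = [layer0, layer1, layer2]

if __name__ == "__main__":
    print(hierarchical_search((6.2,), layers, points, entry=0))
```

In the toy index the long link from node 0 to node 4 on the middle layer covers most of the distance in one hop, and the bottom layer only has to refine the result locally. This is the scale separation the method relies on.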
Experimental Results
Research Questions
- RQ1: How do hierarchical layering and scale-separated links affect ANN search speed and accuracy compared to non-hierarchical NSW methods?
- RQ2: Can a fully graph-based index without auxiliary coarse structures achieve logarithmic query complexity in practice across diverse metric spaces?
- RQ3: Does a heuristic neighbor selection improve recall, especially on highly clustered data?
- RQ4: Is the approach amenable to balanced distributed implementations due to its skip-list-like structure?
Key Findings
- The HNSW index achieves improved performance over NSW and vector-only methods by using hierarchical layers and scale-separated links.
- Starting the search from the upper layers and exploiting the scale separation of links yields faster queries than NSW and allows logarithmic complexity scaling.
- A neighbor-selection heuristic significantly boosts recall and performance on highly clustered data.
- The graph-based approach facilitates straightforward distributed implementations due to its skip-list-like properties.
- Empirical evaluations show strong performance gains over prior open-source state-of-the-art methods in general metric spaces.
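To illustrate why a neighbor-selection heuristic helps on clustered data, here is a simplified Python sketch. It assumes the rule "keep a candidate only if it is closer to the base element than to every neighbor already kept"; the paper's exact procedure may include further options that are not shown, and the function name and data layout are hypothetical.

```python
import math


def select_neighbors_heuristic(base, candidates, points, M):
    """Pick up to M neighbors for `base`, favoring candidates in distinct directions.

    Assumed rule: scan candidates from nearest to farthest and keep one only if it
    is closer to `base` than to every neighbor that has already been kept.
    """
    ordered = sorted(candidates, key=lambda c: math.dist(points[base], points[c]))
    selected = []
    for c in ordered:
        if len(selected) >= M:
            break
        d_base = math.dist(points[base], points[c])
        if all(d_base < math.dist(points[c], points[s]) for s in selected):
            selected.append(c)
    return selected


# Toy data: a tight cluster to the right of q and one isolated point to the left.
points = {
    "q": (0.0, 0.0),
    "a1": (1.0, 0.0),
    "a2": (1.1, 0.1),
    "a3": (1.2, 0.0),
    "b": (-3.0, 0.0),
}

if __name__ == "__main__":
    print(select_neighbors_heuristic("q", ["a1", "a2", "a3", "b"], points, M=2))
```

Choosing the two nearest candidates would link `q` only to `a1` and `a2`, both inside the same cluster. The heuristic keeps `a1` and then `b`, so `q` retains a link toward the other region of the space, which is the kind of connectivity that clustered data otherwise tends to lose.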
This explainer was generated by AI and reviewed by a human editor.