QUICK REVIEW

[Paper Review] Filtered Approximate Nearest Neighbor Search Cost Estimation

Wenxuan Xia, Mingyu Yang | arXiv (Cornell University) | Feb 6, 2026
Advanced Image and Video Retrieval Techniques | Citations: 0
One-line summary

Introduces E2E, a cost-estimation framework that uses an early probing phase with filter-aware features to enable adaptive termination for filtered AKNN, achieving substantial latency reductions while preserving recall.

ABSTRACT

Hybrid queries combining high-dimensional vector similarity with structured attribute filtering have garnered significant attention across both academia and industry. A critical instance of this paradigm is filtered Approximate k Nearest Neighbor (AKNN) search, where embeddings (e.g., image or text) are queried alongside constraints such as labels or numerical ranges. While essential for rich retrieval, optimizing these queries remains challenging due to the highly variable search cost induced by combined filters. In this paper, we propose a novel cost estimation framework, E2E, for filtered AKNN search and demonstrate its utility in downstream optimization tasks, specifically early termination. Unlike existing approaches, our model explicitly captures the correlation between the query vector distribution and attribute-value selectivity, yielding significantly higher estimation accuracy. By leveraging these estimates to refine search termination conditions, we achieve substantial performance gains. Experimental results on real-world datasets demonstrate that our approach improves retrieval efficiency by 2x-3x over state-of-the-art baselines while maintaining high search accuracy.
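A filtered AKNN query asks for the k nearest vectors, but only among the items whose attributes satisfy the filter. The exact, brute-force version makes this concrete. What follows is a minimal Python sketch that assumes Euclidean distance and one attribute per item. `filtered_knn` is a hypothetical helper, not code from the paper or its release.

```python
import heapq
import math


def filtered_knn(query, vectors, attrs, predicate, k):
    """Exact filtered k-NN: indices of the k items nearest to `query`
    (Euclidean distance) among items whose attribute satisfies `predicate`."""
    # Apply the attribute filter first, then rank the survivors by distance.
    candidates = (
        (math.dist(query, vec), idx)
        for idx, (vec, attr) in enumerate(zip(vectors, attrs))
        if predicate(attr)
    )
    return [idx for _, idx in heapq.nsmallest(k, candidates)]


# Toy data (hypothetical): 2-D embeddings with one category label each.
vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
labels = ["cat", "dog", "dog", "dog"]
print(filtered_knn((0.0, 0.0), vectors, labels, lambda a: a == "dog", k=2))  # [1, 2]
```

The same function handles a numerical range filter, for example `lambda price: 20 <= price <= 50`. Approximate methods are judged by how much of this exact answer they recover.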

Motivation and Objectives

  • Motivate efficient filtered AKNN search, where items carry attributes and each query constrains them with a filter.
  • Show that distance-based signals alone are insufficient for cost estimation under filters.
  • Propose E2E to incorporate attribute distribution signals into cost predictions.
  • Demonstrate adaptive termination that stops easy queries early and preserves recall on hard queries.
  • Provide reproducible evaluation and release code for practitioners.

Proposed Method

  • Propose a cost-estimation framework (E2E) tailored to filtered AKNN search.
  • Incorporate two filter-aware features from an early probing phase: observed valid ratio and prospective valid ratio.
  • Combine filter-aware features with distance-based features in a lightweight LightGBM model for per-query cost prediction.
  • Use the predicted cost to enable adaptive termination during the local expansion stage of a graph-based AKNN index.
  • Train the estimator via supervised learning on replayed query logs with k-NN-grounded cost labels.
  • Integrate E2E into existing graph-based indexes so that search stops early once the estimated budget is reached.
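The probe-then-terminate flow described above can be sketched as follows. This is a minimal Python illustration, and several of its assumptions do not come from the paper:

- The index is a plain adjacency-list graph searched best-first.
- "Observed valid ratio" is taken to be the fraction of nodes visited so far that pass the filter.
- "Prospective valid ratio" is taken to be the fraction of not-yet-expanded candidates that pass it.
- `predict_budget` is any callable standing in for the trained LightGBM model.

`adaptive_filtered_search`, `toy_budget` and the feature names are hypothetical.

```python
import heapq
import math


def adaptive_filtered_search(query, vectors, attrs, graph, entry, predicate,
                             k, probe_steps, predict_budget):
    """Best-first filtered search over a proximity graph (hypothetical sketch).

    Runs `probe_steps` expansions, computes filter-aware and distance features,
    asks `predict_budget(features)` for a total expansion budget, then stops
    once that budget is spent. Returns (result ids nearest-first, expansions).
    """
    def dist(i):
        return math.dist(query, vectors[i])

    visited = {entry}
    frontier = [(dist(entry), entry)]   # min-heap of unexpanded candidates
    results = []                        # max-heap via negated distance, size <= k
    valid_seen = 0

    def record_if_valid(node):
        nonlocal valid_seen
        if predicate(attrs[node]):
            valid_seen += 1
            heapq.heappush(results, (-dist(node), node))
            if len(results) > k:
                heapq.heappop(results)  # drop the farthest valid hit

    record_if_valid(entry)
    expansions, budget = 0, None
    while frontier and (budget is None or expansions < budget):
        _, node = heapq.heappop(frontier)
        expansions += 1
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(frontier, (dist(nb), nb))
                record_if_valid(nb)
        if budget is None and expansions == probe_steps:
            # Early probing phase is over: summarise what the search has seen.
            features = {
                "observed_valid_ratio": valid_seen / len(visited),
                "prospective_valid_ratio": (
                    sum(predicate(attrs[i]) for _, i in frontier) / len(frontier)
                    if frontier else 0.0
                ),
                "nearest_candidate_dist": frontier[0][0] if frontier else math.inf,
            }
            budget = max(probe_steps, int(predict_budget(features)))
    return [i for _, i in sorted(results, reverse=True)], expansions


# Toy index (hypothetical): 10 points on a line, each linked to its neighbours.
vectors = [(float(i), 0.0) for i in range(10)]
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
attrs = ["x" if i % 2 == 0 else "y" for i in range(10)]


def toy_budget(f):
    # Stand-in for the learned model: few valid candidates ahead -> larger budget.
    return 3 if f["prospective_valid_ratio"] >= 0.5 else 6


print(adaptive_filtered_search((0.0, 0.0), vectors, attrs, graph, 0,
                               lambda a: a == "x", k=2, probe_steps=1,
                               predict_budget=toy_budget))  # ([0, 2], 6)
```

Given an effectively unlimited budget, this toy query would expand all ten nodes. The predicted budget stops it after six, with the same two results. The sketch replaces the index's usual stopping rule entirely. In a real HNSW-style search, the budget would presumably be combined with the existing candidate-list bound.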

Experimental Results

Research Questions

  • RQ1: How do local–global selectivity misalignments affect cost prediction for filtered AKNN search?
  • RQ2: Can early probing signals capturing attribute distributions improve cost estimation for filtered AKNN?
  • RQ3: Does adaptive termination guided by E2E maintain recall while reducing latency for filtered AKNN queries?
  • RQ4: What is the practical latency improvement of E2E compared to state-of-the-art baselines on real datasets?
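The local–global misalignment in RQ1 can be illustrated with a toy example. The review gives no formal definitions, so the following are assumptions for illustration:

- Global selectivity is the fraction of the whole dataset that passes the filter.
- Local selectivity is the fraction of the query's m nearest neighbors (found while ignoring the filter) that pass it.

Both helpers are hypothetical.

```python
import math


def global_selectivity(attrs, predicate):
    """Fraction of all items that satisfy the filter (a dataset-wide statistic)."""
    return sum(1 for a in attrs if predicate(a)) / len(attrs)


def local_selectivity(query, vectors, attrs, predicate, m):
    """Fraction of the query's m nearest neighbors (filter ignored) that satisfy it."""
    nearest = sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))[:m]
    return sum(1 for i in nearest if predicate(attrs[i])) / m


# Two well-separated clusters: label "a" near the origin, label "b" near (10, 10).
vectors = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
attrs = ["a"] * 4 + ["b"] * 4
is_b = lambda a: a == "b"

print(global_selectivity(attrs, is_b))                         # 0.5
print(local_selectivity((0, 0), vectors, attrs, is_b, m=4))    # 0.0
print(local_selectivity((10, 10), vectors, attrs, is_b, m=4))  # 1.0
```

Both queries see the same global selectivity (0.5). A search starting near the origin, however, must travel to the far cluster before it meets any valid item. A search starting near (10, 10) finds valid items immediately. A cost model given only the global figure would predict the same effort for both.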

Key Findings

  • E2E achieves 1.1×–3.7× speedup over strong baselines at 95% recall.
  • Filter-aware features are crucial when local selectivity misaligns with global selectivity.
  • A lightweight LightGBM model provides fast per-query cost predictions (~0.025 ms on average).
  • Adaptive termination guided by predicted cost reduces unnecessary expansions with negligible overhead.
  • Experiments on six real-world datasets show consistent improvements over existing adaptive methods.
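A speedup "at 95% recall" compares methods at the same accuracy level. The sketch below shows one common way to compute such a comparison, under two assumptions:

- Recall@k is measured against the exact filtered k-NN result.
- Each method yields a (recall, latency) curve from sweeping a search parameter.

The helper names are hypothetical, and no numbers here come from the paper.

```python
def recall_at_k(returned, exact):
    """Share of the exact filtered k-NN ids that the approximate search returned."""
    truth = set(exact)
    if not truth:  # no item passes the filter: nothing to miss (a convention)
        return 1.0
    return len(truth.intersection(returned)) / len(truth)


def latency_at_recall(curve, target):
    """Lowest latency among (recall, latency) points whose recall meets `target`,
    or None if the method never reaches it."""
    feasible = [latency for recall, latency in curve if recall >= target]
    return min(feasible) if feasible else None


def speedup_at_recall(baseline_curve, new_curve, target=0.95):
    """Baseline latency divided by the new method's latency at the same recall."""
    base = latency_at_recall(baseline_curve, target)
    new = latency_at_recall(new_curve, target)
    if base is None or new is None:
        return None
    return base / new
```

In practice, recall@k is usually averaged over a query set before the curve points are formed.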


This review was generated by AI and checked by a human editor.