QUICK REVIEW

[Paper Review] Timehash: Hierarchical Time Indexing for Efficient Business Hours Search

Jinoh Kim, Jaewon Son|arXiv (Cornell University)|Mar 3, 2026

Time Series Analysis and Forecasting | Cited by 0

One-Sentence Summary

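Timehash introduces a hierarchical, multi-resolution time indexing method that sharply reduces index size while preserving minute-level precision and 100% recall and precision for business-hours search.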

ABSTRACT

Temporal range filtering is a critical operation in large-scale search systems, particularly for location-based services that need to filter businesses by operating hours. Traditional approaches either suffer from poor query performance (scope filtering) or index size explosion (minute-level indexing). We present Timehash, a novel hierarchical time indexing algorithm that achieves over 99% reduction in index size compared to minute-level indexing while maintaining 100% precision. Timehash employs a flexible multi-resolution strategy with customizable hierarchical levels. Through empirical analysis on distributions from 12.6 million business records of a production location search service, we demonstrate a data-driven methodology for selecting optimal hierarchies tailored to specific data distributions. We evaluated Timehash on up to 12.6 million synthetic POIs generated from production distributions. Experimental results show that a five-level hierarchy reduces index terms to 5.6 per document (99.1% reduction versus minute-level indexing), with zero false positives and zero false negatives. Scalability benchmarks confirm constant per-document cost from 100K to 12.6M POIs, while supporting complex scenarios such as break times and irregular schedules. Our approach is generalizable to various temporal filtering problems in search systems, e-commerce, and reservation platforms.

Motivation and Goals

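Formalize the temporal range indexing problem for business-hours search in large-scale inverted indexes.
Propose Timehash, a hierarchical multi-resolution algorithm that decomposes time ranges into scalable, human-readable time buckets.
Present a data-driven methodology for selecting the optimal hierarchy for real-world data distributions.
Prove theoretical space and correctness guarantees, and validate scalability on production-scale data.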
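To make the problem concrete, the sketch below models a business's hours as a half-open interval [from, to) in minutes since midnight and implements the naive minute-level baseline the paper compares against: one index term per open minute, so a point query at minute t matches exactly when the term for t is present. The term format "m:HHMM" and all function names are hypothetical illustrations, not taken from the paper.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

constexpr int kMinutesPerDay = 1440;

// Hypothetical term format for the minute-level baseline: "m:HHMM".
std::string minuteTerm(int minute) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "m:%02d%02d", minute / 60, minute % 60);
    return std::string(buf);
}

// Naive baseline: one index term per open minute in the half-open range [from, to).
std::vector<std::string> minuteLevelTerms(int from, int to) {
    assert(0 <= from && from <= to && to <= kMinutesPerDay);
    std::vector<std::string> terms;
    for (int m = from; m < to; ++m) terms.push_back(minuteTerm(m));
    return terms;
}

// A point query at minute t matches iff the document carries the term for t.
std::string minuteQueryTerm(int t) { return minuteTerm(t); }
```

For a shop open 09:00-18:00, minuteLevelTerms(540, 1080) emits 540 terms, and a 24-hour shop emits 1,440: this per-document growth is the index-size explosion that Timehash is designed to avoid.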

Proposed Method

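Define a set of hierarchical time granularities (e.g., 4 hours, 1 hour, 15 minutes, 5 minutes, 1 minute) and cover the time range greedily, using the largest block that fits at each step.
Emit multiple Timehash keys that together represent the time range while preserving minute-level precision.
Generate query keys at every level, guaranteeing full recall for point queries and supporting range queries.
Prove a space complexity of O(T/m1) and a small constant bound on the number of keys (at most 28 in the tested hierarchies).
Provide a C++ library implementation exposing getIndexTerms(from, to) and getQueryTerms(hhmm).
Validate the method on up to 12.6 million synthetic POIs generated from production distributions, showing a 99.1% reduction in index terms relative to minute-level indexing with zero false positives and zero false negatives.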
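The greedy covering and the index/query key generation described above can be sketched as follows. This is a minimal, hypothetical sketch, not the authors' library: indexTerms and queryTerms only loosely mirror the roles of getIndexTerms(from, to) and getQueryTerms(hhmm), take minutes since midnight instead of HHMM values, and use an invented key format "<size>m:HHMM". It also assumes every block starts at a multiple of its own size, which the summary does not state explicitly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Five-level hierarchy from the summary, coarsest first, in minutes.
// Each size divides the next coarser one, so aligned blocks nest cleanly.
const std::vector<int> kLevels = {240, 60, 15, 5, 1};

// Hypothetical key format: block size plus aligned start time, e.g. "60m:0900".
std::string makeKey(int size, int start) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%dm:%02d%02d", size, start / 60, start % 60);
    return std::string(buf);
}

// Index side: greedily cover [from, to) with the largest aligned block that fits.
std::vector<std::string> indexTerms(int from, int to) {
    assert(0 <= from && from <= to && to <= 1440);
    std::vector<std::string> keys;
    int p = from;
    while (p < to) {
        for (int size : kLevels) {
            // The 1-minute level always fits, so every pass advances p.
            if (p % size == 0 && p + size <= to) {
                keys.push_back(makeKey(size, p));
                p += size;
                break;
            }
        }
    }
    return keys;
}

// Query side: one key per level, naming the aligned block that contains minute t.
std::vector<std::string> queryTerms(int t) {
    assert(0 <= t && t < 1440);
    std::vector<std::string> keys;
    for (int size : kLevels) keys.push_back(makeKey(size, t / size * size));
    return keys;
}

// A document is open at t iff one of its index keys equals one of the query keys.
bool isOpenAt(const std::vector<std::string>& docKeys, int t) {
    for (const auto& q : queryTerms(t))
        if (std::find(docKeys.begin(), docKeys.end(), q) != docKeys.end()) return true;
    return false;
}
```

For example, indexTerms(540, 1020) (09:00-17:00) yields five keys: 60m:0900, 60m:1000, 60m:1100, 240m:1200, 60m:1600. Under these assumptions matching is exact: the blocks are disjoint, so at most one contains t, and when one does it is aligned at its own level, so its key is exactly the key queryTerms(t) emits for that level; conversely, a query key can only equal a block that contains t.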

Experimental Results

Research Questions

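RQ1: How can temporal range filtering for business hours be achieved with a small index while preserving minute-level precision?
RQ2: Can hierarchical multi-resolution encoding reduce the number of index terms per document without sacrificing correctness?
RQ3: What are Timehash's theoretical space and query complexities, and how does it perform on real production data?
RQ4: How should the optimal hierarchy be chosen for a given time distribution to balance index size against precision?
RQ5: Can Timehash handle complex patterns such as break times, irregular schedules, and overnight ranges while remaining efficient?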

Key Findings

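The five-level hierarchy (4 hours, 1 hour, 15 minutes, 5 minutes, 1 minute) gives the best trade-off between index size and precision.
Timehash cuts per-document index terms by 99.1% relative to minute-level indexing (to an average of 5.6 terms per document), with zero false positives and zero false negatives.
The space complexity is O(T/m1), a constant-factor reduction over naive minute-level indexing; the worst-case number of keys is empirically bounded at 28, versus up to 1,440 minute-level keys.
The method supports break times, irregular schedules, and 24-hour operation, and scales linearly from 100K to 12.6M POIs with constant per-document cost.
The production deployment spanned 18 months and indexed 12.6 million POIs, with the timeliness and reliability that production search workloads require.
Code and artifacts are available in the GitHub repository provided with the paper, supporting reproducibility.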
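The schedule patterns and the 28-key worst case in the findings above can be checked with a small sketch. It assumes a greedy covering over the 240/60/15/5/1-minute hierarchy in which each block starts at a multiple of its own size; the Interval type, the rule of splitting overnight intervals at midnight, and the {0, 1440} encoding of 24-hour operation are illustrative choices, not the paper's API. Under these assumptions the exhaustive scan over every range within one day returns 28, consistent with the bound reported above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kDay = 1440;
const std::vector<int> kLevels = {240, 60, 15, 5, 1};  // minutes, coarsest first

// An aligned block of `size` minutes starting at minute `start`.
struct Block { int size; int start; };

// Greedy aligned covering of the half-open range [from, to).
std::vector<Block> cover(int from, int to) {
    assert(0 <= from && from <= to && to <= kDay);
    std::vector<Block> blocks;
    for (int p = from; p < to;) {
        for (int size : kLevels) {
            if (p % size == 0 && p + size <= to) {  // 1-minute level always fits
                blocks.push_back({size, p});
                p += size;
                break;
            }
        }
    }
    return blocks;
}

// One opening interval in minutes since midnight (hypothetical type).
// to < from means the interval runs past midnight; 24 hours is {0, 1440}.
struct Interval { int from; int to; };

// A schedule with a break is several intervals; an overnight interval
// is split at midnight into [from, 1440) and [0, to).
std::vector<Block> coverSchedule(const std::vector<Interval>& schedule) {
    std::vector<Block> all;
    auto append = [&all](int from, int to) {
        std::vector<Block> part = cover(from, to);
        all.insert(all.end(), part.begin(), part.end());
    };
    for (const Interval& iv : schedule) {
        if (iv.from <= iv.to) {
            append(iv.from, iv.to);
        } else {  // overnight, e.g. 22:00-02:00
            append(iv.from, kDay);
            append(0, iv.to);
        }
    }
    return all;
}

// Exhaustive worst case: the largest covering of any range [a, b) within one day.
std::size_t worstCaseBlocks() {
    std::size_t worst = 0;
    for (int a = 0; a < kDay; ++a)
        for (int b = a + 1; b <= kDay; ++b) {
            std::size_t n = cover(a, b).size();
            if (n > worst) worst = n;
        }
    return worst;
}
```

A shop open 09:00-12:00 and 13:00-18:00 (a lunch break) is two intervals and yields 3 + 5 = 8 blocks, while 22:00-02:00 becomes 22:00-24:00 plus 00:00-02:00, or four one-hour blocks. The worst case arises for a range such as 00:01-23:59: climbing from 00:01 to 04:00 takes 4 + 2 + 3 + 3 = 12 blocks (1-, 5-, 15-, and 60-minute), four 4-hour blocks cover 04:00-20:00, and descending from 20:00 to 23:59 takes another 12, for 28 in total.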

This review was generated by AI and checked by a human editor.