QUICK REVIEW

[Paper Review] Dynamic Time Warping in Strongly Subquadratic Time: Algorithms for the Low-Distance Regime and Approximate Evaluation

William Kuszmaul|arXiv (Cornell University)|Jan 1, 2019
Time Series Analysis and Forecasting | References: 42 | Cited by: 1
One-Sentence Summary

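This paper presents an algorithm for dynamic time warping (DTW) in the low-distance regime: over an arbitrary metric space with distances normalized so that the smallest non-zero distance is 1, it computes DTW in O(n · dtw(x, y)) time, which is strongly subquadratic whenever the distance is small. It also gives an Õ(n^{2−ε})-time approximation algorithm with approximation factor O(n^ε) for DTW over well-separated tree metrics, and establishes new conditional lower bounds for DTW and LCS via reductions from edit distance.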

ABSTRACT

Dynamic time warping distance (DTW) is a widely used distance measure between time series, with applications in areas such as speech recognition and bioinformatics. The best known algorithms for computing DTW run in near quadratic time, and conditional lower bounds prohibit the existence of significantly faster algorithms. The lower bounds do not prevent a faster algorithm for the important special case in which the DTW is small, however. For an arbitrary metric space Sigma with distances normalized so that the smallest non-zero distance is one, we present an algorithm which computes dtw(x, y) for two strings x and y over Sigma in time O(n * dtw(x, y)). When dtw(x, y) is small, this represents a significant speedup over the standard quadratic-time algorithm. Using our low-distance regime algorithm as a building block, we also present an approximation algorithm which computes dtw(x, y) within a factor of O(n^epsilon) in time O~(n^{2 - epsilon}) for 0 < epsilon < 1. The algorithm allows for the strings x and y to be taken over an arbitrary well-separated tree metric with logarithmic depth and at most exponential aspect ratio. Notably, any polynomial-size metric space can be efficiently embedded into such a tree metric with logarithmic expected distortion. Extending our techniques further, we also obtain the first approximation algorithm for edit distance to work with characters taken from an arbitrary metric space, providing an n^epsilon-approximation in time O~(n^{2 - epsilon}), with high probability. Finally, we turn our attention to the relationship between edit distance and dynamic time warping distance. We prove a reduction from computing edit distance over an arbitrary metric space to computing DTW over the same metric space, except with an added null character (whose distance to a letter l is defined to be the edit-distance insertion cost of l). 
Applying our reduction to a conditional lower bound of Bringmann and Künnemann pertaining to edit distance over {0, 1}, we obtain a conditional lower bound for computing DTW over a three letter alphabet (with distances of zero and one). This improves on a previous result of Abboud, Backurs, and Williams, who gave a conditional lower bound for DTW over an alphabet of size five. With a similar approach, we also prove a reduction from computing edit distance (over generalized Hamming Space) to computing longest-common-subsequence length (LCS) over an alphabet with an added null character. Surprisingly, this means that one can recover conditional lower bounds for LCS directly from those for edit distance, which was not previously thought to be the case.
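For context, the standard quadratic-time algorithm that the abstract compares against can be sketched as follows. This is the textbook dynamic program, not code from the paper; the function name `dtw_quadratic` and the `dist` callback are illustrative assumptions.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def dtw_quadratic(x: Sequence[T], y: Sequence[T],
                  dist: Callable[[T, T], float]) -> float:
    """Textbook O(len(x) * len(y)) dynamic program for DTW.

    Both strings may be stretched by repeating letters; the cost of an
    alignment is the sum of dist() over all matched letter pairs.
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("DTW is defined here for non-empty strings only")
    inf = float("inf")
    # prev[j] holds the best cost of aligning x[:i-1] with y[:j].
    prev = [0.0] + [inf] * m
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            # Extend by repeating y[j-1], repeating x[i-1], or advancing both.
            best = min(prev[j], cur[j - 1], prev[j - 1])
            cur[j] = dist(x[i - 1], y[j - 1]) + best
        prev = cur
    return prev[m]
```

For example, with a 0/1 distance, `dtw_quadratic("aab", "abb", lambda a, b: 0 if a == b else 1)` evaluates to 0, because both strings can be stretched to `aabb`.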

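Motivation and Objectives

  • Develop a DTW algorithm that beats O(n²) when the true distance is small, sidestepping the near-quadratic conditional lower bound that applies in the general case.
  • Design a DTW approximation algorithm that runs in strongly subquadratic time and carries a provable approximation guarantee.
  • Extend the techniques to approximating edit distance over arbitrary metric spaces, and derive new conditional lower bounds for DTW and LCS.
  • Establish a reduction from edit distance to DTW with an added null character, so that conditional lower bounds transfer from one problem to the other.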

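Proposed Methods

  • Design a novel asymmetric dynamic-programming formulation that treats one string as individual characters and the other as runs of identical characters, limiting the number of subproblems to O(nK), where K is the distance threshold.
  • Recursively decompose subproblems while alternating the roles of the two strings, in order to bound the number of low-cost subproblems.
  • Sample the strings at a random scale r ∈ [R, 2R] so that the edit distance between the random samples stays small, and use Markov's inequality to obtain a probabilistic guarantee.
  • Build a gap edit distance algorithm that combines binary search over the scale parameter with random sampling to distinguish small distances from large ones.
  • Embed arbitrary metric spaces into well-separated tree metrics of logarithmic depth with low expected distortion, enabling efficient approximation.
  • Reduce edit distance over a metric space to DTW with an added null character (whose distance to a letter equals that letter's insertion cost), thereby transferring conditional lower bounds for edit distance to DTW.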
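To give a feel for where an O(nK) subproblem bound can come from, here is a deliberately simplified sketch. It is not the paper's asymmetric dynamic program: it assumes both strings are run-free (no two adjacent letters are equal) and that every non-zero distance is at least 1. Under those assumptions, any run of h horizontal (or vertical) steps in an alignment pays at least h/2, so an alignment of cost at most K stays within 2K of the diagonal and a banded table of O(nK) cells suffices. The names `dtw_banded`, `dtw_low_distance`, and `unit_dist` are hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence, TypeVar

T = TypeVar("T")
INF = float("inf")


def unit_dist(a, b) -> int:
    """0/1 distance, used only for the examples."""
    return 0 if a == b else 1


def dtw_banded(x: Sequence[T], y: Sequence[T],
               dist: Callable[[T, T], float], k: int) -> Optional[float]:
    """Return dtw(x, y) if it is at most k, otherwise None.

    ASSUMPTIONS (not checked): x and y are run-free, and every
    non-zero value of dist() is at least 1.
    """
    n, m = len(x), len(y)
    w = 2 * k  # band half-width implied by the assumptions above
    if n == 0 or m == 0 or abs(n - m) > w:
        return None
    prev: Dict[int, float] = {0: 0.0}  # row 0 contains only the origin
    for i in range(1, n + 1):
        cur: Dict[int, float] = {}
        for j in range(max(1, i - w), min(m, i + w) + 1):
            best = min(prev.get(j, INF), cur.get(j - 1, INF),
                       prev.get(j - 1, INF))
            if best < INF:
                cur[j] = dist(x[i - 1], y[j - 1]) + best
        prev = cur
    d = prev.get(m, INF)
    return d if d <= k else None


def dtw_low_distance(x: Sequence[T], y: Sequence[T],
                     dist: Callable[[T, T], float]) -> float:
    """Doubling search over k; total work is O(n * dtw(x, y)) table cells."""
    if len(x) == 0 or len(y) == 0:
        raise ValueError("non-empty strings required")
    k = 1
    while True:
        d = dtw_banded(x, y, dist, k)
        if d is not None:
            return d
        k *= 2
```

The doubling loop is what turns a threshold test into an output-sensitive bound: the last call dominates the cost, and its threshold is at most twice the true distance. Handling strings with long runs is exactly where this shortcut breaks down, which is the case the asymmetric formulation above is designed for.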

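Results

Research Questions

  • RQ1: When the true distance is small, can DTW be computed in less than O(n²) time?
  • RQ2: Is there a DTW approximation algorithm that runs in strongly subquadratic time over general metric spaces?
  • RQ3: Can conditional lower bounds for edit distance be transferred to DTW via a reduction?
  • RQ4: Can the same reduction technique be used to derive lower bounds for LCS from edit distance?
  • RQ5: How does random scaling affect the edit distance between the sampled strings?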
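To make "strongly subquadratic" concrete for RQ1 and RQ2, the following short calculation may help. The symbol δ is introduced here only for illustration; the running times themselves are those stated in the abstract.

```latex
% Low-distance regime: a polynomially small distance gives a strongly subquadratic bound.
\[
\mathrm{dtw}(x,y) \le n^{1-\delta}
\;\Longrightarrow\;
O\bigl(n \cdot \mathrm{dtw}(x,y)\bigr) = O\bigl(n^{2-\delta}\bigr),
\qquad 0 < \delta \le 1 .
\]
% Approximation trade-off at the midpoint of the allowed range 0 < \varepsilon < 1.
\[
\varepsilon = \tfrac{1}{2}:
\qquad \text{time } \tilde{O}\bigl(n^{3/2}\bigr),
\qquad \text{approximation factor } O\bigl(\sqrt{n}\bigr).
\]
```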

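Key Findings

  • The paper gives an O(n · dtw(x, y))-time algorithm for DTW in the low-distance regime, a significant improvement over O(n²) when dtw(x, y) is small.
  • For well-separated tree metrics with logarithmic depth and at most exponential aspect ratio, it develops an Õ(n^{2−ε})-time, O(n^ε)-approximation algorithm.
  • Any polynomial-size metric space can be efficiently embedded into such a tree metric with logarithmic expected distortion, which broadens the applicability of the approximation algorithm.
  • Via a reduction from edit distance over {0, 1}, the paper proves a conditional lower bound for DTW over a three-letter alphabet, improving on an earlier result that required an alphabet of size five.
  • By reducing edit distance over generalized Hamming space to LCS with an added null character, conditional lower bounds transfer from edit distance to LCS.
  • The expected edit distance between the random samples s_r(x) and s_r(y) is at most 5 times the original edit distance, which enables probabilistic gap detection in the approximation algorithm.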
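The edit-distance-to-DTW reduction can be illustrated on a toy case. The sketch below uses unit-cost edit distance and a 0/1 distance extended with a null character at distance 1 from every letter (its insertion cost). The padding scheme, a null character between every pair of letters and at both ends, is an assumption chosen for illustration; the abstract specifies only the null character and its distances, so the paper's actual construction may differ. All names (`NULL`, `pad`, `unit_dist`) are hypothetical.

```python
from typing import Callable, List, Sequence

NULL = None  # hypothetical stand-in for the added null character


def unit_dist(a, b) -> int:
    """0/1 distance; NULL is at distance 1 (the insertion cost) from any letter."""
    return 0 if a == b else 1


def pad(x: Sequence) -> List:
    """Assumed padding: NULL before, between, and after the letters of x."""
    out = [NULL]
    for c in x:
        out.append(c)
        out.append(NULL)
    return out


def dtw(x: Sequence, y: Sequence, dist: Callable) -> float:
    """Textbook quadratic DTW over non-empty sequences."""
    inf = float("inf")
    prev = [0] + [inf] * len(y)
    for i in range(1, len(x) + 1):
        cur = [inf] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            cur[j] = dist(x[i - 1], y[j - 1]) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[len(y)]


def edit_distance(x: Sequence, y: Sequence) -> int:
    """Unit-cost edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(y) + 1))
    for i in range(1, len(x) + 1):
        cur = [i] + [0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = min(prev[j] + 1,      # delete x[i-1]
                         cur[j - 1] + 1,   # insert y[j-1]
                         prev[j - 1] + unit_dist(x[i - 1], y[j - 1]))
        prev = cur
    return prev[len(y)]


# On these inputs the two quantities coincide.
for a, b in [("kitten", "sitting"), ("ab", "ba"), ("", "ab"), ("0110", "1001")]:
    assert edit_distance(a, b) == dtw(pad(a), pad(b), unit_dist)
```

With binary input strings, the padded strings use exactly three symbols (0, 1, and the null character) with distances of zero and one, which matches the three-letter alphabet in the lower bound above.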


This review was generated by AI and checked by human editors.