[Paper Review] Metric distances derived from cosine similarity and Pearson and Spearman correlations
This paper derives metric distances from cosine similarity, Pearson, and Spearman correlations using metric-preserving functions, particularly concave and increasing transformations. It identifies two classes: one that maximizes distance between anti-correlated pairs (e.g., angular and correlation distances), and another that groups correlated and anti-correlated pairs (e.g., acute angular and absolute correlation distances), both satisfying the triangle inequality.
We investigate two classes of transformations of cosine similarity and Pearson and Spearman correlations into metric distances, utilising the simple tool of metric-preserving functions. The first class puts anti-correlated objects maximally far apart. Previously known transforms fall within this class. The second class collates correlated and anti-correlated objects. An example of such a transformation that yields a metric distance is the sine function when applied to centered data.
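All three similarity measures reduce to the cosine of an angle, which is why one family of transformations covers them. The sketch below is only an illustration, not code from the paper: the helper names (`cosine_similarity`, `ranks`, `pearson`, `spearman`) are hypothetical, the data are synthetic, and ties in the ranks are assumed absent. It treats Pearson correlation as the cosine similarity of centered vectors and Spearman correlation as the Pearson correlation of ranks, then applies the sine to the resulting angle.

```python
import numpy as np


def cosine_similarity(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def ranks(x):
    # Ranks 0..n-1; this simple version assumes there are no ties.
    return np.argsort(np.argsort(x)).astype(float)


def pearson(x, y):
    # Pearson correlation = cosine similarity of the centered vectors.
    return cosine_similarity(x - x.mean(), y - y.mean())


def spearman(x, y):
    # Spearman correlation = Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))


rng = np.random.default_rng(1)          # synthetic data, for illustration only
x = rng.normal(size=50)
y = 0.6 * x + rng.normal(size=50)

r = pearson(x, y)
theta = np.arccos(np.clip(r, -1.0, 1.0))  # angle between the centered vectors
sine_distance = np.sin(theta)             # equals sqrt(1 - r^2)

print("Pearson:", r, "Spearman:", spearman(x, y))
print("sin(theta):", sine_distance, "sqrt(1 - r^2):", np.sqrt(1.0 - r * r))
```

Because the sine depends only on $r^2$, a pair with correlation $r$ and a pair with correlation $-r$ receive the same distance, which is the behaviour of the second class.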
Motivation & Objective
- To derive metric distances from cosine similarity, Pearson correlation, and Spearman correlation that satisfy the triangle inequality.
- To classify transformations of correlation and similarity measures into two distinct classes: one emphasizing anti-correlation, the other grouping correlated and anti-correlated pairs.
- To establish conditions under which functions of angular distances preserve metric properties using concave and increasing functions.
- To provide mathematically rigorous, metric-preserving transformations applicable to data analysis, clustering, and indexing algorithms.
Proposed method
- Uses the angular distance $ d_1(x,y) = \arccos(A(x,y)) $ as a base metric, where $ A $ is cosine similarity, Pearson, or Spearman correlation.
- Applies metric-preserving functions, specifically concave and increasing functions on $[0, \pi]$, to transform the angular distance into new metric distances.
- Derives the correlation distance $ d_2(x,y) = \sqrt{\frac{1}{2}(1 - A(x,y))} $, which equals $ \sin(\frac{1}{2}\theta) $ for the angle $ \theta = d_1(x,y) $, as a metric that ranks pairs in the same order as the angular distance.
- Introduces the acute angular distance $ d_3(x,y) = \frac{1}{2}\pi - \left|\frac{1}{2}\pi - \theta\right| $ and absolute correlation distance $ d_4(x,y) = \sqrt{1 - A(x,y)^2} $, both forming a second class of metric distances.
- Proves that subadditivity of concave functions ensures the triangle inequality is preserved under transformation.
- Demonstrates that strictly convex functions (e.g., $ g(x) = 1 - \cos(x) $) violate the triangle inequality, thus not yielding valid metrics.
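The four distances above, and the failure of $ 1 - \cos(\theta) $, can be checked numerically. What follows is a minimal sketch under stated assumptions, not the paper's code: the function names and the helper `worst_triangle_gap` are hypothetical, the Pearson matrix comes from synthetic random data, and a brute-force search over triples is only a sanity check, not a proof.

```python
import itertools
import math

import numpy as np


def clip(a):
    """Guard against round-off pushing a similarity just outside [-1, 1]."""
    return min(1.0, max(-1.0, float(a)))


def angular(a):               # d_1 = arccos(A)
    return math.acos(clip(a))


def correlation(a):           # d_2 = sqrt((1 - A) / 2) = sin(theta / 2)
    return math.sqrt(0.5 * (1.0 - clip(a)))


def acute_angular(a):         # d_3 = pi/2 - |pi/2 - theta|
    return 0.5 * math.pi - abs(0.5 * math.pi - angular(a))


def absolute_correlation(a):  # d_4 = sqrt(1 - A^2) = sin(theta)
    return math.sqrt(1.0 - clip(a) ** 2)


def one_minus_cos(a):         # g(theta) = 1 - cos(theta) = 1 - A, not a metric
    return 1.0 - clip(a)


def worst_triangle_gap(sim, transform):
    """Largest d(i,k) - d(i,j) - d(j,k) over distinct triples; > 0 is a violation."""
    n = sim.shape[0]
    dist = [[transform(sim[i, j]) for j in range(n)] for i in range(n)]
    return max(
        dist[i][k] - dist[i][j] - dist[j][k]
        for i, j, k in itertools.permutations(range(n), 3)
    )


# Pearson correlations between 10 synthetic variables with 6 observations each.
rng = np.random.default_rng(0)
pearson = np.corrcoef(rng.normal(size=(10, 6)))

candidates = (angular, correlation, acute_angular, absolute_correlation)
gaps = {f.__name__: worst_triangle_gap(pearson, f) for f in candidates}
for name, gap in gaps.items():
    print(f"{name}: worst gap = {gap:.3e}")

# Counterexample for 1 - cos: planar unit vectors at angles 0, pi/4, pi/2.
angles = np.array([0.0, math.pi / 4, math.pi / 2])
units = np.column_stack([np.cos(angles), np.sin(angles)])
cosine = units @ units.T
counterexample_gap = worst_triangle_gap(cosine, one_minus_cos)
print(f"one_minus_cos: worst gap = {counterexample_gap:.3f}")
```

In the counterexample the direct distance is $ 1 - \cos(\frac{1}{2}\pi) = 1 $, while the two-step path costs $ 2(1 - \cos(\frac{1}{4}\pi)) \approx 0.586 $, so the gap is $ \sqrt{2} - 1 \approx 0.414 $ and the triangle inequality fails.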
Experimental results
Research questions
- RQ1: Which transformations of cosine similarity and correlation coefficients yield valid metric distances satisfying the triangle inequality?
- RQ2: How can metric-preserving functions be used to derive new distance measures from existing correlation and similarity metrics?
- RQ3: What distinguishes the two classes of metric distances: one that separates anti-correlated pairs, and another that groups correlated and anti-correlated pairs?
- RQ4: Why do certain functions like $ 1 - \cos(\theta) $ fail to preserve the triangle inequality despite being derived from angular distance?
- RQ5: To what extent are the derived distances ordinally equivalent, and how does this affect their use in data analysis?
Key findings
- The angular distance $ \arccos(A(x,y)) $ is a valid metric when $ A \in [-1,1] $ is cosine similarity, Pearson correlation, or Spearman correlation.
- The correlation distance $ \sqrt{\frac{1}{2}(1 - A(x,y))} $ is a metric that places anti-correlated pairs at maximal distance.
- The acute angular distance $ \frac{1}{2}\pi - \left|\frac{1}{2}\pi - \theta\right| $ and absolute correlation distance $ \sqrt{1 - A(x,y)^2} $ form a second class of metric distances that treat correlated and anti-correlated pairs symmetrically.
- Functions that are strictly convex on $[0, \epsilon]$ and satisfy $ f(0) = 0 $ violate the triangle inequality, as shown by counterexample with $ g(x) = 1 - \cos(x) $.
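- The derived distances are ordinally equivalent within each class: the correlation distance ranks pairs exactly as the angular distance does, and the absolute correlation distance ranks pairs exactly as the acute angular distance does. The two classes rank pairs differently from each other, since the second class is not monotone in $ \theta $.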
- Compositions of concave functions, such as $ f_5(x) = \sin(x)^p $ for $ 0 < p \leq 1 $, also yield valid metric distances.
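The ranking behaviour and the $ \sin(x)^p $ family can also be probed numerically. This is a rough sketch rather than the paper's procedure: `same_ranking` is a hypothetical helper that only tests whether one distance is non-decreasing when pairs are sorted by another, the vectors are synthetic, and the exponents tried are arbitrary choices within $ 0 < p \leq 1 $.

```python
import numpy as np

theta = np.linspace(0.0, np.pi, 1001)       # grid of angles in [0, pi]
d1 = theta                                  # angular distance
d2 = np.sin(theta / 2)                      # correlation distance
d3 = np.pi / 2 - np.abs(np.pi / 2 - theta)  # acute angular distance
d4 = np.sin(theta)                          # absolute correlation distance


def same_ranking(u, v, tol=1e-12):
    """True if v never decreases when the grid points are sorted by u."""
    order = np.argsort(u, kind="stable")
    return bool(np.all(np.diff(v[order]) >= -tol))


print("d2 follows d1:", same_ranking(d1, d2))
print("d4 follows d3:", same_ranking(d3, d4))
print("d4 follows d1:", same_ranking(d1, d4))

# Triangle-inequality check for sin(theta)^p on synthetic unit vectors.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(15, 5))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
sim = np.clip(vectors @ vectors.T, -1.0, 1.0)
np.fill_diagonal(sim, 1.0)                  # make self-distance exactly zero
sin_theta = np.sqrt(1.0 - sim ** 2)

gaps = {}
for p in (1.0, 0.5, 0.25):
    dist = sin_theta ** p
    # gap[i, j, k] = dist[i, k] - dist[i, j] - dist[j, k]; > 0 is a violation.
    gap = dist[:, None, :] - dist[:, :, None] - dist[None, :, :]
    gaps[p] = float(gap.max())
    print(f"p = {p}: worst gap = {gaps[p]:.3e}")
```

The third ranking check comes out false because $ \sin(\theta) $ decreases for $ \theta > \frac{1}{2}\pi $, which is exactly how the second class brings anti-correlated pairs close together.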
This review was created by AI and reviewed by human editors.