[Paper Review] Distributional Measures as Proxies for Semantic Relatedness
This paper presents a comprehensive analysis of distributional measures of semantic relatedness, evaluating their strengths and limitations in mimicking human judgments. It introduces new measures, such as the Saif^Div variants and KLD-based metrics, that improve alignment with human notions of relatedness by addressing asymmetry, frequency bias, and context weighting, offering more robust alternatives to traditional methods like PMI and cosine similarity.
Automatically ranking word pairs by their semantic relatedness, in a way that mimics human notions of relatedness, has widespread applications. Two kinds of measures exist: those that rely on raw data (distributional measures) and those that use knowledge-rich ontologies. Although extensive studies have compared ontological measures with human judgments, distributional measures have primarily been evaluated by indirect means. This paper is a detailed study of some of the major distributional measures and lists their respective merits and limitations. It suggests new measures that overcome these drawbacks and are more in line with human notions of semantic relatedness. The paper concludes with an exhaustive comparison of distributional and ontology-based measures. Along the way, it identifies significant research problems; work on these problems may lead to a better understanding of how semantic relatedness should be measured.
Motivation & Objective
- To systematically evaluate existing distributional measures of semantic relatedness and identify their limitations in aligning with human judgments.
- To propose new distributional measures that address key drawbacks such as asymmetry, frequency bias, and poor handling of rare co-occurrences.
- To compare distributional measures with ontology-based approaches (e.g., WordNet) and highlight their relative merits and shortcomings.
- To identify open research problems in measuring semantic relatedness that could lead to better models of human-like semantic understanding.
- To provide a unified framework for evaluating and improving distributional similarity measures using probabilistic and information-theoretic principles.
Proposed method
- Represents each word by the contexts it co-occurs with in large corpora, with context windows ranging from sentence-level to document-level.
- Applies information-theoretic measures such as Pointwise Mutual Information (PMI), Kullback-Leibler Divergence (KLD), and Jensen-Shannon Divergence (JSD) to quantify distributional similarity.
- Introduces new compositional measures (e.g., Saif^Div_AvgWt, Saif^Div_MaxWt) that weight context words by their maximum or average probability across target words.
- Proposes asymmetric and symmetric variants of KLD and PMI-based measures to better reflect directional and mutual relatedness.
- Employs normalized and weighted forms of cosine, Jaccard, and Dice similarity to compare distributional profiles across word pairs.
- Combines multiple measures into hybrid models (e.g., CRMs) that integrate type- and token-based associations using F1-like and weighted averaging strategies.
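A minimal sketch of the profile-based comparison described above, assuming a toy whitespace-tokenized corpus, a symmetric ±2-token window (chosen only to keep the example small, rather than the sentence- or document-level windows the paper uses), and positive PMI as the context weight. All function names are hypothetical, and the weighted Jaccard (sum of minima over sum of maxima) and weighted Dice shown here are common textbook variants, not necessarily the exact normalizations used in the paper.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count how often each context word occurs within +/- `window` tokens of each target."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[target][tokens[j]] += 1
    return counts


def ppmi_profile(counts, word):
    """Weight each context c of `word` by max(0, log2(P(w, c) / (P(w) * P(c))))."""
    total = sum(sum(ctx.values()) for ctx in counts.values())
    context_totals = Counter()
    for ctx in counts.values():
        context_totals.update(ctx)  # marginal count of c as a context of any word
    word_total = sum(counts[word].values())
    profile = {}
    for c, n in counts[word].items():
        pmi = math.log2((n * total) / (word_total * context_totals[c]))
        if pmi > 0:  # keep only positive associations (PPMI)
            profile[c] = pmi
    return profile


def cosine(p, q):
    """Cosine of the angle between two weighted context profiles."""
    dot = sum(p[c] * q[c] for c in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def jaccard(p, q):
    """Weighted Jaccard: sum of per-context minima over sum of per-context maxima."""
    keys = p.keys() | q.keys()
    num = sum(min(p.get(c, 0.0), q.get(c, 0.0)) for c in keys)
    den = sum(max(p.get(c, 0.0), q.get(c, 0.0)) for c in keys)
    return num / den if den else 0.0


def dice(p, q):
    """Weighted Dice: twice the shared weight over the total weight of both profiles."""
    shared = sum(min(p[c], q[c]) for c in p.keys() & q.keys())
    total = sum(p.values()) + sum(q.values())
    return 2 * shared / total if total else 0.0


corpus = [
    "the bee makes honey in the hive".split(),
    "honey from the hive is sweet".split(),
    "the bee visits the flower".split(),
    "the car drove down the road".split(),
]
counts = cooccurrence_counts(corpus, window=2)
honey, bee = ppmi_profile(counts, "honey"), ppmi_profile(counts, "bee")
print(cosine(honey, bee), jaccard(honey, bee), dice(honey, bee))
```

Changing the window to sentence or document boundaries only affects `cooccurrence_counts`; the comparison functions accept any mapping from context word to weight, so other weighting schemes can be swapped in without touching them.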
Experimental results
Research questions
- RQ1: How do different distributional measures perform in replicating human judgments of semantic relatedness?
- RQ2: What are the key limitations of existing distributional measures such as PMI, cosine similarity, and KLD in capturing human-like semantic relatedness?
- RQ3: Can new distributional measures be designed to better handle asymmetry, frequency bias, and context weighting while improving correlation with human judgments?
- RQ4: How do distributional measures compare in performance and robustness to ontology-based measures like those derived from WordNet?
- RQ5: What are the most promising directions for future research in measuring semantic relatedness using distributional models?
Key findings
- Traditional distributional measures like PMI and cosine similarity suffer from high sensitivity to low-frequency co-occurrences, leading to inflated scores for rare but non-representative word pairs.
- Asymmetric measures such as KLD and its variants (e.g., KLD_Avg, KLD_Max) outperform symmetric counterparts in capturing directional semantic relationships, especially when one word has a richer or more specific context.
- The proposed Saif^Div_AvgWt and Saif^Div_MaxWt measures achieve higher correlation with human judgments by weighting context words based on their relative importance in the joint context of two target words.
- Hybrid models combining PMI and KLD-based components (e.g., CRMs) show improved robustness and performance, particularly when balancing precision and recall in word association detection.
- The study identifies that many existing measures fail to account for context overlap and distributional divergence in a balanced way, and that compositional, context-weighted measures significantly outperform non-compositional ones.
- Among the evaluated measures, KLD-based and PMI-based compositional models (e.g., KLD_Avg, Saif^Div_AvgWt) demonstrate the strongest alignment with human judgments, especially on benchmark word pairs like 'honey–bee' versus 'paper–car'.
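The frequency-bias and asymmetry findings can be illustrated with a few lines of arithmetic. The sketch below uses hypothetical counts and toy context distributions (not data from the paper) to show that (a) PMI gives a single co-occurrence of two words seen only once a higher score than a well-attested pair, and (b) KLD changes with argument order while JSD does not. The `eps` floor in `kld` is an illustrative smoothing assumption; the paper may handle zero probabilities differently.

```python
import math


def pmi(joint, count_x, count_y, total):
    """Pointwise mutual information in bits, from raw counts."""
    return math.log2((joint * total) / (count_x * count_y))


def kld(p, q, eps=1e-9):
    """KL divergence D(p || q) in bits; contexts missing from q are floored at eps (assumption)."""
    return sum(pi * math.log2(pi / max(q.get(c, 0.0), eps)) for c, pi in p.items() if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence: mean KLD of each profile against their average (symmetric, <= 1 bit)."""
    keys = p.keys() | q.keys()
    m = {c: 0.5 * (p.get(c, 0.0) + q.get(c, 0.0)) for c in keys}
    return 0.5 * kld(p, m) + 0.5 * kld(q, m)


# (a) Frequency bias: hypothetical counts over 1,000,000 co-occurrence pairs.
total = 1_000_000
rare = pmi(joint=1, count_x=1, count_y=1, total=total)  # two words seen once, together
frequent = pmi(joint=500, count_x=2000, count_y=5000, total=total)  # well-attested pair
print(f"PMI rare pair: {rare:.2f} bits, frequent pair: {frequent:.2f} bits")

# (b) Asymmetry: a specific context profile vs. a broader one (toy distributions).
specific = {"hive": 0.5, "nectar": 0.5}
general = {"hive": 0.25, "nectar": 0.25, "road": 0.25, "paper": 0.25}
print(f"KLD(specific || general) = {kld(specific, general):.3f}")
print(f"KLD(general || specific) = {kld(general, specific):.3f}")
print(f"JSD either order = {jsd(specific, general):.3f}, {jsd(general, specific):.3f}")
```

The reverse divergence is much larger because `general` puts mass on contexts that `specific` never uses, which is the directional behaviour the asymmetric measures exploit; JSD averages the two profiles first, so it returns the same value in both orders.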
This review was created by AI and reviewed by human editors.