QUICK REVIEW

[Paper Review] A Simple Approach to Building Ensembles of Naive Bayesian Classifiers for Word Sense Disambiguation

Ted Pedersen|ArXiv.org|May 7, 2000
Natural Language Processing Techniques|16 references|133 citations
TL;DR

This paper proposes a simple ensemble of Naive Bayesian classifiers that use co-occurrence features from left and right context windows of varying size to improve word sense disambiguation (WSD). It trains 81 classifiers on different window-size combinations, selects nine of them, and combines their predictions by majority vote. The resulting ensemble achieves 88% accuracy on 'line' and 89% on 'interest', rivaling the best previously published results with minimal complexity.

ABSTRACT

This paper presents a corpus-based approach to word sense disambiguation that builds an ensemble of Naive Bayesian classifiers, each of which is based on lexical features that represent co-occurring words in varying sized windows of context. Despite the simplicity of this approach, empirical results disambiguating the widely studied nouns line and interest show that such an ensemble achieves accuracy rivaling the best previously published results.

Motivation & Objective

  • To improve word sense disambiguation accuracy using a simple, scalable ensemble method based on co-occurrence features.
  • To investigate whether combining multiple Naive Bayesian classifiers with different context window sizes enhances disambiguation performance.
  • To determine if shallow lexical features (co-occurrences) outperform more complex linguistic features in WSD.
  • To evaluate the effectiveness of majority voting over weighted voting in combining classifier outputs.
  • To explore the impact of window size diversity on ensemble performance and error complementarity.

Proposed method

  • Each classifier is trained on a distinct combination of left and right context window sizes. Each side takes one of nine sizes between 0 and 50 words, giving 9 × 9 = 81 unique classifiers.
  • Contextual features are binary indicators of word co-occurrence within a specified window, with no stemming, part-of-speech tagging, or capitalization/punctuation handling.
  • The Naive Bayesian model estimates class-conditional probabilities using frequency counts of feature-sense pairs, with Laplace smoothing applied to zero-frequency events.
  • The ensemble combines predictions via a simple majority vote across nine carefully selected classifiers, each from a different window size category.
  • Classifier selection prioritizes diversity in window size to maximize error complementarity and reduce redundancy.
  • A weighted voting strategy was tested but found to underperform compared to majority voting.
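The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `window_features`, `WindowNaiveBayes`, and `majority_vote` are hypothetical, the model treats every vocabulary word as a present-or-absent binary feature, and the add-one form of smoothing is an assumption about how the smoothing mentioned above might be applied.

```python
import math
from collections import Counter, defaultdict


def window_features(tokens, target_idx, left, right):
    """Set of words within `left` tokens before and `right` tokens after the target."""
    start = max(0, target_idx - left)
    before = tokens[start:target_idx]
    after = tokens[target_idx + 1:target_idx + 1 + right]
    return set(before) | set(after)


class WindowNaiveBayes:
    """One ensemble member: Naive Bayes over binary co-occurrence features."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def fit(self, examples):
        # examples: iterable of (tokens, target_idx, sense) triples
        self.sense_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, idx, sense in examples:
            self.sense_counts[sense] += 1
            for word in window_features(tokens, idx, self.left, self.right):
                self.feat_counts[sense][word] += 1
                self.vocab.add(word)
        self.total = sum(self.sense_counts.values())
        return self

    def log_scores(self, tokens, idx):
        present = window_features(tokens, idx, self.left, self.right)
        scores = {}
        for sense, n in self.sense_counts.items():
            score = math.log(n / self.total)  # log prior of the sense
            for word in self.vocab:
                # Assumed add-one smoothing of P(word occurs | sense).
                p = (self.feat_counts[sense][word] + 1) / (n + 2)
                score += math.log(p if word in present else 1 - p)
            scores[sense] = score
        return scores

    def predict(self, tokens, idx):
        scores = self.log_scores(tokens, idx)
        return max(scores, key=scores.get)


def majority_vote(classifiers, tokens, idx):
    """Each member casts one vote for its most probable sense."""
    votes = Counter(clf.predict(tokens, idx) for clf in classifiers)
    return votes.most_common(1)[0][0]
```

Each member sees the same training data but a different slice of context, so members with different `(left, right)` settings can make different mistakes. That difference is what the vote relies on.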

Experimental results

Research questions

  • RQ1: Can an ensemble of simple Naive Bayesian classifiers trained on varying context window sizes outperform individual classifiers in word sense disambiguation?
  • RQ2: Does using only co-occurrence features from lexical windows yield competitive accuracy compared to more complex linguistic features?
  • RQ3: Is majority voting more effective than weighted voting for combining predictions in a WSD ensemble?
  • RQ4: How does the diversity of window sizes among ensemble members affect overall disambiguation accuracy?
  • RQ5: Can a simple, corpus-based approach with minimal feature engineering achieve state-of-the-art performance on standard WSD benchmarks?

Key findings

  • The ensemble achieved 88% accuracy on the word 'line' and 89% on 'interest', rivaling the best previously published results.
  • A majority vote of nine diverse classifiers (selected from different window size categories) outperformed both individual classifiers and broader ensembles.
  • Ensembles based on similar-sized windows (e.g., medium-medium) showed little improvement over individual classifiers, indicating redundancy.
  • A full ensemble of all 81 classifiers reached only 81% on 'interest', below the 89% of the nine-member ensemble, highlighting the need for strategic classifier selection.
  • Weighted voting produced lower accuracy (83% for 'interest') than majority voting (89%), suggesting that simple voting is more effective in this setup.
  • Co-occurrence features alone were sufficient to achieve high accuracy, with no significant gain from adding part-of-speech or collocation features.
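The selection step behind these findings, picking one classifier per window-size category, can be sketched as follows. This is an illustration under assumptions rather than the paper's procedure: the concrete size list, the narrow/medium/wide cut-offs, the use of held-out accuracy as the selection criterion, and the names `SIZES`, `size_range`, `category`, and `select_members` are all hypothetical choices made for the example.

```python
from itertools import product

# Assumed candidate sizes: nine per side, between 0 and 50 words.
SIZES = [0, 1, 2, 3, 4, 5, 10, 25, 50]


def size_range(size):
    """Bucket a window size into an assumed narrow/medium/wide range."""
    if size <= 2:
        return "narrow"
    if size <= 5:
        return "medium"
    return "wide"


def category(left, right):
    """Range category of a (left, right) window pair, e.g. ('medium', 'wide')."""
    return (size_range(left), size_range(right))


def select_members(held_out_accuracy):
    """Keep the most accurate (left, right) configuration in each category.

    held_out_accuracy maps (left, right) -> accuracy measured on data
    that was not used for training.
    """
    best = {}
    for config, acc in held_out_accuracy.items():
        cat = category(*config)
        if cat not in best or acc > held_out_accuracy[best[cat]]:
            best[cat] = config
    return best


# All 9 x 9 = 81 window combinations fall into 3 x 3 = 9 categories.
ALL_CONFIGS = list(product(SIZES, SIZES))
```

Given one accuracy per configuration, `select_members` returns nine `(left, right)` pairs, one per category, which would then be the members of the voting ensemble.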

This review was created by AI and reviewed by human editors.