QUICK REVIEW

[Paper Review] MBT: A Memory-Based Part of Speech Tagger-Generator

Walter Daelemans, Jakub Zavrel|ArXiv.org|Jul 11, 1996

Natural Language Processing Techniques|30 references|259 citations

TL;DR

This paper presents MBT, a memory-based part-of-speech tagger-generator that assigns tags by similarity-based reasoning over a case base of word-context-tag triples. Using IGTree for efficient indexing and dynamic selection of context size, MBT reaches accuracy on a par with statistical taggers while offering fast learning and tagging, modest training-data requirements, incremental learning, and explainable decisions.

ABSTRACT

We introduce a memory-based approach to part of speech tagging. Memory-based learning is a form of supervised learning based on similarity-based reasoning. The part of speech tag of a word in a particular context is extrapolated from the most similar cases held in memory. Supervised learning approaches are useful when a tagged corpus is available as an example of the desired output of the tagger. Based on such a corpus, the tagger-generator automatically builds a tagger which is able to tag new text the same way, diminishing development time for the construction of a tagger considerably. Memory-based tagging shares this advantage with other statistical or machine learning approaches. Additional advantages specific to a memory-based approach include (i) the relatively small tagged corpus size sufficient for training, (ii) incremental learning, (iii) explanation capabilities, (iv) flexible integration of information in case representations, (v) its non-parametric nature, (vi) reasonably good results on unknown words without morphological analysis, and (vii) fast learning and tagging. In this paper we show that a large-scale application of the memory-based approach is feasible: we obtain a tagging accuracy that is on a par with that of known statistical approaches, and with attractive space and time complexity properties when using IGTree, a tree-based formalism for indexing and searching huge case bases. The use of IGTree has as additional advantage that optimal context size for disambiguation is dynamically computed.

Motivation & Objective

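To develop a scalable, accurate, and efficient part-of-speech tagging system that is built automatically from a tagged corpus, reducing development time relative to hand-crafted taggers (an advantage the abstract notes is shared with other statistical and machine learning approaches).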
To address the computational cost of plain k-nearest-neighbor search over large case bases by introducing a compressed indexing structure (IGTree).
To enable incremental learning and explanation capabilities in a tagging system without requiring extensive feature engineering or smoothing.
To achieve robust performance on unknown words without morphological analysis, leveraging contextual and surface form features.
To demonstrate that memory-based learning can be a viable alternative to HMMs and n-gram models in large-scale NLP applications.

Proposed method

The system stores training examples as feature-value patterns (word, context, tag) in a case base, representing each as a vector of symbolic features.
Tagging is performed via k-nearest-neighbor (k-NN) classification: for each word in context, the most similar cases in memory are retrieved using a similarity metric.
The similarity metric uses a symbolic overlap function (δ(xi,yi) = 0 if xi=yi, else 1) to compute distance between feature vectors.
IGTree, a tree-based indexing formalism, is used to compress and efficiently search the case base, enabling fast lookup independent of case base size.
The system dynamically determines optimal context size for disambiguation by analyzing the IGTree structure during training.
Feature weighting is applied to flexibly integrate multiple information sources (e.g., word form and context) during similarity computation.
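
The retrieval steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-feature cases (focus word, previous tag), the toy case base, and all function names are hypothetical, and information gain is assumed as the feature weight (the review only says that feature weighting is applied). The first half tags by weighted-overlap nearest neighbors; the second half builds a simplified IGTree-style tree in which features are tested in order of decreasing weight and lookup stops at the deepest matching node, so the amount of context consulted varies per word.

```python
import math
from collections import Counter

def entropy(tags):
    """Shannon entropy (in bits) of a list of tags."""
    total = len(tags)
    return -sum(n / total * math.log2(n / total) for n in Counter(tags).values())

def information_gain(cases, i):
    """Entropy reduction over tags from knowing the value of feature i."""
    groups = {}
    for features, tag in cases:
        groups.setdefault(features[i], []).append(tag)
    remainder = sum(len(g) / len(cases) * entropy(g) for g in groups.values())
    return entropy([tag for _, tag in cases]) - remainder

def weighted_overlap(x, y, weights):
    """Distance = sum of weights of mismatching features (delta is 0 on match, else 1)."""
    return sum(w for xi, yi, w in zip(x, y, weights) if xi != yi)

def nearest_tag(query, cases, weights, k=1):
    """Majority tag among the k nearest stored cases, plus the neighbors themselves."""
    neighbors = sorted(cases, key=lambda c: weighted_overlap(query, c[0], weights))[:k]
    return Counter(tag for _, tag in neighbors).most_common(1)[0][0], neighbors

def build_igtree(cases, order):
    """Simplified IGTree-style tree: each node keeps a default (most frequent) tag."""
    default = Counter(tag for _, tag in cases).most_common(1)[0][0]
    node = {"default": default, "children": {}}
    if not order or len({tag for _, tag in cases}) == 1:
        return node  # no features left, or the subset is unambiguous
    groups = {}
    for features, tag in cases:
        groups.setdefault(features[order[0]], []).append((features, tag))
    for value, subset in groups.items():
        child = build_igtree(subset, order[1:])
        # Compression: drop leaves that only repeat the parent's default tag.
        if child["children"] or child["default"] != default:
            node["children"][value] = child
    return node

def igtree_tag(query, tree, order):
    """Follow matching arcs as deep as possible; return (tag, depth reached)."""
    node, depth = tree, 0
    for i in order:
        child = node["children"].get(query[i])
        if child is None:
            break
        node, depth = child, depth + 1
    return node["default"], depth

# Hypothetical toy case base: ((focus word, previous tag), tag).
CASES = [
    (("the", "IN"), "DT"), (("the", "VB"), "DT"),
    (("can", "PRP"), "MD"), (("can", "DT"), "NN"), (("can", "NN"), "MD"),
    (("will", "PRP"), "MD"), (("run", "PRP"), "VB"), (("run", "DT"), "NN"),
    (("dog", "DT"), "NN"), (("dog", "JJ"), "NN"),
]
WEIGHTS = [information_gain(CASES, i) for i in range(2)]
ORDER = sorted(range(2), key=lambda i: WEIGHTS[i], reverse=True)
TREE = build_igtree(CASES, ORDER)

tag, neighbors = nearest_tag(("cat", "DT"), CASES, WEIGHTS, k=3)
print(tag, neighbors)                            # neighbors double as an explanation
print(igtree_tag(("run", "PRP"), TREE, ORDER))   # ('VB', 2)
print(igtree_tag(("can", "PRP"), TREE, ORDER))   # ('MD', 1)
print(igtree_tag(("cat", "DT"), TREE, ORDER))    # ('NN', 0)
```

In this toy run the unseen word "cat" is still tagged from its context alone, and the depth returned by igtree_tag differs per query, which is one way to read the "dynamic context size" idea: lookup cost depends on the number of features tested, not on the number of stored cases.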

Experimental results

Research questions

RQ1: Can a memory-based approach achieve tagging accuracy comparable to established statistical models like HMMs or n-gram taggers?
RQ2: Can IGTree indexing make memory-based tagging computationally feasible for large-scale corpora?
RQ3: Does the system provide robust performance on unknown words without requiring morphological analysis?
RQ4: Can the system support incremental learning and explanation of decisions without retraining?
RQ5: Is the automatic selection of optimal context size for disambiguation possible within a non-parametric learning framework?

Key findings

MBT achieves tagging accuracy on par with known statistical approaches, demonstrating the feasibility of memory-based learning for large-scale POS tagging.
With only 300–400 K tagged words, the system achieves good performance, indicating that small training corpora are sufficient for effective learning.
Tagging speed reaches approximately 1000 words per second, showing that the IGTree-based indexing enables fast inference despite large case bases.
The system provides explanation capabilities by retrieving nearest neighbors and IGTree paths, enabling traceable decision-making.
Over 90% of unknown words in the WSJ corpus are correctly tagged using context and word form, without morphological analysis.
The IGTree formalism enables automatic, non-parametric estimation of classifications, avoiding issues with smoothing and convergence found in other methods.

This review was created by AI and reviewed by human editors.