[Paper Review] A Linear Observed Time Statistical Parser Based on Maximum Entropy Models
This paper presents a statistical parser based on maximum entropy models whose observed running time is linear in sentence length. It achieves 87% precision and 86% recall on the Wall Street Journal corpus, surpassing the best previously published results. It uses a three-pass, shift-reduce style parsing architecture with feature-based action scoring, and shows that picking the best of the 20 highest-scoring parses could raise accuracy to 93% precision and recall.
This paper presents a statistical parser for natural language that obtains a parsing accuracy---roughly 87% precision and 86% recall---which surpasses the best previously published results on the Wall St. Journal domain. The parser itself requires very little human intervention, since the information it uses to make parsing decisions is specified in a concise and simple manner, and is combined in a fully automatic way under the maximum entropy framework. The observed running time of the parser on a test sentence is linear with respect to the sentence length. Furthermore, the parser returns several scored parses for a sentence, and this paper shows that a scheme to pick the best parse from the 20 highest scoring parses could yield a dramatically higher accuracy of 93% precision and recall.
Motivation & Objective
- To develop a statistical parser that achieves higher parsing accuracy than previously published methods on the Wall Street Journal corpus.
- To minimize human linguistic intervention by using a concise, automatically learned feature set within a maximum entropy framework.
- To ensure efficient parsing by achieving linear observed running time with respect to sentence length.
- To explore the potential of reranking top-k parses to significantly improve parsing accuracy beyond single-parse selection.
- To compare the proposed maximum entropy parser with existing models like the bigram parser and SPATTER, emphasizing differences in modeling, feature integration, and computational efficiency.
Proposed method
- The parser uses a three-pass, left-to-right procedure: POS tagging, chunking, and constituent building, each guided by action selection.
- Each parsing action (e.g., Start NP, Join VP, Check) is scored using maximum entropy models that compute probabilities based on syntactic features of the current context.
- Features are defined simply using words and POS tags, with relative importance automatically learned from a training corpus like the Penn Treebank.
- A top-K breadth-first search (BFS) heuristic returns multiple scored parses, enabling reranking strategies to improve final accuracy.
- The maximum entropy framework allows robust integration of diverse features, including punctuation and syntactic patterns, without prior feature screening.
- The parser's observed running time is linear in sentence length due to the efficient, incremental construction of parse trees and the use of a simple search strategy.
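The action scoring and search steps listed above can be illustrated with a minimal sketch. It is not the paper's implementation: the feature names, the weights, the three-action inventory, and the helpers `score_actions` and `top_k_search` are all hypothetical, chosen only to show how a conditional maximum entropy model turns active features into a probability distribution over actions, and how a top-K search keeps only the K highest-scoring partial derivations at each step.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical weights, one per (feature, action) pair. A real system would
# estimate these from a treebank; the values here are made up for illustration.
WEIGHTS: Dict[Tuple[str, str], float] = {
    ("tag=DT", "Start NP"): 2.0,
    ("tag=NN", "Join NP"): 1.5,
    ("tag=NN", "Start NP"): 0.5,
    ("tag=VBD", "Other"): 1.0,
}
ACTIONS = ["Start NP", "Join NP", "Other"]  # toy action inventory (assumption)


def score_actions(features: Sequence[str]) -> Dict[str, float]:
    """Conditional maxent model: p(action | context) is proportional to
    exp(sum of the weights of the active (feature, action) pairs)."""
    raw = {
        a: math.exp(sum(WEIGHTS.get((f, a), 0.0) for f in features))
        for a in ACTIONS
    }
    z = sum(raw.values())  # normalizing constant
    return {a: v / z for a, v in raw.items()}


def top_k_search(tags: Sequence[str], k: int) -> List[Tuple[float, List[str]]]:
    """Keep the k best partial action sequences after each token.

    One action is chosen per token, so for a fixed k the amount of work
    grows linearly with len(tags)."""
    beam: List[Tuple[float, List[str]]] = [(0.0, [])]  # (log prob, actions)
    for tag in tags:
        probs = score_actions([f"tag={tag}"])
        expanded = [
            (logp + math.log(p), seq + [a])
            for logp, seq in beam
            for a, p in probs.items()
        ]
        expanded.sort(key=lambda item: item[0], reverse=True)
        beam = expanded[:k]  # prune to the k highest-scoring hypotheses
    return beam


if __name__ == "__main__":
    for logp, seq in top_k_search(["DT", "NN", "VBD"], k=3):
        print(f"{math.exp(logp):.3f}", seq)
```

Because the search returns several scored hypotheses rather than one, a later stage can choose among them. The sketch covers only a single tagging-like pass over POS tags; the parser described above chains three such passes.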
Experimental results
Research questions
- RQ1: Can a maximum entropy-based parser achieve higher parsing accuracy than existing statistical parsers on the Wall Street Journal corpus?
- RQ2: To what extent does the linear observed time complexity of the parser impact its scalability and practical deployment?
- RQ3: How effective is reranking the top 20 highest-scoring parses in improving parsing accuracy compared to selecting only the single best parse?
- RQ4: Can a minimal, linguistically lightweight feature set, defined using only words and POS tags, achieve competitive performance under the maximum entropy framework?
- RQ5: How does the proposed parser compare in performance and efficiency to the bigram parser and SPATTER parser in terms of accuracy, feature usage, and computational cost?
Key findings
- The maximum entropy parser achieves 87.5% precision and 86.3% recall on section 23 of the WSJ Treebank, outperforming the best previously published results.
- A scheme that picks the best parse from the 20 highest-scoring parses could raise accuracy to 93% precision and recall, which indicates substantial headroom for reranking over single-parse selection.
- The parser's observed running time is linear with respect to sentence length, making it efficient for long input sequences.
- The method requires minimal linguistic effort for feature engineering, as features are defined simply and their weights are learned automatically via maximum entropy training.
- The maximum entropy framework enables robust integration of diverse features, including punctuation, without requiring hand-crafted rules or preprocessing.
- The parser outperforms both the bigram parser and SPATTER parser in accuracy, while using a simpler, more general modeling approach that avoids expensive clustering or task-specific estimation schemes.
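To make the precision, recall, and top-20 figures above concrete, here is a minimal sketch of labeled bracket scoring and of selecting the best candidate from an N-best list. The bracket tuples, the toy parses, and the helper names `precision_recall` and `oracle_pick` are hypothetical, and ranking candidates by F1 against the gold parse is an assumption made for illustration, not necessarily the selection criterion used in the paper.

```python
from typing import List, Set, Tuple

# A labeled constituent: (label, start index, end index). Toy data only.
Bracket = Tuple[str, int, int]


def precision_recall(proposed: Set[Bracket], gold: Set[Bracket]) -> Tuple[float, float]:
    """Labeled bracket precision and recall for a single parse."""
    correct = len(proposed & gold)
    precision = correct / len(proposed) if proposed else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def oracle_pick(candidates: List[Set[Bracket]], gold: Set[Bracket]) -> int:
    """Index of the candidate with the highest F1 against the gold parse.

    This needs the gold parse, so it measures an upper bound on what a
    reranker could achieve rather than a deployable selection method."""
    def f1(parse: Set[Bracket]) -> float:
        p, r = precision_recall(parse, gold)
        return 2 * p * r / (p + r) if p + r else 0.0

    return max(range(len(candidates)), key=lambda i: f1(candidates[i]))


gold = {("S", 0, 3), ("NP", 0, 2), ("VP", 2, 3)}
candidates = [
    {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3)},  # highest model score, partly wrong
    {("S", 0, 3), ("NP", 0, 2), ("VP", 2, 3)},  # lower model score, fully correct
]

best = oracle_pick(candidates, gold)
print(precision_recall(candidates[0], gold))
print(best, precision_recall(candidates[best], gold))
```

The gap between the score of the first candidate and the score of the oracle's choice is the kind of headroom the 93% figure describes: the correct analysis is often among the returned parses even when it is not ranked first.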
This review was created by AI and reviewed by human editors.