QUICK REVIEW

[Paper Review] Developing and Evaluating a Probabilistic LR Parser of Part-of-Speech and Punctuation Labels

Ted Briscoe, John A. Carroll|arXiv (Cornell University)|Sep 20, 1995

Natural Language Processing Techniques | 17 references | 26 citations

TL;DR

This paper presents a probabilistic LR parser that operates on part-of-speech and punctuation labels to perform robust, domain-independent syntactic parsing of unrestricted English text. It combines a unification-based grammar with probabilities derived from bracketed training data, and it measures the contribution of punctuation by parsing identical texts with and without their punctuation marks, with accuracy reported to be higher when punctuation is included.

ABSTRACT

We describe an approach to robust domain-independent syntactic parsing of unrestricted naturally-occurring (English) input. The technique involves parsing sequences of part-of-speech and punctuation labels using a unification-based grammar coupled with a probabilistic LR parser. We describe the coverage of several corpora using this grammar and report the results of a parsing experiment using probabilities derived from bracketed training data. We report the first substantial experiments to assess the contribution of punctuation to deriving an accurate syntactic analysis, by parsing identical texts both with and without naturally-occurring punctuation marks.

Motivation & Objective

To develop a robust, domain-independent syntactic parser capable of handling unrestricted natural language input.
To investigate the contribution of punctuation to syntactic parsing accuracy by comparing results with and without punctuation.
To evaluate the effectiveness of a unification-based grammar combined with probabilistic LR parsing on part-of-speech and punctuation sequences.
To assess parsing performance using probabilities derived from bracketed training data.

Proposed method

The parser operates on sequences of part-of-speech and punctuation labels rather than raw text.
A unification-based grammar is used to represent syntactic constraints and relationships.
Probabilistic LR parsing is applied, with probabilities estimated from bracketed training corpora.
The system parses identical texts both with and without punctuation to isolate punctuation's impact.
Coverage of multiple corpora is reported to demonstrate robustness and generalization.
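The with/without-punctuation comparison described above can be illustrated by how the two input conditions might be prepared from the same tagged text. This is a minimal sketch, not the authors' implementation: the tag names, the `PUNCT_LABELS` set, and the `to_label_sequence` helper are all assumptions introduced for illustration, since this review does not specify the tagset.

```python
from typing import List, Tuple

# Hypothetical set of punctuation labels (assumption: the actual label
# inventory used in the paper is not given in this review).
PUNCT_LABELS = {",", ".", ":", ";", "(", ")", "-", "?", "!"}


def to_label_sequence(
    tagged: List[Tuple[str, str]], keep_punctuation: bool = True
) -> List[str]:
    """Reduce (word, label) pairs to the label sequence a parser would see.

    The words themselves are discarded, so the parser input consists only
    of part-of-speech and (optionally) punctuation labels.
    """
    labels = []
    for _word, label in tagged:
        if not keep_punctuation and label in PUNCT_LABELS:
            continue  # drop punctuation for the "without punctuation" condition
        labels.append(label)
    return labels


# Illustrative sentence with made-up, CLAWS-like tags (assumption).
sentence = [
    ("However", "RR"),
    (",", ","),
    ("the", "AT"),
    ("parser", "NN1"),
    ("works", "VVZ"),
    (".", "."),
]

with_punct = to_label_sequence(sentence, keep_punctuation=True)
without_punct = to_label_sequence(sentence, keep_punctuation=False)
```

Feeding both sequences to the same parser and comparing the resulting analyses keeps the text constant, so any difference can be attributed to the punctuation labels alone.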

Experimental results

Research questions

RQ1: To what extent does including punctuation improve syntactic parsing accuracy in a domain-independent setting?
RQ2: How well does a probabilistic LR parser perform when parsing sequences of POS and punctuation labels?
RQ3: What is the contribution of punctuation to syntactic analysis compared to POS tags alone?
RQ4: How effective is a unification-based grammar in conjunction with probabilistic LR parsing for syntactic parsing?

Key findings

Parsing the same texts with punctuation included yields more accurate analyses than parsing them with punctuation removed.
The parser achieves robust coverage across multiple corpora using only POS and punctuation labels.
The integration of a unification-based grammar with probabilistic LR parsing enables accurate syntactic analysis without full lexical input.
The authors describe these as the first substantial experiments to assess the contribution of punctuation to syntactic analysis, using identical texts parsed with and without naturally occurring punctuation marks.

This review was created by AI and reviewed by human editors.