TR94-07
Deterministic Part-of-Speech Tagging with Finite State Transducers
- E. Roche and Y. Schabes, "Deterministic Part-of-Speech Tagging with Finite State Transducers", Tech. Rep. TR94-07, Mitsubishi Electric Research Laboratories, Cambridge, MA, May 1994.
@techreport{MERL_TR94-07,
  author = {Emmanuel Roche and Yves Schabes},
  title = {Deterministic Part-of-Speech Tagging with Finite State Transducers},
  institution = {MERL - Mitsubishi Electric Research Laboratories},
  address = {Cambridge, MA 02139},
  number = {TR94-07},
  month = may,
  year = 1994,
  url = {https://www.merl.com/publications/TR94-07/}
}
Abstract:
Stochastic approaches to natural language processing have often been preferred to rule-based approaches because of their robustness and their automatic training capabilities. This was the case for part-of-speech tagging until Brill showed how state-of-the-art part-of-speech tagging can be achieved by inferring a rule-based part-of-speech tagger from a training corpus. However, current implementations of Brill's tagger run more slowly than previous approaches. In this paper, we present a finite-state tagger, inspired by Brill's work, that operates in optimal time, in the sense that the time to assign tags to a sentence corresponds to the time required to follow a single path in a deterministic finite-state machine. This result is achieved by encoding the application of the rules found in Brill's tagger as a non-deterministic finite-state transducer and then turning it into a deterministic transducer. The resulting deterministic transducer yields a part-of-speech tagger whose speed is dominated by the access time of mass storage devices.
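The central idea above, that tagging reduces to following one path through a deterministic machine built from an ordered rule list, can be illustrated with a small Python sketch. It assumes a simplified rule format, `(from_tag, to_tag, next_tag)`, that covers only Brill-style rules triggered by the following tag; the tag names and the two rules are hypothetical examples. Unlike the construction described in the abstract, which encodes the rules as a non-deterministic transducer and then determinizes it, this sketch hand-builds a deterministic transducer for each rule and merges them by composition, so no determinization step is shown. The names `apply_rule_naive`, `Subsequential`, `compile_rule`, `compose` and `compile_rules` are illustrative only, not an API from the report.

```python
from functools import reduce
from typing import Dict, List, Tuple

# Hypothetical Brill-style rule (from_tag, to_tag, next_tag):
# "change from_tag to to_tag when the following tag is next_tag".
Rule = Tuple[str, str, str]


def apply_rule_naive(tags: List[str], rule: Rule) -> List[str]:
    """Baseline: one full scan of the sentence per rule."""
    src, dst, nxt = rule
    out = list(tags)
    for i in range(len(tags) - 1):
        if tags[i] == src and tags[i + 1] == nxt:
            out[i] = dst
    return out


class Subsequential:
    """Deterministic transducer: one transition per (state, tag), each emitting
    a possibly empty list of tags, plus a final output for the state reached."""

    def __init__(self, delta: Dict, emit: Dict, final: Dict, start) -> None:
        self.delta, self.emit, self.final, self.start = delta, emit, final, start

    def step(self, state, tags: List[str]):
        """Follow the single path for `tags` from `state`; return (end state, output)."""
        out: List[str] = []
        for t in tags:
            out.extend(self.emit[(state, t)])
            state = self.delta[(state, t)]
        return state, out

    def run(self, tags: List[str]) -> List[str]:
        state, out = self.step(self.start, tags)
        return out + self.final[state]


def compile_rule(rule: Rule, alphabet: List[str]) -> Subsequential:
    """Hand-built machine for one rule. State 0: nothing held back.
    State 1: a from_tag is held until the next tag decides whether it changes."""
    src, dst, nxt = rule
    delta: Dict = {}
    emit: Dict = {}
    for t in alphabet:
        held = dst if t == nxt else src  # how a held from_tag resolves when t follows
        if t == src:
            delta[(0, t)], emit[(0, t)] = 1, []  # start holding t
            delta[(1, t)], emit[(1, t)] = 1, [held]  # release the old tag, hold t
        else:
            delta[(0, t)], emit[(0, t)] = 0, [t]
            delta[(1, t)], emit[(1, t)] = 0, [held, t]
    # A from_tag still held at the end of the sentence has no right context.
    return Subsequential(delta, emit, {0: [], 1: [src]}, 0)


def compose(t1: Subsequential, t2: Subsequential, alphabet: List[str]) -> Subsequential:
    """Product construction: one machine equal to running t1, then t2 on its output.
    Reachable state pairs are renumbered as integers."""
    start = (t1.start, t2.start)
    ids = {start: 0}
    delta: Dict = {}
    emit: Dict = {}
    final: Dict = {}
    todo = [start]
    while todo:
        pair = todo.pop()
        q1, q2 = pair
        s = ids[pair]
        # Flush t1's final output through t2, then append t2's own final output.
        end2, out = t2.step(q2, t1.final[q1])
        final[s] = out + t2.final[end2]
        for t in alphabet:
            q2_next, out = t2.step(q2, t1.emit[(q1, t)])
            succ = (t1.delta[(q1, t)], q2_next)
            if succ not in ids:
                ids[succ] = len(ids)
                todo.append(succ)
            delta[(s, t)], emit[(s, t)] = ids[succ], out
    return Subsequential(delta, emit, final, 0)


def compile_rules(rules: List[Rule], alphabet: List[str]) -> Subsequential:
    """Fold an ordered, non-empty rule list into one deterministic transducer."""
    machines = [compile_rule(r, alphabet) for r in rules]
    return reduce(lambda acc, m: compose(acc, m, alphabet), machines[1:], machines[0])


alphabet = ["PRP", "VBP", "TO", "NN", "VB", "DT", "IN"]
rules = [("NN", "VB", "DT"), ("TO", "IN", "NN")]  # hypothetical, applied in order
tagger = compile_rules(rules, alphabet)
print(tagger.run(["PRP", "VBP", "TO", "NN", "DT", "NN"]))
# prints ['PRP', 'VBP', 'TO', 'VB', 'DT', 'NN']
```

Each per-rule machine delays its output by one tag whenever it reads a `from_tag`, because the rule's right context is not yet known; the final-output table flushes any tag still held at the end of the sentence. In the usage example the first rule turns `NN` into `VB` before the second rule sees it, so `TO` is left unchanged, matching sequential rule application. Once `compile_rules` has run, tagging costs two table lookups per input tag however many rules were folded in, although the number of composed states can grow with the size of the rule set.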