A Machine learning approach to POS tagging

Màrquez Villodre, Lluís; Padró, Lluís; Rodríguez Hontoria, Horacio

R97-57.pdf (1,261Mb)

Document type: Research report

Publication date: 1997-12

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.

Abstract

We have applied inductive learning of statistical decision trees and relaxation labelling to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (Part-of-Speech tagging). The learning process is supervised and yields a language model oriented to resolving POS ambiguities. This model consists of a set of statistical decision trees expressing the distribution of tags and words in some relevant contexts. The acquired language models are complete enough to be used directly as sets of POS disambiguation rules, and they include more complex contextual information than the simple collections of n-grams usually used in statistical taggers. We have implemented a simple and fast tagger, which has been tested and evaluated on the Wall Street Journal (WSJ) corpus with remarkable accuracy. However, better results can be obtained by translating the trees into rules to feed a flexible tagger based on relaxation labelling. In this direction, we describe a tagger that is able to use information of any kind (n-grams, automatically acquired constraints, linguistically motivated manually written constraints, etc.) and, in particular, to incorporate the machine-learned decision trees. We also address the problem of tagging when only a small amount of training material is available, which is crucial in any process of constructing an annotated corpus from scratch. We show that quite high accuracy can be achieved with our system in this situation.
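To make the idea of relaxation labelling with contextual constraints more concrete, here is a minimal Python sketch. It is not taken from the report: the constraint table `BIGRAM_COMPAT`, its values, the function names `relax` and `best_tags`, and the update rule (a commonly used multiplicative form, with support values kept in [-1, 1]) are all assumptions chosen for illustration. The report's tagger may use different constraints, support functions and stopping criteria.

```python
from typing import Dict, List, Tuple

# Hypothetical toy constraints (not from the report):
# (tag of the previous word, candidate tag) -> compatibility in [-1, 1].
BIGRAM_COMPAT: Dict[Tuple[str, str], float] = {
    ("DT", "NN"): 0.8,   # a noun after a determiner is favoured
    ("DT", "VB"): -0.8,  # a verb after a determiner is penalised
}


def relax(weights: List[Dict[str, float]], iterations: int = 10) -> List[Dict[str, float]]:
    """Run a fixed number of relaxation-labelling iterations.

    weights[i] maps each candidate tag of word i to its current weight;
    the weights of each word are assumed to sum to 1.
    """
    for _ in range(iterations):
        new_weights = []
        for i, dist in enumerate(weights):
            updated = {}
            for tag, w in dist.items():
                # Support for this tag: compatibility with each tag of the
                # previous word, weighted by that tag's current weight.
                support = 0.0
                if i > 0:
                    for left_tag, left_w in weights[i - 1].items():
                        support += left_w * BIGRAM_COMPAT.get((left_tag, tag), 0.0)
                # Assumed update rule: scale the weight by (1 + support).
                updated[tag] = w * (1.0 + support)
            total = sum(updated.values())
            new_weights.append({t: v / total for t, v in updated.items()})
        # All words are updated from the previous iteration's weights.
        weights = new_weights
    return weights


def best_tags(weights: List[Dict[str, float]]) -> List[str]:
    """Pick the highest-weighted tag for each word."""
    return [max(dist, key=dist.get) for dist in weights]


# "the can": "the" is unambiguous, "can" starts evenly split between NN and VB.
initial = [{"DT": 1.0}, {"NN": 0.5, "VB": 0.5}]
final = relax(initial)
print(best_tags(final))
```

In this toy setting the only constraints are tag bigrams, but the same loop could sum support from any constraint source (n-grams, rules derived from decision trees, or hand-written constraints), which is the flexibility the abstract refers to.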

Citation: Marquez, L., Padro, L., Rodriguez, H. "A Machine learning approach to POS tagging". 1997.

Part of: LSI-97-57-R

URI: http://hdl.handle.net/2117/96517
