A Machine learning approach to POS tagging
dc.contributor.author | Màrquez Villodre, Lluís |
dc.contributor.author | Padró, Lluís |
dc.contributor.author | Rodríguez Hontoria, Horacio |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Ciències de la Computació |
dc.date.accessioned | 2016-11-11T09:37:06Z |
dc.date.available | 2016-11-11T09:37:06Z |
dc.date.issued | 1997-12 |
dc.identifier.citation | Marquez, L., Padro, L., Rodriguez, H. "A Machine learning approach to POS tagging". 1997. |
dc.identifier.uri | http://hdl.handle.net/2117/96517 |
dc.description.abstract | We have applied inductive learning of statistical decision trees and relaxation labelling to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (Part Of Speech Tagging). The learning process is supervised and obtains a language model oriented towards resolving POS ambiguities. This model consists of a set of statistical decision trees expressing the distribution of tags and words in some relevant contexts. The acquired language models are complete enough to be used directly as sets of POS disambiguation rules, and include more complex contextual information than the simple collections of n-grams usually used in statistical taggers. We have implemented a simple and fast tagger that has been tested and evaluated on the Wall Street Journal (WSJ) corpus with remarkable accuracy. However, better results can be obtained by translating the trees into rules to feed a flexible relaxation-labelling-based tagger. In this direction, we describe a tagger which is able to use information of any kind (n-grams, automatically acquired constraints, linguistically motivated manually written constraints, etc.), and in particular to incorporate the machine-learned decision trees. Simultaneously, we address the problem of tagging when only a small amount of training material is available, which is crucial in any process of constructing an annotated corpus from scratch. We show that quite high accuracy can be achieved with our system in this situation. |
dc.format.extent | 29 p. |
dc.language.iso | eng |
dc.relation.ispartofseries | LSI-97-57-R |
dc.subject | UPC subject areas::Computer science::Theoretical computer science |
dc.subject.other | Inductive learning |
dc.subject.other | Statistical decision trees |
dc.subject.other | Relaxation labelling |
dc.subject.other | Natural language processing |
dc.subject.other | NLP |
dc.subject.other | Morphosyntactic disambiguation |
dc.subject.other | Part of speech tagging |
dc.subject.other | POS |
dc.title | A Machine learning approach to POS tagging |
dc.type | External research report |
dc.contributor.group | Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural |
dc.rights.access | Open Access |
local.identifier.drac | 629954 |
dc.description.version | Postprint (published version) |
local.citation.author | Marquez, L.; Padro, L.; Rodriguez, H. |
This item appears in the following collections
- Research reports [1,107]
- Research reports [88]