Universitat Politècnica de Catalunya


UPCommons. Global access to UPC knowledge


A Machine learning approach to POS tagging

R97-57.pdf (1,261Mb)
Cite as: hdl:2117/96517

Màrquez Villodre, Lluís
Padró, Lluís
Rodríguez Hontoria, Horacio
Document type: Research report
Defense date: 1997-12
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work is prohibited without permission of the copyright holder.
Abstract
We have applied inductive learning of statistical decision trees and relaxation labelling to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (part-of-speech tagging). The learning process is supervised and yields a language model oriented to resolving POS ambiguities. This model consists of a set of statistical decision trees expressing the distribution of tags and words in some relevant contexts. The acquired language models are complete enough to be used directly as sets of POS disambiguation rules, and they include more complex contextual information than the simple collections of n-grams usually used in statistical taggers. We have implemented a simple, fast tagger that has been tested and evaluated on the Wall Street Journal (WSJ) corpus with remarkable accuracy. However, better results can be obtained by translating the trees into rules to feed a flexible tagger based on relaxation labelling. In this direction we describe a tagger that can use information of any kind (n-grams, automatically acquired constraints, linguistically motivated manually written constraints, etc.), and in particular can incorporate the machine-learned decision trees. We also address the problem of tagging when only a small amount of training material is available, which is crucial in any process of constructing an annotated corpus from scratch. We show that quite high accuracy can be achieved with our system in this situation.
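As a rough illustration of the relaxation labelling idea named in the abstract, the sketch below iteratively re-weights each word's candidate tags according to how compatible they are with the tags of neighbouring words. It is not the authors' tagger: the function name `relax`, the bigram-only constraints, the compatibility values, and the update rule (a common textbook form, weight times one plus support, then normalised) are all assumptions chosen for brevity.

```python
def relax(candidates, compat, iterations=20):
    """Toy relaxation labelling for POS disambiguation (illustrative only).

    candidates: one dict per word, mapping tag -> initial weight.
    compat: hypothetical bigram constraints, mapping
            (left_tag, right_tag) -> compatibility in [-1, 1].
    Returns the highest-weighted tag for each word.
    """
    weights = [dict(c) for c in candidates]
    for _ in range(iterations):
        new_weights = []
        for i, tags in enumerate(weights):
            updated = {}
            for tag, w in tags.items():
                support = 0.0
                # Support from the word on the left, weighted by its tag weights.
                if i > 0:
                    support += sum(compat.get((lt, tag), 0.0) * lw
                                   for lt, lw in weights[i - 1].items())
                # Support from the word on the right.
                if i + 1 < len(weights):
                    support += sum(compat.get((tag, rt), 0.0) * rw
                                   for rt, rw in weights[i + 1].items())
                # Clamp so that (1 + support) never becomes negative.
                support = max(-1.0, min(1.0, support))
                updated[tag] = w * (1.0 + support)
            total = sum(updated.values())
            if total > 0:
                new_weights.append({t: v / total for t, v in updated.items()})
            else:
                # Every tag was fully vetoed: keep the previous weights.
                new_weights.append(dict(tags))
        weights = new_weights
    return [max(t, key=t.get) for t in weights]


# Hypothetical example: "the can", where "can" is ambiguous (noun vs. modal).
sentence = [{"DT": 1.0}, {"NN": 0.5, "MD": 0.5}]
constraints = {("DT", "NN"): 0.8, ("DT", "MD"): -0.8}
print(relax(sentence, constraints))  # ['DT', 'NN']
```

In a setting like the one the abstract describes, the constraint set would not be limited to hand-picked bigrams; rules derived from decision trees or written by linguists could contribute to the same support sum.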
Citation: Marquez, L., Padro, L., Rodriguez, H. "A Machine learning approach to POS tagging". 1997.
Is part of: LSI-97-57-R
URI: http://hdl.handle.net/2117/96517
Collections
  • Departament de Ciències de la Computació - Reports de recerca [1.106]
  • GPLN - Grup de Processament del Llenguatge Natural - Reports de recerca [88]

