dc.contributor.author: Poveda Poveda, Jordi
dc.contributor.author: Surdeanu, Mihai
dc.contributor.author: Turmo Borras, Jorge
dc.contributor.other: Universitat Politècnica de Catalunya. Departament de Ciències de la Computació
dc.identifier.citation: Poveda, J., Surdeanu, M., Turmo, J. "A Bootstrapping architecture for time expression recognition in unlabelled corpora via syntactic-semantic patterns". 2007.
dc.description.abstract: In this paper we describe a semi-supervised approach to the extraction of time expression mentions in large unlabelled corpora based on bootstrapping. Bootstrapping techniques rely on a relatively small number of initial human-supplied examples (termed “seeds”) of the type of entity or concept to be learned, in order to capture an initial set of patterns or rules from the unlabelled text that extract the supplied data. In turn, the learned patterns are employed to find new potential examples, and the process is repeated to grow the set of patterns and (optionally) the set of examples. In order to prevent the learned pattern set from producing spurious results, it becomes essential to implement a ranking and selection procedure to filter out “bad” patterns and, depending on the case, new candidate examples. Therefore, the type of patterns employed (knowledge representation) as well as the ranking and selection procedure are paramount to the quality of the results. We present a complete bootstrapping algorithm for recognition of time expressions, with a special emphasis on the type of patterns used (a combination of semantic and morpho-syntactic elements) and the ranking and selection criteria. Bootstrapping techniques have been previously employed with limited success for several NLP problems, both of recognition and classification, but their application to time expression recognition is, to the best of our knowledge, novel. As of this writing, the described architecture is in the final stages of implementation, with experimentation and evaluation already underway.
dc.format.extent: 24 p.
dc.subject: Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial
dc.subject.other: Knowledge representation
dc.subject.other: Bootstrapping architecture
dc.title: A Bootstrapping architecture for time expression recognition in unlabelled corpora via syntactic-semantic patterns
dc.type: External research report
dc.contributor.group: Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
dc.rights.access: Open Access
dc.description.version: Postprint (published version)
upcommons.citation.author: Poveda, J.; Surdeanu, M.; Turmo, J.

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.