TweetNorm_es: an annotated corpus for Spanish microtext normalization
dc.contributor.author | Alegria, Iñaki |
dc.contributor.author | Aranberri, Nora |
dc.contributor.author | Comas Umbert, Pere Ramon |
dc.contributor.author | Fresno, Víctor |
dc.contributor.author | Gamallo, Pablo |
dc.contributor.author | Padró, Lluís |
dc.contributor.author | San Vicente Roncal, Iñaki |
dc.contributor.author | Turmo Borras, Jorge |
dc.contributor.author | Zubiaga, Arkaitz |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Llenguatges i Sistemes Informàtics |
dc.date.accessioned | 2014-07-07T11:09:10Z |
dc.date.available | 2014-07-07T11:09:10Z |
dc.date.created | 2014 |
dc.date.issued | 2014 |
dc.identifier.citation | Alegria, I. [et al.]. TweetNorm_es: an annotated corpus for Spanish microtext normalization. In: International Conference on Language Resources and Evaluation. "Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)". Reykjavik: European Language Resources Association (ELRA), 2014, p. 2274-2278. |
dc.identifier.isbn | 978-2-9517408-8-4 |
dc.identifier.uri | http://hdl.handle.net/2117/23411 |
dc.description.abstract | In this paper we introduce TweetNorm_es, an annotated corpus of Spanish-language tweets, which we make publicly available under the terms of the CC-BY license. This corpus is intended for the development and testing of microtext normalization systems. It was created for Tweet-Norm, a tweet normalization workshop and shared task, and is the result of a joint annotation effort by different research groups. In this paper we describe the methodology used to build the corpus as well as the guidelines followed in the annotation process. We also present a brief overview of the Tweet-Norm shared task, the first evaluation environment in which the corpus was used. |
dc.format.extent | 5 p. |
dc.language.iso | eng |
dc.publisher | European Language Resources Association (ELRA) |
dc.subject | UPC subject areas::Computer science::Artificial intelligence::Natural language |
dc.subject.lcsh | Spanish language -- 21st century |
dc.subject.other | Microtext normalization |
dc.subject.other | phonology |
dc.title | TweetNorm_es: an annotated corpus for Spanish microtext normalization |
dc.type | Conference lecture |
dc.subject.lemac | Spanish language -- Phonology |
dc.contributor.group | Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | http://www.lrec-conf.org/proceedings/lrec2014/pdf/442_Paper.pdf |
dc.rights.access | Open Access |
local.identifier.drac | 14920822 |
dc.description.version | Postprint (published version) |
local.citation.author | Alegria, I.; Aranberri, N.; Comas, P.R.; Fresno, V.; Gamallo, P.; Padró, L.; San Vicente, I.; Turmo, J.; Zubiaga, A. |
local.citation.contributor | International Conference on Language Resources and Evaluation |
local.citation.pubplace | Reykjavik |
local.citation.publicationName | Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14) |
local.citation.startingPage | 2274 |
local.citation.endingPage | 2278 |