TweetNorm_es: an annotated corpus for Spanish microtext normalization

Alegria, Iñaki; Aranberri, Nora; Comas Umbert, Pere Ramon; Fresno, Víctor; Gamallo, Pablo; Padró, Lluís; San Vicente Roncal, Iñaki; Turmo Borras, Jorge; Zubiaga, Arkaitz

dc.contributor.author	Alegria, Iñaki
dc.contributor.author	Aranberri, Nora
dc.contributor.author	Comas Umbert, Pere Ramon
dc.contributor.author	Fresno, Víctor
dc.contributor.author	Gamallo, Pablo
dc.contributor.author	Padró, Lluís
dc.contributor.author	San Vicente Roncal, Iñaki
dc.contributor.author	Turmo Borras, Jorge
dc.contributor.author	Zubiaga, Arkaitz
dc.contributor.other	Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.contributor.other	Universitat Politècnica de Catalunya. Departament de Llenguatges i Sistemes Informàtics
dc.date.accessioned	2014-07-07T11:09:10Z
dc.date.available	2014-07-07T11:09:10Z
dc.date.created	2014
dc.date.issued	2014
dc.identifier.citation	Alegria, I. [et al.]. TweetNorm_es: an annotated corpus for Spanish microtext normalization. In: International Conference on Language Resources and Evaluation. "Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)". Reykjavik: European Language Resources Association (ELRA), 2014, p. 2274-2278.
dc.identifier.isbn	978-2-9517408-8-4
dc.identifier.uri	http://hdl.handle.net/2117/23411
dc.description.abstract	In this paper we introduce TweetNorm_es, an annotated corpus of tweets in the Spanish language, which we make publicly available under the terms of the CC-BY license. This corpus is intended for the development and testing of microtext normalization systems. It was created for Tweet-Norm, a tweet normalization workshop and shared task, and is the result of a joint annotation effort by different research groups. We describe the methodology defined to build the corpus as well as the guidelines followed in the annotation process. We also present a brief overview of the Tweet-Norm shared task, the first evaluation environment in which the corpus was used.
dc.format.extent	5 p.
dc.language.iso	eng
dc.publisher	European Language Resources Association (ELRA)
dc.subject	UPC subject areas::Computer science::Artificial intelligence::Natural language
dc.subject.lcsh	Spanish language -- 21st century
dc.subject.other	Microtext normalization
dc.subject.other	Twitter
dc.subject.other	phonology
dc.title	TweetNorm_es: an annotated corpus for Spanish microtext normalization
dc.type	Conference lecture
dc.subject.lemac	Spanish language -- Phonology
dc.contributor.group	Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
dc.description.peerreviewed	Peer Reviewed
dc.relation.publisherversion	http://www.lrec-conf.org/proceedings/lrec2014/pdf/442_Paper.pdf
dc.rights.access	Open Access
local.identifier.drac	14920822
dc.description.version	Postprint (published version)
local.citation.author	Alegria, I.; Aranberri, N.; Comas, P.R.; Fresno, V.; Gamallo, P.; Padro, L.; San Vicente, I.; Turmo, J.; Zubiaga, A.
local.citation.contributor	International Conference on Language Resources and Evaluation
local.citation.pubplace	Reykjavik
local.citation.publicationName	Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
local.citation.startingPage	2274
local.citation.endingPage	2278

Files in this item

Name: TweetNorm.pdf
Size: 141.8 KB
Format: PDF

This item appears in the following collections

Conference papers/presentations [192]
Conference papers/presentations [1,274]
Conference papers/presentations [3,329]

UPCommons. UPC's open knowledge portal
