dc.contributor.author: Alegria, Iñaki
dc.contributor.author: Aranberri, Nora
dc.contributor.author: Comas Umbert, Pere Ramon
dc.contributor.author: Fresno, Víctor
dc.contributor.author: Gamallo, Pablo
dc.contributor.author: Padró, Lluís
dc.contributor.author: San Vicente Roncal, Iñaki
dc.contributor.author: Turmo Borras, Jorge
dc.contributor.author: Zubiaga, Arkaitz
dc.contributor.other: Universitat Politècnica de Catalunya. Departament de Ciències de la Computació
dc.date.accessioned: 2015-12-21T18:47:01Z
dc.date.available: 2015-12-21T18:47:01Z
dc.date.issued: 2015-12-01
dc.identifier.citation: Alegria, I., Aranberri, N., Comas, P.R., Fresno, V., Gamallo, P., Padro, L., San Vicente, I., Turmo, J., Zubiaga, A. TweetNorm: a benchmark for lexical normalization of Spanish tweets. "Language Resources and Evaluation", 1 December 2015, vol. 49, no. 4, p. 883-905.
dc.identifier.issn: 1574-020X
dc.identifier.uri: http://hdl.handle.net/2117/80964
dc.description.abstract: The language used in social media is often characterized by an abundance of informal and non-standard writing. Normalizing this non-standard language can be crucial for facilitating subsequent text processing and, consequently, for boosting the performance of natural language processing tools applied to social media text. In this paper we present a benchmark for lexical normalization of social media posts, specifically for tweets in Spanish. We describe the tweet normalization challenge we organized recently, analyze the performance achieved by the different systems submitted to the challenge, and delve into the characteristics of the systems to identify the features that were useful. The organization of this challenge has led to the production of a benchmark for lexical normalization of social media, including an evaluation framework as well as an annotated corpus of Spanish tweets, TweetNorm_es, which we make publicly available. The creation of this benchmark and the accompanying evaluation have brought to light the types of words that the submitted systems handled best and point to the main shortcomings to be addressed in future work.
dc.format.extent: 23 p.
dc.language.iso: eng
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject.lcsh: Standard language
dc.subject.lcsh: Social media
dc.subject.lcsh: Twitter
dc.subject.other: Lexical normalization
dc.subject.other: Twitter
dc.subject.other: Social media
dc.subject.other: Corpus
dc.subject.other: Evaluation
dc.title: TweetNorm: a benchmark for lexical normalization of Spanish tweets
dc.type: Article
dc.subject.lemac: Lexicografia
dc.subject.lemac: Normalització lingüística
dc.subject.lemac: Mitjans de comunicació social
dc.subject.lemac: Twitter
dc.contributor.group: Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
dc.identifier.doi: 10.1007/s10579-015-9315-6
dc.rights.access: Open Access
local.identifier.drac: 17269829
dc.description.version: Postprint (published version)
local.citation.author: Alegria, I.; Aranberri, N.; Comas, P.R.; Fresno, V.; Gamallo, P.; Padro, L.; San Vicente, I.; Turmo, J.; Zubiaga, A.
local.citation.publicationName: Language Resources and Evaluation
local.citation.volume: 49
local.citation.number: 4
local.citation.startingPage: 883
local.citation.endingPage: 905


Except where otherwise noted, the content of this work is licensed under a Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain