Deep evaluation of hybrid architectures: simple metrics correlated with human judgments

Labaka, Gorka; Díaz de Ilarraza Sánchez, Arantza; Sarasola Gabiola, Kepa; España Bonet, Cristina; Màrquez Villodre, Lluís

dc.contributor.author	Labaka, Gorka
dc.contributor.author	Díaz de Ilarraza Sánchez, Arantza
dc.contributor.author	Sarasola Gabiola, Kepa
dc.contributor.author	España Bonet, Cristina
dc.contributor.author	Màrquez Villodre, Lluís
dc.contributor.other	Universitat Politècnica de Catalunya. Departament de Llenguatges i Sistemes Informàtics
dc.date.accessioned	2012-12-03T16:21:19Z
dc.date.available	2012-12-03T16:21:19Z
dc.date.created	2011
dc.date.issued	2011
dc.identifier.citation	Labaka, G. [et al.]. Deep evaluation of hybrid architectures: simple metrics correlated with human judgments. In: International Workshop on Using Linguistic Information for Hybrid Machine Translation. "LIHMT 2011 Sponsors International Workshop on Using Linguistic Information for Hybrid Machine Translation". Barcelona: 2011, p. 50-57.
dc.identifier.uri	http://hdl.handle.net/2117/17063
dc.description.abstract	The development of hybrid MT systems is guided by the evaluation method used to compare different combinations of basic subsystems. This work presents a deep evaluation experiment on a hybrid architecture that tries to get the best of both worlds, rule-based and statistical. In a first evaluation, human assessments were used to compare only the single statistical system and the hybrid one; the rule-based system was not evaluated by hand because automatic evaluation showed it at a clear disadvantage. A second, wider evaluation experiment surprisingly showed that, according to human evaluation, the best system was the rule-based one, the system that had achieved the worst results in automatic evaluation. An examination of sentences with controversial results suggested that the linguistic well-formedness of the output should be considered in evaluation. After experimenting with six possible metrics, we conclude that a simple arithmetic mean of BLEU and BLEU calculated on the parts of speech of words is clearly a more human-conformant metric than lexical metrics alone.
dc.format.extent	8 p.
dc.language.iso	eng
dc.subject	UPC subject areas::Computer science::Artificial intelligence::Natural language
dc.subject.lcsh	Machine translation
dc.subject.lcsh	Rule-based machine translation
dc.title	Deep evaluation of hybrid architectures: simple metrics correlated with human judgments
dc.type	Conference lecture
dc.subject.lemac	Machine translation
dc.contributor.group	Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
dc.description.peerreviewed	Peer Reviewed
dc.relation.publisherversion	http://ixa2.si.ehu.es/lihmt2011/proceedings.pdf
dc.rights.access	Open Access
local.identifier.drac	10177456
dc.description.version	Postprint (author’s final draft)
local.citation.author	Labaka, G.; Díaz de Ilarraza, A.; Sarasola, K.; España-Bonet, C.; Màrquez, L.
local.citation.contributor	International Workshop on Using Linguistic Information for Hybrid Machine Translation
local.citation.pubplace	Barcelona
local.citation.publicationName	LIHMT 2011 Sponsors International Workshop on Using Linguistic Information for Hybrid Machine Translation
local.citation.startingPage	50
local.citation.endingPage	57
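
The metric named in the abstract, the arithmetic mean of BLEU over words and BLEU over part-of-speech tags, can be illustrated with the minimal sketch below. This record does not say which BLEU implementation, tokenizer, or tagger the authors used, so everything here is an assumption: the function names (`bleu`, `combined_score`) are hypothetical, BLEU is a plain corpus-level version with one reference per segment and no smoothing, and the POS sequences are assumed to be already tagged.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU, one reference per segment, no smoothing (assumed setup)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            matches[n - 1] += sum((h & r).values())  # clipped n-gram matches
            totals[n - 1] += sum(h.values())
    # Without smoothing, any n-gram order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


def combined_score(hyp_words, ref_words, hyp_pos, ref_pos):
    """Arithmetic mean of word-level BLEU and BLEU computed on POS tag sequences."""
    return (bleu(hyp_words, ref_words) + bleu(hyp_pos, ref_pos)) / 2


# Toy data: the words mostly differ, but the POS sequences are identical.
hyp_words = [["the", "cat", "sat", "down"]]
ref_words = [["a", "dog", "lay", "down"]]
hyp_pos = [["DET", "NOUN", "VERB", "ADV"]]
ref_pos = [["DET", "NOUN", "VERB", "ADV"]]

print(combined_score(hyp_words, ref_words, hyp_pos, ref_pos))  # 0.5
```

In the toy example the lexical BLEU is 0 and the POS-level BLEU is 1, so the combined score rewards a well-formed output that a purely lexical metric would score as a complete miss.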

Files in this item

Name: LIHMT11Labakaetal.pdf
Size: 183.9 KB
Format: PDF

This item appears in the following collections

Conference papers/communications [192]
Conference papers/communications [1,274]

UPCommons. The open knowledge portal of the UPC
