Deep evaluation of hybrid architectures: simple metrics correlated with human judgments
dc.contributor.author | Labaka, Gorka |
dc.contributor.author | Díaz de Ilarraza Sánchez, Arantza |
dc.contributor.author | Sarasola Gabiola, Kepa |
dc.contributor.author | España Bonet, Cristina |
dc.contributor.author | Màrquez Villodre, Lluís |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Llenguatges i Sistemes Informàtics |
dc.date.accessioned | 2012-12-03T16:21:19Z |
dc.date.available | 2012-12-03T16:21:19Z |
dc.date.created | 2011 |
dc.date.issued | 2011 |
dc.identifier.citation | Labaka, G. [et al.]. Deep evaluation of hybrid architectures: simple metrics correlated with human judgments. In: International Workshop on Using Linguistic Information for Hybrid Machine Translation. "LIHMT 2011 Sponsors International Workshop on Using Linguistic Information for Hybrid Machine Translation". Barcelona: 2011, p. 50-57. |
dc.identifier.uri | http://hdl.handle.net/2117/17063 |
dc.description.abstract | The development of hybrid MT systems is guided by the evaluation method used to compare different combinations of basic subsystems. This work presents a deep evaluation experiment of a hybrid architecture that tries to get the best of both worlds, rule-based and statistical. In a first evaluation, human assessments were used to compare only the statistical system and the hybrid one. The rule-based system was not evaluated manually because automatic evaluation had shown it to be at a clear disadvantage. However, a second, wider evaluation experiment surprisingly showed that, according to human evaluation, the best system was the rule-based one, which was the system with the worst automatic evaluation results. An examination of sentences with controversial results suggested that the linguistic well-formedness of the output should be taken into account in evaluation. After experimenting with six candidate metrics, we conclude that a simple arithmetic mean of BLEU and BLEU computed on the parts of speech of the words is clearly more consistent with human judgments than lexical metrics alone. |
dc.format.extent | 8 p. |
dc.language.iso | eng |
dc.subject | UPC subject areas::Computer science::Artificial intelligence::Natural language |
dc.subject.lcsh | Machine translation |
dc.subject.lcsh | Rule-based machine translation |
dc.title | Deep evaluation of hybrid architectures: simple metrics correlated with human judgments |
dc.type | Conference lecture |
dc.subject.lemac | Machine translation |
dc.contributor.group | Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | http://ixa2.si.ehu.es/lihmt2011/proceedings.pdf |
dc.rights.access | Open Access |
local.identifier.drac | 10177456 |
dc.description.version | Postprint (author’s final draft) |
local.citation.author | Labaka, G.; Díaz de Ilarraza, A.; Sarasola, K.; España-Bonet, C.; Màrquez, L. |
local.citation.contributor | International Workshop on Using Linguistic Information for Hybrid Machine Translation |
local.citation.pubplace | Barcelona |
local.citation.publicationName | LIHMT 2011 Sponsors International Workshop on Using Linguistic Information for Hybrid Machine Translation |
local.citation.startingPage | 50 |
local.citation.endingPage | 57 |
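The abstract proposes a combined metric defined as the arithmetic mean of BLEU on word forms and BLEU on part-of-speech sequences, score = (BLEU_words + BLEU_POS) / 2. The Python code below is a minimal sketch of that idea. It does not reproduce the authors' implementation. Several details are assumptions that this record does not specify: corpus-level BLEU, one reference per segment, n-grams up to 4 with uniform weights, no smoothing, and input that has already been tokenised and POS-tagged by some external tagger. The function names `corpus_bleu` and `mean_word_pos_bleu` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps: List[List[str]], refs: List[List[str]], max_n: int = 4) -> float:
    """Corpus-level BLEU (assumed variant: one reference per segment,
    uniform n-gram weights, no smoothing)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Clipped counts: a hypothesis n-gram is credited at most as
            # many times as it occurs in the reference.
            matches[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    # Without smoothing, any zero n-gram precision makes BLEU zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec)


def mean_word_pos_bleu(hyp_words: List[List[str]], ref_words: List[List[str]],
                       hyp_pos: List[List[str]], ref_pos: List[List[str]],
                       max_n: int = 4) -> float:
    """Arithmetic mean of word-level BLEU and POS-level BLEU (hypothetical helper)."""
    word_bleu = corpus_bleu(hyp_words, ref_words, max_n)
    pos_bleu = corpus_bleu(hyp_pos, ref_pos, max_n)
    return 0.5 * (word_bleu + pos_bleu)


if __name__ == "__main__":
    # Toy example with hand-assigned (hypothetical) POS tags.
    hyp_w = [["the", "cat", "sat", "on", "the", "mat"]]
    ref_w = [["the", "dog", "sat", "on", "the", "mat"]]
    tags = [["DT", "NN", "VBD", "IN", "DT", "NN"]]
    print(mean_word_pos_bleu(hyp_w, ref_w, tags, tags))
```

In the toy example, the hypothesis substitutes one noun, so word-level BLEU drops while POS-level BLEU stays at 1.0 because the grammatical structure is unchanged. This illustrates how the POS component rewards well-formed output that a purely lexical metric would penalise.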