Deep evaluation of hybrid architectures: simple metrics correlated with human judgments

Labaka, Gorka; Díaz de Ilarraza Sánchez, Arantza; Sarasola Gabiola, Kepa; España Bonet, Cristina; Màrquez Villodre, Lluís

Visualitza/Obre

LIHMT11Labakaetal.pdf (183,9Kb)

Veure estadístiques d'ús d'UPCommons

Estadístiques de LA Referencia / Recolecta

Cita com:

Mostra el registre d'ítem complet

Labaka, Gorka

Díaz de Ilarraza Sánchez, Arantza

Sarasola Gabiola, Kepa

España Bonet, Cristina

Màrquez Villodre, Lluís

Tipus de documentComunicació de congrés

Data publicació2011

Condicions d'accésAccés obert

Tots els drets reservats. Aquesta obra està protegida pels drets de propietat intel·lectual i industrial corresponents. Sense perjudici de les exempcions legals existents, queda prohibida la seva reproducció, distribució, comunicació pública o transformació sense l'autorització del titular dels drets

Abstract

The process of developing hybrid MT systems is guided by the evaluation method used to compare different combinations of basic subsystems. This work presents a deep evaluation experiment of a hybrid architecture that tries to get the best of both worlds, rule-based and statistical. In a first evaluation human assessments were used to compare just the single statistical system and the hybrid one, the rule-based system was not compared by hand because the results of automatic evaluation showed a clear disadvantage. But a second and wider evaluation experiment surprisingly showed that according to human evaluation the best system was the rule-based, the one that achieved the worst results using automatic evaluation. An examination of sentences with controversial results suggested that linguistic well-formedness in the output should be considered in evaluation. After experimenting with 6 possible metrics we conclude that a simple arithmetic mean of BLEU and BLEU calculated on parts of speech of words is clearly a more human conformant metric than lexical metrics alone.

CitacióLabaka, G. [et al.]. Deep evaluation of hybrid architectures: simple metrics correlated with human judgments. A: International Workshop on Using Linguistic Information for Hybrid Machine Translation. "LIHMT 2011 Sponsors International Workshop on Using Linguistic Information for Hybrid Machine Translation". Barcelona: 2011, p. 50-57.

URIhttp://hdl.handle.net/2117/17063

Versió de l'editorhttp://ixa2.si.ehu.es/lihmt2011/proceedings.pdf

Col·leccions

Veure estadístiques d'ús d'UPCommons

Mostra el registre d'ítem complet

Fitxers	Descripció	Mida	Format	Visualitza
LIHMT11Labakaetal.pdf		183,9Kb	PDF	Visualitza/Obre

UPCommons. Portal del coneixement obert de la UPC

Deep evaluation of hybrid architectures: simple metrics correlated with human judgments

Visualitza/Obre

Explora