Universitat Politècnica de Catalunya

UPCommons. Global access to UPC knowledge


Deep evaluation of hybrid architectures: simple metrics correlated with human judgments

Cite as:
hdl:2117/17063

Authors
Labaka, Gorka
Díaz de Ilarraza Sánchez, Arantza
Sarasola Gabiola, Kepa
España Bonet, Cristina
Màrquez Villodre, Lluís
Document type: Conference lecture
Defense date: 2011
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.
Abstract
The process of developing hybrid MT systems is guided by the evaluation method used to compare different combinations of basic subsystems. This work presents a deep evaluation experiment on a hybrid architecture that tries to get the best of both worlds, rule-based and statistical. In a first evaluation, human assessments were used to compare only the statistical system and the hybrid one; the rule-based system was not assessed by hand because automatic evaluation showed it at a clear disadvantage. A second, wider evaluation experiment surprisingly showed that, according to human evaluation, the best system was the rule-based one, the very system that achieved the worst results under automatic evaluation. An examination of sentences with controversial results suggested that the linguistic well-formedness of the output should be considered in evaluation. After experimenting with six candidate metrics, we conclude that a simple arithmetic mean of BLEU and BLEU calculated over the parts of speech of words correlates clearly better with human judgments than lexical metrics alone.
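The combined metric described in the abstract, an arithmetic mean of lexical BLEU and BLEU computed over part-of-speech sequences, can be sketched roughly as follows. This is a simplified, add-one-smoothed, single-reference sentence-level BLEU written from scratch for illustration, not the authors' implementation; the function names and the POS tags in the example are assumptions.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified single-reference sentence-level BLEU with brevity penalty."""
    if not candidate:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        # Add-one smoothing so one empty n-gram order does not zero the score.
        log_prec += math.log((overlap + 1) / (total + 1))
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_prec / max_n)

def mean_bleu_pos(cand_words, ref_words, cand_pos, ref_pos):
    """Arithmetic mean of lexical BLEU and BLEU over POS-tag sequences."""
    return 0.5 * (bleu(cand_words, ref_words) + bleu(cand_pos, ref_pos))

# Hypothetical example: the candidate differs lexically ("in" vs. "on")
# but its POS sequence matches the reference exactly.
cand = "the cat sat in the mat".split()
ref = "the cat sat on the mat".split()
cand_pos = ["DT", "NN", "VBD", "IN", "DT", "NN"]
ref_pos = ["DT", "NN", "VBD", "IN", "DT", "NN"]
score = mean_bleu_pos(cand, ref, cand_pos, ref_pos)
```

The intuition is that the POS-level component rewards grammatically well-formed output even when the exact words differ, which is what the human evaluators in the paper appeared to value in the rule-based system's output.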
Citation: Labaka, G. [et al.]. Deep evaluation of hybrid architectures: simple metrics correlated with human judgments. In: "International Workshop on Using Linguistic Information for Hybrid Machine Translation (LIHMT 2011)". Barcelona: 2011, p. 50-57.
URI: http://hdl.handle.net/2117/17063
Publisher version: http://ixa2.si.ehu.es/lihmt2011/proceedings.pdf
Collections
  • GPLN - Grup de Processament del Llenguatge Natural - Ponències/Comunicacions de congressos [192]
  • Departament de Ciències de la Computació - Ponències/Comunicacions de congressos [1.232]

Files: LIHMT11Labakaetal.pdf (183,9 Kb, PDF)


© UPC. Servei de Biblioteques, Publicacions i Arxius

info.biblioteques@upc.edu
