Universitat Politècnica de Catalunya


UPCommons. Global access to UPC knowledge


Automatic Spanish translation of SQuAD dataset for multi-lingual question answering

Cite as: hdl:2117/329270

Carrino, Casimiro Pio
Ruiz Costa-Jussà, Marta
Rodríguez Fonollosa, José Adrián
Document type: Conference lecture
Defense date: 2020
Publisher: European Language Resources Association (ELRA)
Rights access: Open Access
Except where otherwise noted, content on this work is licensed under a Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain
Project: AUTONOMOUS LIFELONG LEARNING INTELLIGENT SYSTEMS (AEI-PCIN-2017-079)
Abstract
Recently, multilingual question answering has become a crucial research topic, and it is receiving increased interest in the NLP community. However, the unavailability of large-scale datasets makes it challenging to train multilingual QA systems with performance comparable to the English ones. In this work, we develop the Translate Align Retrieve (TAR) method to automatically translate the Stanford Question Answering Dataset (SQuAD) v1.1 to Spanish. We then used this dataset to train Spanish QA systems by fine-tuning a Multilingual-BERT model. Finally, we evaluated our QA models with the recently proposed MLQA and XQuAD benchmarks for cross-lingual Extractive QA. Experimental results show that our models outperform the previous Multilingual-BERT baselines, achieving the new state-of-the-art values of 68.1 F1 on the Spanish MLQA corpus and 77.6 F1 on the Spanish XQuAD corpus. The resulting, synthetically generated SQuAD-es v1.1 corpora, with almost 100% of data contained in the original English version, to the best of our knowledge, is the first large-scale QA training resource for Spanish.
Citation: Carrino, C.; Costa-jussà, M.R.; Fonollosa, J.A.R. Automatic Spanish translation of SQuAD dataset for multi-lingual question answering. In: International Conference on Language Resources and Evaluation. "LREC 2020: 12th International Conference on Language Resources and Evaluation: Marseille, France: May 13-15, 2020: conference proceedings". Paris: European Language Resources Association (ELRA), 2020, p. 5515-5523. ISBN 979-10-95546-34-4.
URI: http://hdl.handle.net/2117/329270
ISBN: 979-10-95546-34-4
Publisher version: https://www.aclweb.org/anthology/2020.lrec-1.677.pdf
Collections
  • Departament de Ciències de la Computació - Ponències/Comunicacions de congressos [1.191]
  • Doctorat en Teoria del Senyal i Comunicacions - Ponències/Comunicacions de congressos [165]
  • VEU - Grup de Tractament de la Parla - Ponències/Comunicacions de congressos [436]
  • Departament de Teoria del Senyal i Comunicacions - Ponències/Comunicacions de congressos [3.190]

Files: 2020.lrec-1.677(1).pdf (228,8Kb, PDF)
