Universitat Politècnica de Catalunya

UPCommons. Global access to UPC knowledge

Managing failures in task-based parallel workflows in distributed computing environments

Cite as:
hdl:2117/328312

Ejarque, Jorge
Bertran, Marta
Álvarez Cid-Fuentes, Javier
Conejero, Javier
Badia Sala, Rosa Maria
Collaborator: Malawski, M.; Rzadca, K.
Document type: Part of book or chapter of book
Defense date: 2020
Publisher: Springer, Cham
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.
Project: BioExcel-2 - BioExcel Centre of Excellence for Computational Biomolecular Research (EC-H2020-823830)
Project: BioExcel - Centre of Excellence for Biomolecular Research (EC-H2020-675728)
Abstract
Current scientific workflows are large and complex. They typically perform thousands of simulations whose results, combined with search and data analytics algorithms used to infer new knowledge, generate a very large amount of data. To do so, workflows comprise many tasks, and some of them may fail. Most work on failure management in workflow managers and runtimes focuses on recovering from failures caused by resources (retrying or resubmitting the failed computation on other resources, etc.). However, some failures are caused by the application itself (corrupted data, algorithms that do not converge under certain conditions, etc.), and these fault tolerance mechanisms are not sufficient to achieve a successful workflow execution. In these cases, developers have to add code to their applications to prevent and manage the possible failures. In this paper, we propose a simple interface and a set of transparent runtime mechanisms to simplify how scientists deal with application-based failures in task-based parallel workflows. We have validated our proposal with use cases from e-science and machine learning to show the benefits of the proposed interface and mechanisms in terms of programming productivity and performance.
Dataset: https://doi.org/10.6084/m9.figshare.12556445
Citation: Ejarque, J. [et al.]. Managing failures in task-based parallel workflows in distributed computing environments. In: Malawski, M.; Rzadca, K. "Euro-Par 2020: Parallel Processing. Euro-Par 2020. Lecture Notes in Computer Science, vol 12247". Springer, Cham, 2020, p. 411-425.
URI: http://hdl.handle.net/2117/328312
DOI: 10.1007/978-3-030-57675-2_26
ISBN: 978-3-030-57674-5, 978-3-030-57675-2
Publisher version: https://link.springer.com/chapter/10.1007/978-3-030-57675-2_26
Collections
  • Computer Sciences - Capítols de llibre [25]