Universitat Politècnica de Catalunya

UPCommons. Global access to UPC knowledge

A checkpoint/restart directive-based approach

Cite as: hdl:2117/100567

Maroñas Bravo, Marcos
Tutor / director: Beltran Querol, Vicenç; Ayguadé Parra, Eduard
Covenantee: Barcelona Supercomputing Center
Document type: Master thesis
Date: 2017
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.
Abstract
Exascale platforms require programming models with built-in support for resilience, because the huge number of components they are expected to contain will increase the number of errors. Checkpoint/restart is a widely used resilience technique due to its robustness and low overhead compared to other techniques. Several solutions already implement this technique, such as FTI and SCR, which focus mainly on providing advanced I/O capabilities to minimize checkpoint/restart time. However, application developers are still in charge of: (1) manually serializing and deserializing the application state using a low-level API; (2) modifying the natural flow of the application depending on whether the current execution is a restart or not; and (3) reimplementing their checkpoint/restart code whenever they change the backend library. We present a new directive-based approach to performing application-level checkpoint/restart in a simplified and portable way. We propose a solution based on compiler directives, such as those of OpenMP, that allows users to easily specify the state of the application that has to be saved and restored, leaving the tedious and error-prone serialization and deserialization activities to our intermediate library, which relies on a backend library (FTI/SCR) to perform scalable and efficient I/O operations. Our results, including several benchmarks and two large applications, reveal no extra overhead compared to the direct use of the FTI/SCR checkpoint/restart libraries, while significantly reducing the effort required of application developers.
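To make the three developer burdens listed in the abstract concrete, the following is a minimal Python sketch of manual application-level checkpoint/restart. It is an illustration only: `FileBackend`, `run`, and `demo` are hypothetical names, the backend is a plain local file rather than FTI or SCR, and nothing here reproduces the directives or the intermediate library proposed in the thesis.

```python
import os
import pickle
import tempfile


class FileBackend:
    """Hypothetical stand-in for a backend library; NOT the FTI or SCR API."""

    def __init__(self, path):
        self.path = path

    def has_checkpoint(self):
        return os.path.exists(self.path)

    def store(self, state):
        # Burden (1): the application serializes its own state.
        # Write to a temporary file, then rename, so a crash during the
        # write never leaves a truncated checkpoint behind.
        tmp = self.path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp, self.path)

    def load(self):
        with open(self.path, "rb") as f:
            return pickle.load(f)


def run(backend, total_steps, checkpoint_every, fail_at=None):
    """Sum the integers 0..total_steps-1, checkpointing periodically.

    fail_at simulates a crash at the start of the given step.
    """
    # Burden (2): the control flow depends on whether this is a restart.
    if backend.has_checkpoint():
        state = backend.load()
    else:
        state = {"step": 0, "acc": 0}

    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated failure")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            backend.store(state)
    return state["acc"]


def demo():
    workdir = tempfile.mkdtemp()
    backend = FileBackend(os.path.join(workdir, "ckpt.bin"))
    try:
        run(backend, total_steps=10, checkpoint_every=3, fail_at=7)
    except RuntimeError:
        pass  # the first run crashes after the checkpoint taken at step 6
    # The second run resumes from the stored state instead of starting over.
    return run(backend, total_steps=10, checkpoint_every=3)
```

Because `run` only talks to the small `has_checkpoint`/`store`/`load` interface, swapping the storage layer would not touch the application loop, which is the kind of portability problem named in burden (3). A directive-based approach would go further by moving the serialization and the restart branch out of the user code entirely.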
Subjects: Programming (Mathematics), Software engineering
Degree: MÀSTER UNIVERSITARI EN INNOVACIÓ I RECERCA EN INFORMÀTICA (Pla 2012)
URI: http://hdl.handle.net/2117/100567
Collections
  • Màsters oficials - Master in Innovation and Research in Informatics - MIRI [411]

Files: 123240.pdf (PDF, 2,452Mb)
