Supporting automatic recovery in offloaded distributed programming models through MPI-3 techniques
Cite as:
hdl:2117/106857
Document type: Conference report
Date issued: 2017-06-15
Publisher: ACM Digital Library
Access conditions: Open access
Except where otherwise noted, the contents of this work are subject to a Creative Commons license:
Attribution-NonCommercial-NoDerivs 3.0 Spain
Abstract
In this paper we describe the design of fault tolerance capabilities for general-purpose offload semantics, based on the OmpSs programming model. Using ParaStation MPI, a production MPI-3.1 implementation, we explore the features that, being standard compliant, an MPI stack must support to provide the necessary fault tolerance guarantees, based on MPI's dynamic process management. Our results, including synthetic benchmarks and applications, reveal low runtime overhead and efficient recovery, demonstrating that the existing MPI standard provided us with sufficient mechanisms to implement an effective and efficient fault-tolerant solution.
Citation: Peña, A. J. [et al.]. Supporting automatic recovery in offloaded distributed programming models through MPI-3 techniques. In: "Proceeding ICS '17 Proceedings of the International Conference on Supercomputing". ACM Digital Library, 2017, p. 22:1-22:10.
ISBN: 978-1-4503-5020-4
Publisher version: http://dl.acm.org/citation.cfm?id=3079093
Files | Description | Size | Format | View |
---|---|---|---|---|
Supporting Auto ... Offloaded Distributed.pdf | | 2.261 MB | PDF | View/Open |