A runtime heuristic to selectively replicate tasks for application-specific reliability targets
Cite as:
hdl:2117/102228
Document type: Conference proceedings text
Publication date: 2016
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to existing legal exemptions, its reproduction, distribution,
public communication, or transformation is prohibited without the authorization of the rights holder.
Abstract
In this paper we propose a runtime-based selective task replication technique for task-parallel high-performance computing applications. Our selective task replication technique is automatic and does not require modification or recompilation of the OS, compiler, or application code. Our heuristic, which we call App_FIT, selects tasks to replicate such that the specified reliability target for an application is achieved. In our experimental evaluation, we show that the App_FIT selective replication heuristic is low-overhead and highly scalable. In addition, results indicate that complete task replication is overkill for achieving reliability targets. We show that with App_FIT, we can tolerate pessimistic exascale error rates with only 53% of the tasks being replicated.
Citation: Subasi, O., Yalcin, G., Zyulkyarov, F., Unsal, O., Labarta, J. A runtime heuristic to selectively replicate tasks for application-specific reliability targets. In: IEEE International Conference on Cluster Computing. "2016 IEEE International Conference on Cluster Computing: 13-15 September 2016, Taipei, Taiwan: proceedings". Taipei: Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 498-505.
ISBN: 978-1-5090-3653-0
Publisher's version: http://ieeexplore.ieee.org/document/7776550/
Files | Description | Size | Format | View |
---|---|---|---|---|
A+runtime+heuri ... n-specific+reliability.pdf | 552,3Kb | View/Open |