Graph-based task replication for workflow applications
dc.contributor.author | Sirvent Pardell, Raül |
dc.contributor.author | Badia Sala, Rosa Maria |
dc.contributor.author | Labarta Mancho, Jesús José |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors |
dc.date.accessioned | 2014-11-06T18:42:33Z |
dc.date.created | 2009 |
dc.date.issued | 2009 |
dc.identifier.citation | Sirvent, R.; Badia, R.M.; Labarta, J. Graph-based task replication for workflow applications. In: IEEE International Conference on High Performance Computing and Communications. "2009 11th IEEE international conference on high performance computing and communications: 25-27 June, Seoul, Korea: proceedings". Seoul: Institute of Electrical and Electronics Engineers (IEEE), 2009, p. 20-28. |
dc.identifier.isbn | 978-0-7695-3738-2 |
dc.identifier.uri | http://hdl.handle.net/2117/24586 |
dc.description.abstract | The Grid is a heterogeneous and dynamic environment which enables distributed computation. This makes it a technology prone to failures. Some related work uses replication to overcome failures in a set of independent tasks, and in workflow applications, but it does not consider possible resource limitations when scheduling the replicas. In this paper, we focus on the use of task replication techniques for workflow applications, trying to achieve not only tolerance to the possible failures in an execution, but also to speed up the computation without requiring the user to implement an application-level checkpoint, which may be a difficult task depending on the application. Moreover, we also study what to do when there are not enough resources for replicating all running tasks. We establish different priorities of replication depending on the graph of the workflow application, giving more priority to tasks with a higher output degree. We have implemented our proposed policy in the GRID superscalar system, and we have run fastDNAml as an experiment to show that our objectives are reached. Finally, we have identified and studied a problem which may arise due to the use of replication in workflow applications: the replication wait time. |
dc.format.extent | 9 p. |
dc.language.iso | eng |
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Spain |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/es/ |
dc.subject | UPC subject areas::Computer science::Computer architecture::Distributed architectures |
dc.subject.lcsh | Fault-tolerant computing |
dc.subject.lcsh | Electronic data processing--Distributed processing |
dc.subject.other | Checkpointing |
dc.subject.other | Graph theory |
dc.subject.other | Grid computing |
dc.subject.other | Software fault tolerance |
dc.subject.other | Workflow management software |
dc.subject.other | Application-level checkpoint |
dc.subject.other | Distributed computation |
dc.subject.other | Failure tolerance |
dc.subject.other | fastDNAml |
dc.subject.other | Graph-based task replication |
dc.subject.other | Grid superscalar system |
dc.subject.other | Replication wait time |
dc.subject.other | Workflow applications |
dc.subject.other | Author keywords: Grid computing; fault tolerance; task replication; workflow scheduling |
dc.subject.other | IEEE terms: Computer architecture; Distributed computing; Fault tolerance; Fault tolerant systems; Grid computing; High performance computing; Processor scheduling; Proposals |
dc.title | Graph-based task replication for workflow applications |
dc.type | Conference report |
dc.subject.lemac | Fault tolerance (Computer science) |
dc.subject.lemac | Distributed data processing |
dc.contributor.group | Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions |
dc.identifier.doi | 10.1109/HPCC.2009.29 |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | http://ieeexplore.ieee.org/xpl/abstractKeywords.jsp?tp=&arnumber=5166972&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F5166953%2F5166954%2F05166972.pdf%3Farnumber%3D5166972 |
dc.rights.access | Restricted access - publisher's policy |
local.identifier.drac | 15117137 |
dc.description.version | Postprint (published version) |
dc.date.lift | 10000-01-01 |
local.citation.author | Sirvent, R.; Badia, R.M.; Labarta, J. |
local.citation.contributor | IEEE International Conference on High Performance Computing and Communications |
local.citation.pubplace | Seoul |
local.citation.publicationName | 2009 11th IEEE international conference on high performance computing and communications: 25-27 June, Seoul, Korea: proceedings |
local.citation.startingPage | 20 |
local.citation.endingPage | 28 |