Checkpoint-based Fault-tolerant Infrastructure for Virtualized Service Providers

Goiri Presa, Íñigo; Julià, Ferran; Guitart Fernández, Jordi; Torres Viñals, Jordi

dc.contributor.author	Goiri Presa, Íñigo
dc.contributor.author	Julià, Ferran
dc.contributor.author	Guitart Fernández, Jordi
dc.contributor.author	Torres Viñals, Jordi
dc.contributor.other	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned	2010-06-01T11:12:42Z
dc.date.available	2010-06-01T11:12:42Z
dc.date.created	2010-04-23
dc.date.issued	2010-04-23
dc.identifier.citation	Goiri, I. [et al.]. Checkpoint-based Fault-tolerant Infrastructure for Virtualized Service Providers. In: 2010 IEEE/IFIP Network Operations and Management Symposium. "2010 IEEE/IFIP Network Operations and Management Symposium". Osaka: IEEE Computer Society Publications, 2010, p. 455-462.
dc.identifier.isbn	978-1-4244-5367-2
dc.identifier.uri	http://hdl.handle.net/2117/7460
dc.description.abstract	Crash and omission failures are common in service providers: a disk can break down or a link can fail at any time. In addition, the probability of a node failure increases with the number of nodes. Apart from reducing the provider's computing power and jeopardizing the fulfillment of its contracts, a failure also wastes computation time when the crash occurs before a task finishes executing. To avoid this problem, efficient checkpoint infrastructures are required, especially in virtualized environments, where they must deal with huge virtual machine images. This paper proposes a smart checkpoint infrastructure for virtualized service providers. It uses Another Union File System (AUFS) to separate the read-only parts of the virtual machine image from the read-write parts. In this way, the read-only parts need to be checkpointed only once, while subsequent checkpoints save only the modifications to the read-write parts, which reduces the time needed to make a checkpoint. The checkpoints are stored in a Hadoop Distributed File System (HDFS). This allows a task to resume faster after a node crash and increases the fault tolerance of the system, since checkpoints are distributed and replicated across the nodes of the provider. This paper presents a running implementation of this infrastructure and its evaluation, demonstrating that it is an effective way to make faster checkpoints with low interference on task execution and to recover tasks efficiently after a node failure.
dc.format.extent	8 p.
dc.language.iso	eng
dc.publisher	IEEE Computer Society Publications
dc.subject	UPC subject areas::Computer science::Computer architecture
dc.subject.lcsh	Cloud computing -- Security measures
dc.title	Checkpoint-based Fault-tolerant Infrastructure for Virtualized Service Providers
dc.type	Conference report
dc.subject.lemac	Cloud computing
dc.contributor.group	Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.description.peerreviewed	Peer Reviewed
dc.rights.access	Open Access
local.identifier.drac	2532570
dc.description.version	Postprint (published version)
local.citation.author	Goiri, I.; Julià, F.; Guitart, J.; Torres, J.
local.citation.contributor	2010 IEEE/IFIP Network Operations and Management Symposium
local.citation.pubplace	Osaka
local.citation.publicationName	2010 IEEE/IFIP Network Operations and Management Symposium
local.citation.startingPage	455
local.citation.endingPage	462
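
The layered checkpointing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `CheckpointStore` and its methods are hypothetical names, plain dictionaries stand in for the AUFS layers, and an in-memory object stands in for HDFS. It only shows why saving the read-only layer once and the read-write layer on each checkpoint reduces the amount of data written.

```python
from dataclasses import dataclass, field


@dataclass
class CheckpointStore:
    """Hypothetical in-memory stand-in for a replicated store (the paper uses HDFS)."""
    base_layers: dict = field(default_factory=dict)  # vm_id -> read-only files
    deltas: dict = field(default_factory=dict)       # vm_id -> list of read-write snapshots
    bytes_written: int = 0

    def checkpoint(self, vm_id, read_only, read_write):
        """Save the read-only layer only once, then only the read-write layer."""
        if vm_id not in self.base_layers:
            self.base_layers[vm_id] = dict(read_only)
            self.bytes_written += sum(len(v) for v in read_only.values())
        self.deltas.setdefault(vm_id, []).append(dict(read_write))
        self.bytes_written += sum(len(v) for v in read_write.values())

    def restore(self, vm_id):
        """Rebuild the image: read-write files shadow read-only ones, as in a union mount."""
        image = dict(self.base_layers[vm_id])
        image.update(self.deltas[vm_id][-1])
        return image


# Assumed example data: a 1000-byte read-only file and a small mutable state file.
store = CheckpointStore()
base = {"/bin/app": b"x" * 1000}
store.checkpoint("vm1", base, {"/tmp/state": b"step-1"})
store.checkpoint("vm1", base, {"/tmp/state": b"step-2"})
restored = store.restore("vm1")
```

In this sketch the second checkpoint writes only the 6 bytes of the read-write layer, because the read-only layer was already stored by the first one.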

Files in this item

Name: Goiri.pdf
Size: 368.6 KB
Format: PDF


This item appears in the following collections

Conference papers/presentations [784]
Conference papers/presentations [1,955]


UPCommons. The UPC's open knowledge portal
