Show simple item record
H-WorD: Supporting job scheduling in Hadoop with workload-driven data redistribution
dc.contributor.author | Jovanovic, Petar |
dc.contributor.author | Romero Moral, Óscar |
dc.contributor.author | Calders, Toon |
dc.contributor.author | Abelló Gamazo, Alberto |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació |
dc.date.accessioned | 2017-04-27T08:08:17Z |
dc.date.available | 2017-04-27T08:08:17Z |
dc.date.issued | 2016 |
dc.identifier.citation | Jovanovic, P., Romero, O., Calders, T., Abello, A. H-WorD: Supporting job scheduling in Hadoop with workload-driven data redistribution. In: Conference on Advances in Databases and Information Systems. "Advances in Databases and Information Systems - 20th East European Conference, ADBIS 2016, Proceedings". Prague: 2016, p. 306-320. |
dc.identifier.isbn | 9783319440385 |
dc.identifier.uri | http://hdl.handle.net/2117/103769 |
dc.description | The final publication is available at http://link.springer.com/chapter/10.1007/978-3-319-44039-2_21 |
dc.description.abstract | Today’s distributed data processing systems typically follow a query shipping approach and exploit data locality to reduce network traffic. In such systems, the distribution of data over the cluster resources plays a significant role, and when skewed, it can harm the performance of executing applications. In this paper, we address the challenges of automatically adapting the distribution of data in a cluster to the workload imposed by the input applications. We propose a generic algorithm, named H-WorD, which, based on the estimated workload over resources, suggests alternative execution scenarios of tasks, and hence identifies required transfers of input data a priori, bringing data close to the execution in a timely manner. We exemplify our algorithm in the context of MapReduce jobs in a Hadoop ecosystem. Finally, we evaluate our approach and demonstrate the performance gains of automatic data redistribution. |
dc.format.extent | 15 p. |
dc.language.iso | eng |
dc.subject | Àrees temàtiques de la UPC::Informàtica::Sistemes d'informació |
dc.subject.lcsh | Data processing |
dc.subject.other | Computer programming |
dc.subject.other | Data handling |
dc.subject.other | Information systems |
dc.subject.other | Scheduling |
dc.subject.other | Data intensive |
dc.subject.other | Data locality |
dc.subject.other | Data redistribution |
dc.subject.other | Distributed data processing |
dc.subject.other | Execution scenario |
dc.subject.other | Generic algorithm |
dc.subject.other | Input applications |
dc.subject.other | Performance Gain |
dc.title | H-WorD: Supporting job scheduling in Hadoop with workload-driven data redistribution |
dc.type | Conference report |
dc.subject.lemac | Dades -- Recuperació (Informàtica) |
dc.contributor.group | Universitat Politècnica de Catalunya. MPI - Modelització i Processament de la Informació |
dc.identifier.doi | 10.1007/978-3-319-44039-2_21 |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | http://link.springer.com/chapter/10.1007/978-3-319-44039-2_21 |
dc.rights.access | Open Access |
local.identifier.drac | 19032118 |
dc.description.version | Postprint (author's final draft) |
local.citation.author | Jovanovic, P.; Romero, O.; Calders, T.; Abello, A. |
local.citation.contributor | Conference on Advances in Databases and Information Systems |
local.citation.pubplace | Prague |
local.citation.publicationName | Advances in Databases and Information Systems - 20th East European Conference, ADBIS 2016, Proceedings |
local.citation.startingPage | 306 |
local.citation.endingPage | 320 |