H-word: Supporting job scheduling in Hadoop with workload-driven data redistribution

Jovanovic, Petar; Romero Moral, Óscar; Calders, Toon; Abelló Gamazo, Alberto

doi:10.1007/978-3-319-44039-2_21

adbis2016-cr-jovanovic.pdf (2.489 MB)

Document type: Conference proceedings text

Publication date: 2016

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.

Abstract

Today’s distributed data processing systems typically follow a query shipping approach and exploit data locality to reduce network traffic. In such systems, the distribution of data over the cluster resources plays a significant role, and when it is skewed, it can harm the performance of the executing applications. In this paper, we address the challenges of automatically adapting the distribution of data in a cluster to the workload imposed by the input applications. We propose a generic algorithm, named H-WorD, which, based on the estimated workload over resources, suggests alternative execution scenarios of tasks and hence identifies the required transfers of input data a priori, so that data can be brought close to the execution in time. We exemplify our algorithm in the context of MapReduce jobs in a Hadoop ecosystem. Finally, we evaluate our approach and demonstrate the performance gains of automatic data redistribution.

Description

The final publication is available at http://link.springer.com/chapter/10.1007/978-3-319-44039-2_21

Citation: Jovanovic, P., Romero, O., Calders, T., Abello, A. H-word: Supporting job scheduling in Hadoop with workload-driven data redistribution. In: Conference on Advances in Databases and Information Systems. "Advances in Databases and Information Systems - 20th East European Conference, ADBIS 2016, Proceedings". Prague: 2016, p. 306-320.

URI: http://hdl.handle.net/2117/103769

DOI: 10.1007/978-3-319-44039-2_21

ISBN: 9783319440385

Publisher's version: http://link.springer.com/chapter/10.1007/978-3-319-44039-2_21


UPCommons. Open knowledge portal of the UPC
