Show simple item record
H-WorD: Supporting job scheduling in Hadoop with workload-driven data redistribution
dc.contributor.author | Jovanovic, Petar |
dc.contributor.author | Romero Moral, Óscar |
dc.contributor.author | Calders, Toon |
dc.contributor.author | Abelló Gamazo, Alberto |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació |
dc.date.accessioned | 2017-04-27T08:08:17Z |
dc.date.available | 2017-04-27T08:08:17Z |
dc.date.issued | 2016 |
dc.identifier.citation | Jovanovic, P., Romero, O., Calders, T., Abello, A. H-WorD: Supporting job scheduling in Hadoop with workload-driven data redistribution. In: Conference on Advances in Databases and Information Systems. "Advances in Databases and Information Systems - 20th East European Conference, ADBIS 2016, Proceedings". Prague: 2016, p. 306-320. |
dc.identifier.isbn | 9783319440385 |
dc.identifier.uri | http://hdl.handle.net/2117/103769 |
dc.description | The final publication is available at http://link.springer.com/chapter/10.1007/978-3-319-44039-2_21 |
dc.description.abstract | Today’s distributed data processing systems typically follow a query shipping approach and exploit data locality to reduce network traffic. In such systems, the distribution of data over the cluster resources plays a significant role, and when skewed, it can harm the performance of executing applications. In this paper, we address the challenges of automatically adapting the distribution of data in a cluster to the workload imposed by the input applications. We propose a generic algorithm, named H-WorD, which, based on the estimated workload over resources, suggests alternative execution scenarios of tasks, and hence identifies required transfers of input data a priori, bringing data close to the execution in a timely manner. We exemplify our algorithm in the context of MapReduce jobs in a Hadoop ecosystem. Finally, we evaluate our approach and demonstrate the performance gains of automatic data redistribution. |
dc.format.extent | 15 p. |
dc.language.iso | eng |
dc.subject | Àrees temàtiques de la UPC::Informàtica::Sistemes d'informació |
dc.subject.lcsh | Data processing |
dc.subject.other | Computer programming |
dc.subject.other | Data handling |
dc.subject.other | Information systems |
dc.subject.other | Scheduling |
dc.subject.other | Data intensive |
dc.subject.other | Data locality |
dc.subject.other | Data redistribution |
dc.subject.other | Distributed data processing |
dc.subject.other | Execution scenario |
dc.subject.other | Generic algorithm |
dc.subject.other | Input applications |
dc.subject.other | Performance Gain |
dc.title | H-WorD: Supporting job scheduling in Hadoop with workload-driven data redistribution |
dc.type | Conference report |
dc.subject.lemac | Dades -- Recuperació (Informàtica) |
dc.contributor.group | Universitat Politècnica de Catalunya. MPI - Modelització i Processament de la Informació |
dc.identifier.doi | 10.1007/978-3-319-44039-2_21 |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | http://link.springer.com/chapter/10.1007/978-3-319-44039-2_21 |
dc.rights.access | Open Access |
local.identifier.drac | 19032118 |
dc.description.version | Postprint (author's final draft) |
local.citation.author | Jovanovic, P.; Romero, O.; Calders, T.; Abello, A. |
local.citation.contributor | Conference on Advances in Databases and Information Systems |
local.citation.pubplace | Prague |
local.citation.publicationName | Advances in Databases and Information Systems - 20th East European Conference, ADBIS 2016, Proceedings |
local.citation.startingPage | 306 |
local.citation.endingPage | 320 |