dc.contributor.author: Jovanovic, Petar
dc.contributor.author: Romero Moral, Óscar
dc.contributor.author: Calders, Toon
dc.contributor.author: Abelló Gamazo, Alberto
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació
dc.date.accessioned: 2017-04-27T08:08:17Z
dc.date.available: 2017-04-27T08:08:17Z
dc.date.issued: 2016
dc.identifier.citation: Jovanovic, P., Romero, O., Calders, T., Abello, A. H-WorD: Supporting job scheduling in Hadoop with workload-driven data redistribution. A: Conference on Advances in Databases and Information Systems. "Advances in Databases and Information Systems - 20th East European Conference, ADBIS 2016, Proceedings". Praga: 2016, p. 306-320.
dc.identifier.isbn: 9783319440385
dc.identifier.uri: http://hdl.handle.net/2117/103769
dc.description: The final publication is available at http://link.springer.com/chapter/10.1007/978-3-319-44039-2_21
dc.description.abstract: Today’s distributed data processing systems typically follow a query shipping approach and exploit data locality for reducing network traffic. In such systems the distribution of data over the cluster resources plays a significant role, and when skewed, it can harm the performance of executing applications. In this paper, we address the challenges of automatically adapting the distribution of data in a cluster to the workload imposed by the input applications. We propose a generic algorithm, named H-WorD, which, based on the estimated workload over resources, suggests alternative execution scenarios of tasks, and hence identifies required transfers of input data a priori, for timely bringing data close to the execution. We exemplify our algorithm in the context of MapReduce jobs in a Hadoop ecosystem. Finally, we evaluate our approach and demonstrate the performance gains of automatic data redistribution.
dc.format.extent: 15 p.
dc.language.iso: eng
dc.subject: Àrees temàtiques de la UPC::Informàtica::Sistemes d'informació
dc.subject.lcsh: Data processing
dc.subject.other: Computer programming
dc.subject.other: Data handling
dc.subject.other: Information systems
dc.subject.other: Scheduling
dc.subject.other: Data intensive
dc.subject.other: Data locality
dc.subject.other: Data redistribution
dc.subject.other: Distributed data processing
dc.subject.other: Execution scenario
dc.subject.other: Generic algorithm
dc.subject.other: Input applications
dc.subject.other: Performance Gain
dc.title: H-WorD: Supporting job scheduling in Hadoop with workload-driven data redistribution
dc.type: Conference report
dc.subject.lemac: Dades -- Recuperació (Informàtica)
dc.contributor.group: Universitat Politècnica de Catalunya. MPI - Modelització i Processament de la Informació
dc.identifier.doi: 10.1007/978-3-319-44039-2_21
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: http://link.springer.com/chapter/10.1007/978-3-319-44039-2_21
dc.rights.access: Open Access
local.identifier.drac: 19032118
dc.description.version: Postprint (author's final draft)
local.citation.author: Jovanovic, P.; Romero, O.; Calders, T.; Abello, A.
local.citation.contributor: Conference on Advances in Databases and Information Systems
local.citation.pubplace: Praga
local.citation.publicationName: Advances in Databases and Information Systems - 20th East European Conference, ADBIS 2016, Proceedings
local.citation.startingPage: 306
local.citation.endingPage: 320