dc.contributor.author: Gounaris, Anastasios
dc.contributor.author: Kougka, Georgia
dc.contributor.author: Tous Liesa, Rubén
dc.contributor.author: Tripiana, Carlos
dc.contributor.author: Torres Viñals, Jordi
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned: 2017-07-10T09:44:44Z
dc.date.available: 2017-07-10T09:44:44Z
dc.date.issued: 2017-07-01
dc.identifier.citation: Gounaris, A., Kougka, G., Tous, R., Tripiana, C., Torres, J. Dynamic configuration of partitioning in Spark applications. "IEEE transactions on parallel and distributed systems", 1 July 2017, vol. 28, no. 7, p. 1891-1904.
dc.identifier.issn: 1045-9219
dc.identifier.uri: http://hdl.handle.net/2117/106303
dc.description.abstract: Spark has become one of the main options for large-scale analytics running on top of shared-nothing clusters. This work takes a deep dive into parallelism configuration and sheds light on the behavior of parallel Spark jobs. It is motivated by the fact that running a Spark application on all the available processors does not necessarily imply lower running time, while it may entail a waste of resources. We first propose analytical models for expressing the running time as a function of the number of machines employed. We then take another step, namely presenting novel algorithms for configuring dynamic partitioning with a view to minimizing resource consumption without sacrificing running time beyond a user-defined limit. The problem we target is NP-hard. To tackle it, we propose a greedy approach after introducing the notions of dependency graphs and of the benefit from modifying the degree of partitioning at a stage; complementarily, we investigate a randomized approach. Our polynomial solutions are capable of judiciously using the resources that are potentially at the user's disposal and strike interesting trade-offs between running time and resource consumption. Their efficiency is thoroughly investigated through experiments based on real execution data.
dc.format.extent: 14 p.
dc.language.iso: eng
dc.subject: Àrees temàtiques de la UPC::Informàtica::Informàtica teòrica::Algorísmica i teoria de la complexitat
dc.subject.lcsh: Computational complexity
dc.subject.lcsh: Graph theory
dc.subject.lcsh: Parallel processing (Electronic computers)
dc.subject.other: Data repartitioning
dc.subject.other: Data flow optimization
dc.subject.other: Data flow profiling
dc.subject.other: Spark
dc.title: Dynamic configuration of partitioning in Spark applications
dc.type: Article
dc.subject.lemac: Complexitat computacional
dc.subject.lemac: Grafs, Teoria de
dc.subject.lemac: Processament en paral·lel (Ordinadors)
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi: 10.1109/TPDS.2017.2647939
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: http://ieeexplore.ieee.org/document/7807262/
dc.rights.access: Open Access
local.identifier.drac: 21138546
dc.description.version: Postprint (author's final draft)
local.citation.author: Gounaris, A.; Kougka, G.; Tous, R.; Tripiana, C.; Torres, J.
local.citation.publicationName: IEEE transactions on parallel and distributed systems
local.citation.volume: 28
local.citation.number: 7
local.citation.startingPage: 1891
local.citation.endingPage: 1904