dc.contributor.author | Gounaris, Anastasios |
dc.contributor.author | Kougka, Georgia |
dc.contributor.author | Tous Liesa, Rubén |
dc.contributor.author | Tripiana, Carlos |
dc.contributor.author | Torres Viñals, Jordi |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors |
dc.date.accessioned | 2017-07-10T09:44:44Z |
dc.date.available | 2017-07-10T09:44:44Z |
dc.date.issued | 2017-07-01 |
dc.identifier.citation | Gounaris, A., Kougka, G., Tous, R., Tripiana, C., Torres, J. Dynamic configuration of partitioning in Spark applications. "IEEE transactions on parallel and distributed systems", 1 July 2017, vol. 28, no. 7, p. 1891-1904. |
dc.identifier.issn | 1045-9219 |
dc.identifier.uri | http://hdl.handle.net/2117/106303 |
dc.description.abstract | Spark has become one of the main options for large-scale analytics running on top of shared-nothing clusters. This work takes a deep dive into parallelism configuration and sheds light on the behavior of parallel Spark jobs. It is motivated by the fact that running a Spark application on all the available processors does not necessarily reduce running time, and may waste resources. We first propose analytical models that express running time as a function of the number of machines employed. We then go a step further and present novel algorithms for configuring dynamic partitioning so as to minimize resource consumption without increasing running time beyond a user-defined limit. The problem we target is NP-hard. To tackle it, we introduce the notions of dependency graphs and of the benefit of modifying the degree of partitioning at a stage, and on that basis propose a greedy approach; complementarily, we investigate a randomized approach. Our polynomial-time solutions are capable of judiciously using the resources potentially at the user's disposal and strike interesting trade-offs between running time and resource consumption. Their efficiency is thoroughly investigated through experiments based on real execution data. |
dc.format.extent | 14 p. |
dc.language.iso | eng |
dc.subject | UPC subject areas::Computer science::Theoretical computer science::Algorithms and complexity theory |
dc.subject.lcsh | Computational complexity |
dc.subject.lcsh | Graph theory |
dc.subject.lcsh | Parallel processing (Electronic computers) |
dc.subject.other | Data repartitioning |
dc.subject.other | Data flow optimization |
dc.subject.other | Data flow profiling |
dc.subject.other | Spark |
dc.title | Dynamic configuration of partitioning in Spark applications |
dc.type | Article |
dc.subject.lemac | Computational complexity |
dc.subject.lemac | Graph theory |
dc.subject.lemac | Parallel processing (Computers) |
dc.contributor.group | Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions |
dc.identifier.doi | 10.1109/TPDS.2017.2647939 |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | http://ieeexplore.ieee.org/document/7807262/ |
dc.rights.access | Open Access |
local.identifier.drac | 21138546 |
dc.description.version | Postprint (author's final draft) |
local.citation.author | Gounaris, A.; Kougka, G.; Tous, R.; Tripiana, C.; Torres, J. |
local.citation.publicationName | IEEE transactions on parallel and distributed systems |
local.citation.volume | 28 |
local.citation.number | 7 |
local.citation.startingPage | 1891 |
local.citation.endingPage | 1904 |