Universitat Politècnica de Catalunya

UPCommons. Global access to UPC knowledge

Dynamic configuration of partitioning in spark applications

repartitioning_tpds_rev_final.pdf (1,767Mb)
Cite as: hdl:2117/106303

Gounaris, Anastasios
Kougka, Georgia
Tous Liesa, Rubén
Tripiana, Carlos
Torres Viñals, Jordi
Document type: Article
Defense date: 2017-07-01
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder
Abstract
Spark has become one of the main options for large-scale analytics running on top of shared-nothing clusters. This work takes a deep dive into the parallelism configuration and sheds light on the behavior of parallel Spark jobs. It is motivated by the fact that running a Spark application on all the available processors does not necessarily imply a lower running time, and may waste resources. We first propose analytical models for expressing the running time as a function of the number of machines employed. We then take another step, namely presenting novel algorithms for configuring dynamic partitioning with a view to minimizing resource consumption without sacrificing running time beyond a user-defined limit. The problem we target is NP-hard. To tackle it, we propose a greedy approach after introducing the notions of dependency graphs and of the benefit from modifying the degree of partitioning at a stage; complementarily, we investigate a randomized approach. Our polynomial solutions are capable of judiciously using the resources that are potentially at the user's disposal and strike interesting trade-offs between running time and resource consumption. Their efficiency is thoroughly investigated through experiments based on real execution data.
Citation: Gounaris, A., Kougka, G., Tous, R., Tripiana, C., Torres, J. Dynamic configuration of partitioning in spark applications. "IEEE transactions on parallel and distributed systems", 1 July 2017, vol. 28, no. 7, p. 1891-1904.
URI: http://hdl.handle.net/2117/106303
DOI: 10.1109/TPDS.2017.2647939
ISSN: 1045-9219
Publisher version: http://ieeexplore.ieee.org/document/7807262/
Collections
  • Departament d'Arquitectura de Computadors - Articles de revista [957]
  • CAP - Grup de Computació d'Altes Prestacions - Articles de revista [380]

© UPC. Servei de Biblioteques, Publicacions i Arxius
