Show simple item record

dc.contributor.author: Gounaris, Anastasios
dc.contributor.author: Torres Viñals, Jordi
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned: 2017-06-29T04:53:22Z
dc.date.available: 2019-05-21T00:30:42Z
dc.date.issued: 2017-05-19
dc.identifier.citation: Gounaris, A., Torres, J. A methodology for Spark parameter tuning. "Big data research", March 2018, vol. 11, p. 22-32.
dc.identifier.issn: 2214-5796
dc.identifier.uri: http://hdl.handle.net/2117/105965
dc.description.abstract: Spark has been established as an attractive platform for big data analysis, since it manages to hide most of the complexities related to parallelism, fault tolerance and cluster setting from developers. However, this comes at the expense of having over 150 configurable parameters, the impact of which cannot be exhaustively examined due to the exponential number of their combinations. The default values allow developers to quickly deploy their applications but leave open the question of whether performance can be improved. In this work, we investigate the impact of the most important tunable Spark parameters with regard to shuffling, compression and serialization on application performance through extensive experimentation using the Spark-enabled Marenostrum III (MN3) computing infrastructure of the Barcelona Supercomputing Center. The overarching aim is to guide developers on how to proceed with changes to the default values. We build upon our previous work, where we mapped our experience to a trial-and-error iterative improvement methodology for tuning parameters in arbitrary applications based on evidence from a very small number of experimental runs. The main contribution of this work is that we propose an alternative systematic methodology for parameter tuning, which can be easily applied to any computing infrastructure and is shown to yield comparable if not better results than the initial one when applied to MN3; observed speedups in our validating test case studies start from 20%. In addition, the new methodology can rely on runs using samples instead of runs on the complete datasets, which renders it significantly more practical.
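The shuffling, compression and serialization parameters the abstract refers to correspond to standard Spark configuration properties. As a minimal illustrative sketch (the property names below are real Spark configuration keys; the values shown are generic examples, not the tuned settings reported in the article):

```python
# Illustrative non-default settings for the three parameter categories the
# article studies: serialization, compression, and shuffling.
# NOTE: the values are examples only, not the paper's recommendations.
spark_conf = {
    # Serialization: Kryo is typically faster and more compact than
    # the default Java serialization.
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    # Compression: codec used for shuffle outputs, broadcasts and spills.
    "spark.io.compression.codec": "lz4",
    "spark.shuffle.compress": "true",
    # Shuffling: per-reducer in-flight fetch size and shuffle file buffer.
    "spark.reducer.maxSizeInFlight": "48m",
    "spark.shuffle.file.buffer": "32k",
}

def to_submit_args(conf):
    """Render a configuration dict as spark-submit --conf flags."""
    return " ".join(f"--conf {k}={v}" for k, v in sorted(conf.items()))

print(to_submit_args(spark_conf))
```

A tuning methodology such as the one proposed would iterate over candidate values for keys like these and compare the resulting run times.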
dc.format.extent: 11 p.
dc.language.iso: eng
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors
dc.subject.lcsh: Big data
dc.subject.other: Spark configuration
dc.subject.other: Parameter tuning
dc.subject.other: Shuffling
dc.title: A methodology for Spark parameter tuning
dc.type: Article
dc.subject.lemac: Macrodades
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi: 10.1016/j.bdr.2017.05.001
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: http://www.sciencedirect.com/science/article/pii/S2214579617300114
dc.rights.access: Open Access
local.identifier.drac: 21079949
dc.description.version: Postprint (author's final draft)
local.citation.author: Gounaris, A.; Torres, J.
local.citation.publicationName: Big data research
local.citation.volume: 11
local.citation.startingPage: 22
local.citation.endingPage: 32


