Performance Characterization of Spark Workloads on Shared NUMA Systems

Baig, Shuja-ur-Rehman; Amaral, Marcelo; Polo, Jordà; Carrera, David

doi:10.1109/BigDataService.2018.00015

dc.contributor.author	Baig, Shuja-ur-Rehman
dc.contributor.author	Amaral, Marcelo
dc.contributor.author	Polo, Jordà
dc.contributor.author	Carrera, David
dc.contributor.other	Barcelona Supercomputing Center
dc.date.accessioned	2018-09-03T13:24:44Z
dc.date.available	2018-09-03T13:24:44Z
dc.date.issued	2018-07-09
dc.identifier.citation	Baig, S. [et al.]. Performance Characterization of Spark Workloads on Shared NUMA Systems. In: "2018 IEEE Fourth International Conference on Big Data Computing Service and Applications (BigDataService)". IEEE, 2018, p. 41-48.
dc.identifier.isbn	978-1-5386-5119-3
dc.identifier.uri	http://hdl.handle.net/2117/120784
dc.description.abstract	As the adoption of Big Data technologies becomes the norm in an increasing number of scenarios, there is also a growing need to optimize them for modern processors. Spark has gained momentum over the last few years among companies looking for high-performance solutions that can scale out across different cluster sizes. At the same time, modern processors can be connected to large amounts of physical memory, up to a few terabytes. This opens an enormous range of opportunities for runtimes and applications that aim to improve their performance by leveraging the low latency and high bandwidth provided by RAM. As a result, there are several examples today of applications that have started pushing the in-memory computing paradigm to accelerate tasks. To deliver such a large physical memory capacity, hardware vendors have leveraged Non-Uniform Memory Architectures (NUMA). This paper explores how Spark-based workloads are impacted by the effects of NUMA-placement decisions, how different Spark configurations result in changes in delivered performance, how the characteristics of the applications can be used to predict workload collocation conflicts, and how to improve performance by collocating workloads in scale-up nodes. We explore several workloads running on the IBM Power8 processor and provide manual strategies that achieve performance improvements of up to 40% on Spark workloads when using smart processor-pinning and workload collocation strategies.
dc.description.sponsorship	This work is partially supported by the European Research Council (ERC) under the EU Horizon 2020 programme (GA 639595), the Spanish Ministry of Economy, Industry and Competitiveness (TIN2015-65316-P) and the Generalitat de Catalunya (2014-SGR-1051).
dc.format.extent	8 p.
dc.language.iso	eng
dc.publisher	IEEE
dc.subject	UPC subject areas::Computer science
dc.subject.lcsh	Memory management (Computer science)
dc.subject.lcsh	High performance computing
dc.subject.other	Performance
dc.subject.other	Modeling
dc.subject.other	Characterization
dc.subject.other	Memory
dc.subject.other	NUMA
dc.subject.other	Spark
dc.subject.other	Benchmark
dc.title	Performance Characterization of Spark Workloads on Shared NUMA Systems
dc.type	Conference lecture
dc.subject.lemac	Gestió de memòria (Informàtica)
dc.subject.lemac	Supercomputadors
dc.identifier.doi	10.1109/BigDataService.2018.00015
dc.relation.publisherversion	https://ieeexplore.ieee.org/document/8405690/
dc.rights.access	Open Access
dc.description.version	Postprint (author's final draft)
dc.relation.projectid	info:eu-repo/grantAgreement/EC/H2020/639595/EU/Holistic Integration of Emerging Supercomputing Technologies/Hi-EST
dc.relation.projectid	info:eu-repo/grantAgreement/MINECO//TIN2015-65316-P/ES/COMPUTACION DE ALTAS PRESTACIONES VII/
local.citation.publicationName	2018 IEEE Fourth International Conference on Big Data Computing Service and Applications (BigDataService)
local.citation.startingPage	41
local.citation.endingPage	48

Files in this item

Name: Performance Characterization of ...
Size: 256.6 KB
Format: PDF

This item appears in the following collections

Conference papers/presentations [574]

UPCommons. Open knowledge portal of the UPC
