Performance Characterization of Spark Workloads on Shared NUMA Systems

Baig, Shuja-ur-Rehman; Amaral, Marcelo; Polo, Jordà; Carrera, David

doi:10.1109/BigDataService.2018.00015

Performance Characterization of Spark Workloads.pdf (256.6 KB, PDF)

Document type: Conference paper

Publication date: 2018-07-09

Publisher: IEEE

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, its reproduction, distribution, public communication or transformation without the authorization of the rights holder is prohibited.

Projects: Hi-EST - Holistic Integration of Emerging Supercomputing Technologies (EC-H2020-639595)
COMPUTACION DE ALTAS PRESTACIONES VII (MINECO-TIN2015-65316-P)

Abstract

As the adoption of Big Data technologies becomes the norm in an increasing number of scenarios, there is also a growing need to optimize them for modern processors. Spark has gained momentum over the last few years among companies looking for high-performance solutions that can scale out across different cluster sizes. At the same time, modern processors can be connected to large amounts of physical memory, up to a few terabytes. This opens a wide range of opportunities for runtimes and applications that aim to improve their performance by exploiting the low latency and high bandwidth of RAM. As a result, several applications today have started pushing the in-memory computing paradigm to accelerate tasks. To deliver such large physical memory capacities, hardware vendors have adopted Non-Uniform Memory Access (NUMA) architectures. This paper explores how Spark-based workloads are affected by NUMA placement decisions, how different Spark configurations change the delivered performance, how application characteristics can be used to predict workload collocation conflicts, and how collocating workloads on scale-up nodes can improve performance. We evaluate several workloads running on the IBM Power8 processor and present manual strategies that deliver performance improvements of up to 40% on Spark workloads through smart processor pinning and workload collocation.

Citation: Baig, S. [et al.]. Performance Characterization of Spark Workloads on Shared NUMA Systems. In: "2018 IEEE Fourth International Conference on Big Data Computing Service and Applications (BigDataService)". IEEE, 2018, p. 41-48.

URI: http://hdl.handle.net/2117/120784

ISBN: 978-1-5386-5119-3

Publisher's version: https://ieeexplore.ieee.org/document/8405690/

Collections: Computer Sciences - Conference papers and communications


UPCommons. UPC's open knowledge portal