Micro-architectural characterization of Apache Spark on batch and stream processing workloads

Awan, Ahsan; Brorsson, Mats; Vlassov, Vladimir; Ayguadé Parra, Eduard

doi:10.1109/BDCloud-SocialCom-SustainCom.2016.20

Micro-architectural Characterization of Apache.pdf (593.3 KB) (Restricted access)

Document type: Conference proceedings text

Publication date: 2016

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Access conditions: Restricted access per publisher policy

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.

Project: COMPUTACION DE ALTAS PRESTACIONES VII (MINECO-TIN2015-65316-P)
BARCELONA SUPERCOMPUTING CENTER - CENTRO. NACIONAL DE SUPERCOMPUTACION (MINECO-SEV-2015-0493)

Abstract

While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has stayed at the forefront of big data analytics as a unified framework for both batch and stream data processing. However, recent studies on the micro-architectural characterization of in-memory data analytics are limited to batch processing workloads. We compare the micro-architectural performance of batch processing and stream processing workloads in Apache Spark using hardware performance counters on a dual-socket server. In our evaluation experiments, we found that batch processing and stream processing have the same micro-architectural behavior in Spark if the only difference between the two implementations is micro-batching. If the input data rates are small, stream processing workloads are front-end bound. However, front-end bound stalls are reduced at larger input data rates, and instruction retirement improves. Moreover, Spark workloads using DataFrames have better instruction retirement than workloads using RDDs.

Citation: Awan, A., Brorsson, M., Vlassov, V., Ayguade, E. Micro-architectural characterization of Apache Spark on batch and stream processing workloads. In: International Conference on Big Data and Cloud Computing. "2015 IEEE Fifth International Conference on Big Data and Cloud Computing (BDCloud 2015): Dalian, China: 26-28 August 2015". Dalian: Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 59-66.

URI: http://hdl.handle.net/2117/99707

DOI: 10.1109/BDCloud-SocialCom-SustainCom.2016.20

ISBN: 9781467371841

Publisher's version: http://ieeexplore.ieee.org/document/7723674/
