Ir al contenido (pulsa Retorno)

Universitat Politècnica de Catalunya

    • Català
    • Castellano
    • English
    • LoginRegisterLog in (no UPC users)
  • mailContact Us
  • world English 
    • Català
    • Castellano
    • English
  • userLogin   
      LoginRegisterLog in (no UPC users)

UPCommons. Global access to UPC knowledge

58.848 UPC E-Prints
You are here:
View Item 
  •   DSpace Home
  • E-prints
  • Centres de recerca
  • BSC - Barcelona Supercomputing Center
  • Computer Sciences
  • Ponències/Comunicacions de congressos
  • View Item
  •   DSpace Home
  • E-prints
  • Centres de recerca
  • BSC - Barcelona Supercomputing Center
  • Computer Sciences
  • Ponències/Comunicacions de congressos
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Characterizing BigBench Queries, Hive, and Spark in Multi-cloud Environments

Thumbnail
View/Open
Characterizing TPCx-BB queries,.pdf (828,2Kb)
Share:
 
 
10.1007/978-3-319-72401-0_5
 
  View Usage Statistics
Cita com:
hdl:2117/114812

Show full item record
Poggi, Nicolas
Montero, Alejandro
Carrera, David
Document typeConference lecture
Defense date2017-12-30
PublisherSpringer Verlag
Rights accessOpen Access
Attribution-NonCommercial-NoDerivs 3.0 Spain
Except where otherwise noted, content on this work is licensed under a Creative Commons license : Attribution-NonCommercial-NoDerivs 3.0 Spain
ProjectHi-EST - Holistic Integration of Emerging Supercomputing Technologies (EC-H2020-639595)
COMPUTACION DE ALTAS PRESTACIONES VII (MINECO-TIN2015-65316-P)
Abstract
BigBench is the new standard (TPCx-BB) for benchmarking and testing Big Data systems. The TPCx-BB specification describes several business use cases—queries—which require a broad combination of data extraction techniques including SQL, Map/Reduce (M/R), user code (UDF), and Machine Learning to fulfill them. However, currently, there is no widespread knowledge of the different resource requirements and expected performance of each query, as is the case to more established benchmarks. Moreover, over the last year, the Spark framework and APIs have been evolving very rapidly, with major improvements in performance and the stable release of v2. It is our intent to compare the current state of Spark to Hive’s base implementation which can use the legacy M/R engine and Mahout or the current Tez and MLlib frameworks. At the same time, cloud providers currently offer convenient on-demand managed big data clusters (PaaS) with a pay-as-you-go model. In PaaS, analytical engines such as Hive and Spark come ready to use, with a general-purpose configuration and upgrade management. The study characterizes both the BigBench queries and the out-of-the-box performance of Spark and Hive versions in the cloud. At the same time, comparing popular PaaS offerings in terms of reliability, data scalability (1 GB to 10 TB), versions, and settings from Azure HDinsight, Amazon Web Services EMR, and Google Cloud Dataproc. The query characterization highlights the similarities and differences in Hive an Spark frameworks, and which queries are the most resource consuming according to CPU, memory, and I/O. Scalability results show how there is a need for configuration tuning in most cloud providers as data scale grows, especially with Sparks memory usage. These results can help practitioners to quickly test systems by picking a subset of the queries which stresses each of the categories. At the same time, results show how Hive and Spark compare and what performance can be expected of each in PaaS.
CitationPoggi, N.; Montero, A.; Carrera, D. Characterizing BigBench Queries, Hive, and Spark in Multi-cloud Environments. A: "TPCTC 2017: Performance Evaluation and Benchmarking for the Analytics Era. Lecture Notes in Computer Science". Springer Verlag, 2017, p. 55-74. 
URIhttp://hdl.handle.net/2117/114812
DOI10.1007/978-3-319-72401-0_5
ISBN978-3-319-72400-3
Publisher versionhttps://link.springer.com/chapter/10.1007/978-3-319-72401-0_5
Collections
  • Computer Sciences - Ponències/Comunicacions de congressos [489]
Share:
 
  View Usage Statistics

Show full item record

FilesDescriptionSizeFormatView
Characterizing TPCx-BB queries,.pdf828,2KbPDFView/Open

Browse

This CollectionBy Issue DateAuthorsOther contributionsTitlesSubjectsThis repositoryCommunities & CollectionsBy Issue DateAuthorsOther contributionsTitlesSubjects

© UPC Obrir en finestra nova . Servei de Biblioteques, Publicacions i Arxius

info.biblioteques@upc.edu

  • About This Repository
  • Contact Us
  • Send Feedback
  • Inici de la pàgina