Universitat Politècnica de Catalunya

    • Català
    • Castellano
    • English
    • LoginRegisterLog in (no UPC users)
  • mailContact Us
  • world English 
    • Català
    • Castellano
    • English
  • userLogin   
      LoginRegisterLog in (no UPC users)

UPCommons. Global access to UPC knowledge

ALOJA-ML: a framework for automating characterization and knowledge discovery in Hadoop deployments

Berral García, Josep Lluís
Poggi, Nicolas
Carrera Pérez, David
Call, Aaron
Reinauer, Rob
Green, Daron
Document type: Conference report
Date: 2015
Publisher: Association for Computing Machinery (ACM)
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.
Project: Hi-EST: Holistic Integration of Emerging Supercomputing Technologies (EC-H2020-639595)
Abstract
This article presents ALOJA-Machine Learning (ALOJA-ML), an extension to the ALOJA project that uses machine learning techniques to interpret Hadoop benchmark performance data and to guide performance tuning. Here we detail the approach, the efficacy of the models, and initial results. The ALOJA-ML project is the latest phase of a long-term collaboration between BSC and Microsoft to automate the characterization of cost-effectiveness in Big Data deployments, with a focus on Hadoop. Hadoop presents a complex execution environment, where costs and performance depend on a large number of software (SW) configurations and on multiple hardware (HW) deployment choices. The ALOJA project recently presented an open, vendor-neutral repository featuring over 16,000 Hadoop executions. These results are accompanied by a test bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameter tunings, and Cloud services. Despite early success within ALOJA with expert-guided benchmarking, it became clear that a genuinely comprehensive study requires automating the modeling procedures to allow a systematic analysis of large and resource-constrained search spaces. ALOJA-ML provides such an automated system. It enables knowledge discovery by modeling Hadoop executions from observed benchmarks across a broad set of configuration parameters. The resulting empirically derived performance models can be used to forecast the execution behavior of various workloads. They allow a priori prediction of execution times for new configurations and HW choices, and they offer a route to model-based anomaly detection. In addition, these models can guide benchmark exploration efficiently by automatically prioritizing candidate future benchmark tests. Insights from ALOJA-ML's models can be used to reduce operational time on clusters, speed up data acquisition and the knowledge discovery process, and, importantly, reduce running costs.
Beyond the methodology presented in this work, the community can benefit more generally from the ALOJA data sets, framework, and derived insights to improve the design and deployment of Big Data applications.
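The abstract names three uses for a performance model learned from past benchmark runs. The first is predicting the execution time of an untested configuration. The second is flagging runs that disagree with the model. The third is deciding which benchmark to run next. The Python sketch below illustrates all three, but it is not ALOJA-ML's method, and several of its choices are assumptions made here for illustration:

- configurations are encoded as numeric feature vectors;
- the model is a k-nearest-neighbour regressor that returns the median time of the neighbours;
- a run counts as anomalous when its relative error exceeds 50%;
- untested configurations are ranked so the least-covered one comes first.

The function names (`predict_time`, `find_anomalies`, `prioritize`) and the data values are hypothetical.

```python
from math import dist
from statistics import median

# Hypothetical encoding: a configuration is a numeric feature vector, and a run
# pairs a configuration with its observed execution time in seconds.
Config = tuple[float, ...]
Run = tuple[Config, float]


def predict_time(runs: list[Run], config: Config, k: int = 3) -> float:
    """Predict execution time as the median time of the k nearest observed runs
    (k-NN regression, used here only as an illustrative model)."""
    nearest = sorted(runs, key=lambda run: dist(run[0], config))[:k]
    return median(time for _, time in nearest)


def find_anomalies(runs: list[Run], k: int = 3, tolerance: float = 0.5) -> list[int]:
    """Return indices of runs whose observed time differs from a leave-one-out
    prediction by more than `tolerance`, measured as a relative error.
    Assumes all execution times are positive."""
    flagged = []
    for i, (config, observed) in enumerate(runs):
        others = runs[:i] + runs[i + 1:]  # exclude the run being checked
        predicted = predict_time(others, config, k)
        if abs(observed - predicted) / predicted > tolerance:
            flagged.append(i)
    return flagged


def prioritize(runs: list[Run], candidates: list[Config]) -> list[Config]:
    """Order untested configurations so that those farthest from any
    benchmarked configuration (the least-covered ones) come first.
    This heuristic is an assumption, not ALOJA-ML's policy."""
    def coverage_gap(candidate: Config) -> float:
        return min(dist(config, candidate) for config, _ in runs)
    return sorted(candidates, key=coverage_gap, reverse=True)


if __name__ == "__main__":
    # Made-up features: (number of mappers, compression off/on). Not ALOJA data.
    runs = [
        ((4.0, 0.0), 100.0),
        ((4.0, 1.0), 95.0),
        ((8.0, 0.0), 60.0),
        ((8.0, 1.0), 55.0),
        ((6.0, 0.0), 80.0),
        ((6.0, 1.0), 400.0),  # deliberately inconsistent run
    ]
    flagged = find_anomalies(runs)
    print(flagged)  # [5]
    clean = [run for i, run in enumerate(runs) if i not in flagged]
    print(predict_time(clean, (5.0, 0.0)))  # 95.0
    print(prioritize(clean, [(5.0, 0.0), (12.0, 1.0), (7.0, 1.0)])[0])  # (12.0, 1.0)
```

Taking the median of the neighbours rather than their mean keeps a single outlying run from dragging down its neighbours' predictions. The leave-one-out check ensures that no run is ever compared against itself.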
Citation: Berral, J., Poggi, N., Carrera, D., Call, A., Reinauer, R., Green, D. ALOJA-ML: a framework for automating characterization and knowledge discovery in Hadoop deployments. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. "KDD '15 Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: August 10-13, 2015: Sydney, NSW, Australia". Sydney: Association for Computing Machinery (ACM), 2015, pp. 1701-1710.
URI: http://hdl.handle.net/2117/77791
DOI: 10.1145/2783258.2788600
ISBN: 978-1-4503-3664-2
Publisher version: http://dl.acm.org/citation.cfm?id=2788600
Collections
  • Computer Sciences - Ponències/Comunicacions de congressos [500]
  • CAP - Grup de Computació d'Altes Prestacions - Ponències/Comunicacions de congressos [782]
  • Departament d'Arquitectura de Computadors - Ponències/Comunicacions de congressos [1.847]
  • LARCA - Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge - Ponències/Comunicacions de congressos [120]

File: sigkdd15.pdf (main article, PDF, 622.9 KB)

© UPC. Servei de Biblioteques, Publicacions i Arxius

info.biblioteques@upc.edu
