Universitat Politècnica de Catalunya

UPCommons. Global access to UPC knowledge

Resilient store: a heuristic-based data format selector for intermediate results

Cite as:
hdl:2117/103258

Munir, Rana Faisal
Romero Moral, Óscar
Abelló Gamazo, Alberto
Bilalli, Besim
Thiele, Maik
Lehner, Wolfgang
Document type: Journal issue
Defense date: 2016
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder
Abstract
Large-scale data analysis is an important activity in many organizations and typically requires the deployment of data-intensive workflows. As data is processed, these workflows generate large intermediate results, which are typically pipelined from one operator to the next. If these results are materialized, however, they become reusable, so subsequent workflows need not recompute them. Many existing solutions materialize intermediate results, but all of them assume a fixed data format. A fixed format may not be optimal for every situation. For example, it is well known that different data fragmentation strategies (e.g., horizontal and vertical) behave better or worse depending on the access patterns of the subsequent operations. In this paper, we present ResilientStore, which assists in selecting the most appropriate data format for materializing intermediate results. Given a workflow and a set of materialization points, it uses rule-based heuristics to choose the best storage data format based on subsequent access patterns. We have implemented ResilientStore for HDFS and three different data formats: SequenceFile, Parquet and Avro. Experimental results show that our solution gives 18% better performance than any solution based on a single fixed format.
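To make the idea of a rule-based format selector concrete, the following is a minimal sketch, not the rules used by ResilientStore. The abstract does not state the actual heuristics, so everything here is an assumption made for illustration: the AccessPattern model, the choose_format function, the 0.5 projection threshold, and the three rules themselves. The sketch relies only on the general fact that Parquet is a columnar format while Avro and SequenceFile store data row by row.

```python
from dataclasses import dataclass
from enum import Enum


class Format(Enum):
    SEQUENCE_FILE = "SequenceFile"
    PARQUET = "Parquet"
    AVRO = "Avro"


@dataclass(frozen=True)
class AccessPattern:
    """How one subsequent operator reads a materialized result (hypothetical model)."""
    columns_read: int        # number of columns the operator touches
    selective_filter: bool   # True if it filters rows on a column predicate


def choose_format(total_columns: int,
                  consumers: list[AccessPattern],
                  projection_threshold: float = 0.5) -> Format:
    """Pick a storage format from illustrative rules (assumed, not from the paper)."""
    if total_columns <= 0:
        raise ValueError("total_columns must be positive")
    if not consumers:
        # Assumed default when nothing is known about later reads.
        return Format.AVRO

    # Average fraction of columns that the subsequent operators read.
    avg_fraction = sum(c.columns_read for c in consumers) / (
        len(consumers) * total_columns
    )

    # Assumed rule 1: narrow projections or selective filters favor a
    # columnar layout, because readers can skip the columns they do not need.
    if avg_fraction <= projection_threshold or any(
        c.selective_filter for c in consumers
    ):
        return Format.PARQUET

    # Assumed rule 2: results shaped like key-value pairs that are read
    # in full fit a simple key-value container.
    if total_columns <= 2:
        return Format.SEQUENCE_FILE

    # Assumed rule 3: wide rows that are read in full favor a row layout.
    return Format.AVRO
```

For instance, under these assumed rules a ten-column result whose only consumer reads two columns would be stored as Parquet, while the same result read in full by every consumer would be stored as Avro.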
Description
The final publication is available at link.springer.com
Citation: Munir, R., Romero, O., Abello, A., Bilalli, B., Thiele, M., Lehner, W. "Resilient store: a heuristic-based data format selector for intermediate results". Almeria: 2016.
URI: http://hdl.handle.net/2117/103258
DOI: 10.1007/978-3-319-45547-1_4
ISBN: 9783319455464
Publisher version: http://www.springer.com/gp/book/9783319455464
Collections
  • Departament d'Enginyeria de Serveis i Sistemes d'Informació - Journal [1]
  • MPI - Modelització i processament de la Informació - Journal [1]
Files: medi_paper_final_rana.pdf (5,896Mb, PDF)
