Show simple item record

dc.contributor.author: Munir, Rana Faisal
dc.contributor.author: Romero Moral, Óscar
dc.contributor.author: Abelló Gamazo, Alberto
dc.contributor.author: Bilalli, Besim
dc.contributor.author: Thiele, Maik
dc.contributor.author: Lehner, Wolfgang
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació
dc.identifier.citation: Munir, R., Romero, O., Abello, A., Bilalli, B., Thiele, M., Lehner, W. "Resilient store: a heuristic-based data format selector for intermediate results". Almeria: 2016.
dc.description: The final publication is available at
dc.description.abstract: Large-scale data analysis is an important activity in many organizations and typically requires the deployment of data-intensive workflows. As data is processed, these workflows generate large intermediate results, which are typically pipelined from one operator to the next. If materialized, however, these results become reusable, so subsequent workflows need not recompute them. Many existing solutions materialize intermediate results, but all of them assume a fixed data format. A fixed format may not be optimal for every situation. For example, it is well known that different data fragmentation strategies (e.g., horizontal and vertical) perform better or worse depending on the access patterns of the subsequent operations. In this paper, we present ResilientStore, which assists in selecting the most appropriate data format for materializing intermediate results. Given a workflow and a set of materialization points, it uses rule-based heuristics to choose the best storage data format based on subsequent access patterns. We have implemented ResilientStore for HDFS and three different data formats: SequenceFile, Parquet and Avro. Experimental results show that our solution gives 18% better performance than any solution based on a single fixed format.
dc.format.extent: 14 p.
dc.subject: Àrees temàtiques de la UPC::Informàtica::Sistemes d'informació
dc.subject.lcsh: Big data
dc.subject.other: Digital storage
dc.subject.other: Access patterns - Data format - Data fragmentation - HDFS - Intermediate results - Large-scale data analysis - Rule-based heuristics - Workflows
dc.title: Resilient store: a heuristic-based data format selector for intermediate results
dc.contributor.group: Universitat Politècnica de Catalunya. MPI - Modelització i Processament de la Informació
dc.description.peerreviewed: Peer Reviewed
dc.rights.access: Open Access
dc.description.version: Postprint (author's final draft)
upcommons.citation.author: Munir, R., Romero, O., Abello, A., Bilalli, B., Thiele, M., Lehner, W.
upcommons.citation.contributor: International Conference on Model and Data Engineering
upcommons.citation.publicationName: Model and Data Engineering - 6th International Conference, MEDI 2016, Proceedings

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.