Resilient store: a heuristic-based data format selector for intermediate results
DOI: 10.1007/978-3-319-45547-1_4
Cite as:
hdl:2117/103258
Document type: Journal issue
Publication date: 2016
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to existing legal exemptions, its reproduction, distribution,
public communication or transformation without the authorization of the rights holder is prohibited.
Abstract
Large-scale data analysis is an important activity in many organizations and typically requires deploying data-intensive workflows. As data is processed, these workflows generate large intermediate results, which are typically pipelined from one operator to the next. If materialized, however, these results become reusable, so subsequent workflows need not recompute them. Many existing solutions materialize intermediate results, but all of them assume a fixed data format. A fixed format, however, may not be optimal for every situation. For example, it is well known that different data fragmentation strategies (e.g., horizontal and vertical) perform better or worse depending on the access patterns of the subsequent operations. In this paper, we present ResilientStore, which assists in selecting the most appropriate data format for materializing intermediate results. Given a workflow and a set of materialization points, it uses rule-based heuristics to choose the best storage data format based on subsequent access patterns. We have implemented ResilientStore for HDFS and three different data formats: SequenceFile, Parquet and Avro. Experimental results show that our solution gives 18% better performance than any solution based on a single fixed format.
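The abstract does not list the actual rules, so the following is only a minimal sketch of what rule-based format selection per materialization point could look like. Everything in it is an assumption made for illustration: the `AccessPattern` model, the `choose_format` function, the 0.5 projection threshold and the rules themselves are hypothetical and are not taken from the paper. The only facts it relies on are that Parquet is a column-oriented format while Avro and SequenceFile are row-oriented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPattern:
    """How one subsequent operator reads a materialized result (hypothetical model)."""
    columns_read: int   # number of columns the operator touches
    total_columns: int  # number of columns in the intermediate result


def choose_format(patterns, projection_threshold=0.5):
    """Pick a storage format for one materialization point.

    Illustrative rules only, NOT the rules used by ResilientStore:
      - no known consumers                    -> SequenceFile (assumed default)
      - consumers read few columns on average -> Parquet (column-oriented)
      - otherwise                             -> Avro (row-oriented)
    """
    if not patterns:
        return "SequenceFile"
    # Average fraction of columns read across all subsequent operators.
    avg_fraction = sum(p.columns_read / p.total_columns for p in patterns) / len(patterns)
    if avg_fraction < projection_threshold:
        return "Parquet"
    return "Avro"


# Hypothetical workflow: materialization point -> access patterns of its consumers.
workflow = {
    "join_output": [AccessPattern(2, 20), AccessPattern(3, 20)],
    "filter_output": [AccessPattern(20, 20)],
}
plan = {point: choose_format(patterns) for point, patterns in workflow.items()}
print(plan)
```

In this sketch a result whose consumers project only a few columns is stored in a vertical (columnar) layout, while a result that is scanned in full is stored in a horizontal (row) layout, which mirrors the fragmentation trade-off the abstract describes.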
Description
The final publication is available at link.springer.com
Citation: Munir, R., Romero, O., Abello, A., Bilalli, B., Thiele, M., Lehner, W. "Resilient store: a heuristic-based data format selector for intermediate results". Almeria: 2016.
ISBN: 9783319455464
Publisher version: http://www.springer.com/gp/book/9783319455464
Files | Description | Size | Format | View |
---|---|---|---|---|
medi_paper_final_rana.pdf | | 5.896 MB | PDF | View/Open |