Tuning small analytics on Big Data: Data partitioning and secondary indexes in the Hadoop ecosystem
Document type: Article
Access conditions: Restricted access per publisher policy (embargoed until 2017-12-31)
In recent years, the problems of using generic storage techniques (i.e., relational ones) for very specific applications have been identified and outlined. As a consequence, several alternatives to relational DBMSs (e.g., HBase) have flourished. Most of these alternatives sit on the cloud and benefit from cloud computing. Cloud computing is nowadays a reality that helps us save money by eliminating fixed hardware and software costs, so that we pay only per use. On top of this, specific querying frameworks that exploit the brute force of the cloud (e.g., MapReduce) have also been devised. The question that then arises is whether this (rather naive) exploitation of the cloud is an alternative to tuning DBMSs, or whether it still makes sense to consider other options when retrieving data in these settings.

In this paper, we study the feasibility of solving OLAP queries with Hadoop (the Apache project implementing MapReduce) while benefiting from secondary indexes and partitioning in HBase. Our main contribution is a comparison of different access plans and the definition of criteria (i.e., cost estimation) for choosing among them in terms of consumed resources (namely CPU, bandwidth and I/O).
Citation: Romero, O., Herrero, V., Abelló, A., Ferrarons, J. Tuning small analytics on Big Data: Data partitioning and secondary indexes in the Hadoop ecosystem. "Information Systems", December 2015, vol. 54, pp. 336-356.
Publisher's version: http://www.sciencedirect.com/science/article/pii/S0306437914001458