Automatic query driven data modelling in Cassandra
Document type: Conference report
Rights access: Open Access
Non-relational databases have recently become the preferred choice for dealing with Big Data challenges, but their performance is very sensitive to the chosen data organisation. We have seen response times differ by a factor of more than 70 for the same query on different models. Users therefore need to be fully aware of the queries they intend to serve when designing their data model. The common practice, then, is to replicate data into different models designed to fit different query requirements. In this scenario, the user is responsible for implementing the code required to keep the different data replicas consistent. We propose and design a mechanism, together with a prototype, that provides users with transparent management in which queries are matched with a well-performing model option. Additionally, we propose to do so by transforming the replication mechanism into a heterogeneous one, in order to avoid squandering disk space while keeping the availability and reliability features.
Citation: Hernández, R., Becerra, Y., Torres, J., Ayguadé, E. Automatic query driven data modelling in Cassandra. In: International Conference on Computational Science. "Procedia Computer Science (Vol. 51, 2015)". Reykjavík: Elsevier, 2015, p. 2822-2826.