Automatic query driven data modelling in Cassandra
Document typeConference report
PublisherBarcelona Supercomputing Center
Rights accessOpen Access
Non-relational databases have recently been the preferred choice when it comes to dealing with Big Data challenges, but their performance is very sensitive to the chosen data organisations. We have seen differences of over 70 times in response time for the same query on different models. This brings users the need to be fully conscious of the queries they intend to serve in order to design their data model. The common practice then, is to replicate data into different models designed to fit different query requirements. In this scenario, the user is in charge of the code implementation required to keep consistency between the different data replicas. Manually replicating data in such high layers of the database results in a lot of squandered storage due to the underlying system replication mechanisms that are formerly designed for availability and reliability ends. We propose and design a mechanism and a prototype to provide users with transparent management, where queries are matched with a well-performing model option. Additionally, we propose to do so by transforming the replication mechanism into a heterogeneous replication one, in order to avoid squandering disk space while keeping the availability and reliability features. The result is a system where, regardless of the query or model the user specifies, response time will always be that of an affine query.
CitationHernandez, Roger [et al.]. Automatic query driven data modelling in Cassandra. A: "BSC Doctoral Symposium (2nd: 2015: Barcelona)". 2nd ed. Barcelona: Barcelona Supercomputing Center, 2015, p. 114.