A cost model for random access queries in document stores

Document type

Article

Access conditions

Open access

Rights license

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication, or transformation is prohibited without the authorization of the rights holder.

Abstract

Document stores have become one of the key NoSQL storage solutions. They have been widely adopted in different domains due to their ability to store semi-structured data and their expressive query capabilities. However, implementations differ in how they concretely store and retrieve data. No standard framework for data and query optimization exists for document stores, and only implementation-specific design and query guidelines are used. Hence, the goal of this work is to help automate data design for document stores based on query costs instead of generic design rules. To this end, we define a generic storage and query cost model, based on disk access and memory allocation, that allows estimating the impact of design decisions. Since all document stores carry out data operations in memory, we first estimate memory usage by considering the characteristics of the stored documents, their access patterns, and memory management algorithms. Then, using this estimate and the metadata storage size, we introduce a cost model for random access queries. We validate our work on two well-known document store implementations: MongoDB and Couchbase. The results show that the memory usage estimates have an average precision of 91% and that the predicted costs are highly correlated with the actual execution times. In the course of this work, we also suggest several improvements to document storage systems. Thus, this cost model also helps identify discrepancies between document store implementations and their theoretical expectations.

Citation

Hewasinghage, M. [et al.]. A cost model for random access queries in document stores. "VLDB journal", July 2021, vol. 30, no. 4, p. 559-578.

ISSN

1066-8888
