The OTree: multidimensional indexing with efficient data sampling for HPC
DOI: 10.1109/BigData47090.2019.9006121
Cite as:
hdl:2117/180358
Document type: Text in conference proceedings
Publication date: 2019
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to existing legal exemptions, its reproduction, distribution,
public communication, or transformation without the authorization of the rights holder is prohibited.
Project: COMPUTACION DE ALTAS PRESTACIONES VII (MINECO-TIN2015-65316-P)
BARCELONA SUPERCOMPUTING CENTER - CENTRO NACIONAL DE SUPERCOMPUTACION (MINECO-SEV-2015-0493)
I-BiDaaS - Industrial-Driven Big Data as a Self-Service Solution (EC-H2020-780787)
HBP SGA2 - Human Brain Project Specific Grant Agreement 2 (EC-H2020-785907)
HBP SGA1 - Human Brain Project Specific Grant Agreement 1 (EC-H2020-720270)
Abstract
Spatial big data is considered an essential trend in future scientific and business applications. Indeed, research instruments, medical devices, and social networks generate hundreds of petabytes of spatial data per year. However, many authors have pointed out that the lack of specialized frameworks for multidimensional Big Data is limiting possible applications and precluding many scientific breakthroughs. Paramount in achieving High-Performance Data Analytics is to optimize and reduce the I/O operations required to analyze large data sets. To do so, we need to organize and index the data according to its multidimensional attributes. At the same time, to enable fast and interactive exploratory analysis, it is vital to generate approximate representations of large datasets efficiently. In this paper, we propose the Outlook Tree (or OTree), a novel Multidimensional Indexing with efficient data Sampling (MIS) algorithm. The OTree enables exploratory analysis of large multidimensional datasets with arbitrary precision, a vital missing feature in current distributed data management solutions. Our algorithm reduces the indexing overhead and achieves high performance even for write-intensive HPC applications. Indeed, we use the OTree to store the scientific results of a study on the efficiency of drug inhalers. Then we compare the OTree implementation on Apache Cassandra, named Qbeast, with PostgreSQL and plain storage. Lastly, we demonstrate that our proposal delivers better performance and scalability.
Citation: Cugnasco, C. [et al.]. The OTree: multidimensional indexing with efficient data sampling for HPC. In: IEEE International Conference on Big Data. "2019 IEEE International Conference on Big Data: Dec 9-Dec 12, 2019, Los Angeles, CA, USA: proceedings". Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 433-440.
ISBN: 978-1-7281-0858-2
Publisher's version: https://ieeexplore.ieee.org/document/9006121
Collections
- Doctorate in Computer Architecture - Conference papers/presentations [294]
- Doctorate in Computational and Applied Physics - Conference papers/presentations [26]
- Computer Sciences - Conference papers/presentations [574]
- CAP - High Performance Computing Group - Conference papers/presentations [784]
- Department of Computer Architecture - Conference papers/presentations [1,955]
Files | Description | Size | Format | View |
---|---|---|---|---|
The_OTree_for_IEEE_short_paper.pdf | | 259.3 KB | PDF | View/Open |