SETA: A suite-independent analytical framework
Document type: Master thesis
Rights access: Open Access
Business analytics users today need agile processes that span from the selection of relevant data in raw data sources to the generation of data structures ready to serve as input for OLAP, Data Mining and/or other analytical tools. However, the wide range of analytical needs and the increasing need for adaptive business strategies discourage the use of existing ’all-in-one’ suites (i.e., end-to-end solutions from a single vendor). Instead, an agile, suite-independent approach is advisable, both to boost the user’s independence from a specific vendor and to increase the analytical capabilities enabled by combining several suites and tools according to the user’s needs. In this thesis we present and develop ’SETA’, a suite-independent agile analytical framework based on a novel approach that combines rich metadata definition with automation components. As proof of validity, we instantiate the developed framework in a real-world project for the WHO Chagas Programme. This thesis makes two main contributions. The first is an approach to store and integrate a set of heterogeneous data sources into a flexible data store that sits at an intermediate point between the classical Data Warehouse (DW) approaches and the recent Data Lake strategies. We argue that classical DW systems are too rigid to accommodate agile analytical pipelines, whereas Data Lakes and Big Data technologies are not suitable for many of today’s organizations. Thus, a novel approach combining the two is presented. The second is a rich definitional system to represent 1) the data components at the Source, Global Schema and Domain levels, 2) the data mappings between these levels and 3) the final user’s analytical requirements. This definitional system provides a flexible view of the data schema at different levels and enables the automated generation of the target data schemas and of the ETL processes that feed them.
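To make the idea of metadata-driven automation more concrete, the following is a minimal sketch in Python of how mappings between a source level and a global schema level might be used to generate a target table definition and the ETL statement that feeds it. It is not the SETA implementation: the `Mapping` class, the functions `generate_ddl` and `generate_etl`, and all table and column names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One source column mapped to one global-schema attribute (hypothetical model)."""
    source_table: str
    source_column: str
    global_attribute: str
    sql_type: str


def generate_ddl(target_table, mappings):
    """Build a CREATE TABLE statement for the target table from the mappings."""
    columns = ",\n".join(f"  {m.global_attribute} {m.sql_type}" for m in mappings)
    return f"CREATE TABLE {target_table} (\n{columns}\n);"


def generate_etl(target_table, source_table, mappings):
    """Build an INSERT ... SELECT statement that loads the target from one source table."""
    used = [m for m in mappings if m.source_table == source_table]
    targets = ", ".join(m.global_attribute for m in used)
    sources = ", ".join(m.source_column for m in used)
    return f"INSERT INTO {target_table} ({targets}) SELECT {sources} FROM {source_table};"


# Hypothetical metadata: two columns of an invented source table "survey_raw"
# mapped to attributes of an invented global-schema table "global_cases".
MAPPINGS = [
    Mapping("survey_raw", "ctry", "country", "TEXT"),
    Mapping("survey_raw", "n_cases", "reported_cases", "INTEGER"),
]

if __name__ == "__main__":
    print(generate_ddl("global_cases", MAPPINGS))
    print(generate_etl("global_cases", "survey_raw", MAPPINGS))
```

In this sketch, changing the target schema or adding a source only requires editing the mapping metadata, since both the schema and the load statement are derived from it. That is the kind of flexibility the abstract attributes to the definitional system, although the actual framework may represent and process its metadata quite differently.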