MDM: governing evolution in big data ecosystems
Document type: Conference lecture
Rights access: Open Access
European Commission's project: EC-H2020-644018-SUPERSEDE-
On-demand integration of multiple data sources is a critical requirement in many Big Data settings. This has been coined the data variety challenge, which refers to the complexity of dealing with a heterogeneous set of data sources to enable their integrated analysis. In Big Data settings, data sources are commonly represented by external REST APIs, which provide data in their original format and continuously apply changes to their structure (i.e., schema). Thus, data analysts face the challenge of integrating such multiple sources and then continuously adapting their analytical processes to changes in the schema. To address these challenges, in this paper we present the Metadata Management System (MDM), a tool that supports data stewards and analysts in managing the integration and analysis of multiple heterogeneous sources under schema evolution. MDM adopts a vocabulary-based integration-oriented ontology to conceptualize the domain of interest and relies on local-as-view mappings to link it with the sources. MDM provides user-friendly mechanisms to manage the ontology and mappings. Finally, a query rewriting algorithm ensures that queries posed to the ontology are correctly resolved to the sources in the presence of multiple schema versions, a process that is transparent to data analysts. On-site, we will showcase, using real-world examples, how MDM facilitates the management of multiple evolving data sources and enables their integrated analysis.
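The abstract only names the techniques involved. As a rough illustration of how local-as-view (LAV) mappings can let one query over ontology features reach several schema versions of the same source, here is a minimal Python sketch. It is not MDM's query rewriting algorithm: the `Wrapper` class, the `rewrite` and `execute` functions, the `Player.*` and `Team.*` features, and the `players_api`/`teams_api` sources are all hypothetical. The coverage rule is also a deliberate simplification: a wrapper contributes only if it provides every requested feature, and there are no joins across wrappers.

```python
from dataclasses import dataclass


@dataclass
class Wrapper:
    """Hypothetical wrapper exposing one schema version of a data source."""
    source: str
    version: int
    # LAV mapping (simplified): wrapper attribute -> ontology feature it provides.
    mapping: dict

    def covers(self, features):
        """True if this wrapper version provides every requested feature."""
        return set(features) <= set(self.mapping.values())

    def attribute_for(self, feature):
        """Inverse lookup: the wrapper attribute that provides a feature."""
        for attr, feat in self.mapping.items():
            if feat == feature:
                return attr
        raise KeyError(feature)


def rewrite(features, wrappers):
    """Toy rewriting: one projection per covering wrapper; results are unioned."""
    return [
        (f"{w.source}_v{w.version}", [w.attribute_for(f) for f in features])
        for w in wrappers
        if w.covers(features)
    ]


def execute(features, wrappers, data):
    """Evaluate the rewritten union over in-memory records keyed by (source, version)."""
    rows = []
    for w in wrappers:
        if not w.covers(features):
            continue
        for record in data[(w.source, w.version)]:
            # Rename each wrapper attribute back to its ontology feature.
            rows.append({f: record[w.attribute_for(f)] for f in features})
    return rows


# Hypothetical example: version 2 of the same API renamed its fields.
WRAPPERS = [
    Wrapper("players_api", 1, {"name": "Player.name", "height": "Player.height"}),
    Wrapper("players_api", 2, {"fullName": "Player.name", "heightCm": "Player.height"}),
    Wrapper("teams_api", 1, {"teamName": "Team.name"}),
]
DATA = {
    ("players_api", 1): [{"name": "Ann", "height": 180}],
    ("players_api", 2): [{"fullName": "Bob", "heightCm": 175}],
    ("teams_api", 1): [{"teamName": "Lions"}],
}

query = ["Player.name", "Player.height"]
print(rewrite(query, WRAPPERS))
# [('players_api_v1', ['name', 'height']), ('players_api_v2', ['fullName', 'heightCm'])]
print(execute(query, WRAPPERS, DATA))
# [{'Player.name': 'Ann', 'Player.height': 180}, {'Player.name': 'Bob', 'Player.height': 175}]
```

Because both versions of `players_api` are mapped to the same ontology features, the analyst's query stays unchanged when the API renames `name` to `fullName`. Handling the change only requires registering a new wrapper and its mapping, which is the kind of transparency the abstract describes.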
Citation: Nadal, S. [et al.]. MDM: governing evolution in big data ecosystems. In: International Conference on Extending Database Technology. "Advances in Database Technology, EDBT 2018, 21st International Conference on Extending Database Technology: Vienna, Austria, March 26–29, 2018: proceedings". Konstanz: OpenProceedings, 2018, p. 682-685.