Towards a comprehensive Data LifeCycle model for big data environments

Document type: Conference report
Defense date: 2017
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public
communication or transformation of this work are prohibited without permission of the copyright holder.
Abstract
A huge amount of data is constantly being produced in the world. Data coming from the IoT, from scientific simulations, or from any other field of eScience accumulate on top of historical data sets and set up the seed for future Big Data processing, with the final goal of generating added value and discovering knowledge. In such computing processes, data are the main resource; however, organizing and managing data during their entire life cycle becomes a complex research topic. As part of this, Data LifeCycle (DLC) models have been proposed to efficiently organize large and complex data sets, from creation to consumption, in any field and at any scale, for effective data usage and big data exploitation. Several DLC frameworks can be found in the literature, each one defined for specific environments and scenarios. However, we realized that there is no global and comprehensive DLC model that can be easily adapted to different scientific areas. For this reason, in this paper we describe the Comprehensive Scenario Agnostic Data LifeCycle (COSA-DLC) model, a DLC model which: i) is proved to be comprehensive, as it addresses the 6Vs challenges (namely Value, Volume, Variety, Velocity, Variability and Veracity), and ii) can be easily adapted to any particular scenario and, therefore, fit the requirements of a specific scientific field. In this paper we also include two use cases to illustrate the ease of adaptation to different scenarios. We conclude that the comprehensive scenario agnostic DLC model provides several advantages, such as facilitating global data management, organization and integration, easing the adaptation to any kind of scenario, guaranteeing good data quality levels and, therefore, saving design time and effort for the scientific and industrial communities.
Citation: Sinaeepourfard, A., García, J., Masip, X., Marín, E. Towards a comprehensive Data LifeCycle model for big data environments. A: IEEE/ACM International Conference on Big Data Computing, Applications and Technologies. "3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, BDCAT 2016: 6-9 December, 2016, Tongji University, Shanghai, China: proceedings". Shanghai: Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 100-106.
ISBN: 978-1-4503-4617-7
Publisher version: http://ieeexplore.ieee.org/document/7877056/
Files | Description | Size | Format | View |
---|---|---|---|---|
Towards+a+Compr ... el+for+BD+Environments.pdf | | 554,7Kb | | View/Open |