Data engineering for data science: two sides of the same coin
DOI: 10.1007/978-3-030-59065-9_13
Cite as: hdl:2117/340117
Document type: Text in conference proceedings
Publication date: 2020
Publisher: Springer
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to existing legal exemptions, its reproduction, distribution,
public communication or transformation without the authorization of the rights holder is prohibited.
Abstract
A de facto technological standard of data science is based on notebooks (e.g., Jupyter), which provide an integrated environment to execute data workflows in different languages. However, from a data engineering point of view, this approach is typically inefficient and unsafe, as most of the data science languages process data locally, i.e., in workstations with limited memory, and store data in files. Thus, this approach neglects the benefits brought by over 40 years of R&D in the area of data engineering, i.e., advanced database technologies and data management techniques. In this paper, we advocate for a standardized data engineering approach for data science and we present a layered architecture for a data processing pipeline (DPP). This architecture provides a comprehensive conceptual view of DPPs, which next enables the semi-automation of the logical and physical designs of such DPPs.
Citation: Romero, O.; Wrembel, R. Data engineering for data science: two sides of the same coin. In: International Conference on Big Data Analytics and Knowledge Discovery. "Big Data Analytics and Knowledge Discovery, 22nd International Conference, DaWaK 2020: Bratislava, Slovakia, September 14-17, 2020: proceedings". Springer, 2020, p. 157-166. ISBN 978-3-030-59065-9. DOI 10.1007/978-3-030-59065-9_13.
ISBN: 978-3-030-59065-9
Publisher version: https://link.springer.com/chapter/10.1007%2F978-3-030-59065-9_13
Files | Description | Size | Format | View |
---|---|---|---|---|
main.pdf | | 314.5 KB | PDF | View/Open |