Universitat Politècnica de Catalunya

UPCommons. Global access to UPC knowledge

Data engineering for data science: two sides of the same coin

Romero Moral, Óscar
Wrembel, Robert
Document type: Conference report
Defense date: 2020
Publisher: Springer
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder
Abstract
A de facto technological standard of data science is based on notebooks (e.g., Jupyter), which provide an integrated environment to execute data workflows in different languages. However, from a data engineering point of view, this approach is typically inefficient and unsafe, as most of the data science languages process data locally, i.e., in workstations with limited memory, and store data in files. Thus, this approach neglects the benefits brought by over 40 years of R&D in the area of data engineering, i.e., advanced database technologies and data management techniques. In this paper, we advocate for a standardized data engineering approach for data science and we present a layered architecture for a data processing pipeline (DPP). This architecture provides a comprehensive conceptual view of DPPs, which next enables the semi-automation of the logical and physical designs of such DPPs.
Citation: Romero, O.; Wrembel, R. Data engineering for data science: two sides of the same coin. In: International Conference on Big Data Analytics and Knowledge Discovery. "Big Data Analytics and Knowledge Discovery, 22nd International Conference, DaWaK 2020: Bratislava, Slovakia, September 14-17, 2020: proceedings". Springer, 2020, p. 157-166. ISBN 978-3-030-59065-9. DOI 10.1007/978-3-030-59065-9_13.
URI: http://hdl.handle.net/2117/340117
DOI: 10.1007/978-3-030-59065-9_13
ISBN: 978-3-030-59065-9
Publisher version: https://link.springer.com/chapter/10.1007%2F978-3-030-59065-9_13
Collections
  • IMP - Information Modeling and Processing - Ponències/Comunicacions de congressos [89]
  • Departament d'Enginyeria de Serveis i Sistemes d'Informació - Ponències/Comunicacions de congressos [502]