Universitat Politècnica de Catalunya

UPCommons. Global access to UPC knowledge

Frequent patterns in ETL workflows: An empirical approach

Theodorou, Vasileios
Abelló Gamazo, Alberto
Thiele, Maik
Lehner, Wolfgang
Document type: Article
Defense date: 2017-09-05
Publisher: Elsevier
Rights access: Open Access
Except where otherwise noted, content on this work is licensed under a Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain
Abstract
The complexity of Business Intelligence activities has driven the proposal of several approaches for the effective modeling of Extract-Transform-Load (ETL) processes, based on the conceptual abstraction of their operations. Apart from fostering automation and maintainability, such modeling also provides the building blocks to identify and represent frequently recurring patterns. Despite some existing work on classifying ETL components and functionality archetypes, the issue of systematically mining such patterns and their connection to quality attributes such as performance has not yet been addressed. In this work, we propose a methodology for the identification of ETL structural patterns. We logically model the ETL workflows using labeled graphs and employ graph algorithms to identify candidate patterns and to recognize them in different workflows. We showcase our approach through a use case applied to ETL processes implemented from the TPC-DI specification, and we present the mined ETL patterns. By decomposing ETL processes into the identified patterns, our approach provides a stepping stone for the automatic translation of ETL logical models to their conceptual representation and for the generation of fine-grained cost models at the granularity level of patterns.
Citation: Theodorou, V., Abelló, A., Thiele, M., Lehner, W. Frequent patterns in ETL workflows: An empirical approach. "Data and knowledge engineering", November 2017, vol. 112, p. 1-16.
URI: http://hdl.handle.net/2117/110172
DOI: 10.1016/j.datak.2017.08.004
ISSN: 0169-023X
Publisher version: http://www.sciencedirect.com/science/article/pii/S0169023X16302713
Collections
  • MPI - Modelització i processament de la Informació - Articles de revista [48]
  • inSSIDE - integrated Software, Service, Information and Data Engineering - Articles de revista [113]
  • Departament d'Enginyeria de Serveis i Sistemes d'Informació - Articles de revista [198]
  • GESSI - Grup d'Enginyeria del Software i dels Serveis - Articles de revista [56]

Files: DATAK_2016_201_Revision+1_V0.pdf (1012 Kb, PDF)
