  • A cost-based storage format selector for materialized results in big data frameworks 

    Munir, Rana Faisal; Abelló Gamazo, Alberto; Romero Moral, Óscar; Thiele, Maik; Lehner, Wolfgang (2019-05-08)
    Article
    Restricted access - publisher's policy
    Modern big data frameworks (such as Hadoop and Spark) allow multiple users to do large-scale analysis simultaneously, by deploying data-intensive workflows (DIWs). These DIWs of different users share many common tasks (i.e., ...
  • A framework for user-centered declarative ETL 

    Theodorou, Vasileios; Abelló Gamazo, Alberto; Thiele, Maik; Lehner, Wolfgang (2014)
    Conference report
    Open Access
    As business requirements evolve with increasing information density and velocity, there is a growing need for efficiency and automation of Extract-Transform-Load (ETL) processes. Current approaches for the modeling and ...
  • A machine learning approach for layout inference in spreadsheets 

    Koci, Elvis; Thiele, Maik; Romero Moral, Óscar; Lehner, Wolfgang (SciTePress, 2016)
    Conference report
    Open Access
    Spreadsheet applications are one of the most used tools for content generation and presentation in industry and the Web. In spite of this success, there does not exist a comprehensive approach to automatically extract and ...
  • ATUN-HL: auto tuning of hybrid layouts using workload and data characteristics 

    Munir, Rana Faisal; Abelló Gamazo, Alberto; Romero Moral, Óscar; Thiele, Maik; Lehner, Wolfgang (2018)
    Conference report
    Restricted access - publisher's policy
    Ad-hoc analysis implies processing data in near real-time. Thus, raw data (i.e., neither normalized nor transformed) is typically dumped into a distributed engine, where it is generally stored into a hybrid layout. Hybrid ...
  • Frequent patterns in ETL workflows: An empirical approach 

    Theodorou, Vasileios; Abelló Gamazo, Alberto; Thiele, Maik; Lehner, Wolfgang (Elsevier, 2017-09-05)
    Article
    Restricted access - publisher's policy
    The complexity of Business Intelligence activities has driven the proposal of several approaches for the effective modeling of Extract-Transform-Load (ETL) processes, based on the conceptual abstraction of their operations. ...
  • Intermediate results materialization selection and format for data-intensive flows 

    Munir, Rana Faisal; Nadal Francesch, Sergi; Romero Moral, Óscar; Abelló Gamazo, Alberto; Jovanovic, Petar; Thiele, Maik; Lehner, Wolfgang (2018-05-01)
    Article
    Restricted access - publisher's policy
    Data-intensive flows deploy a variety of complex data transformations to build information pipelines from data sources to different end users. As data are processed, these workflows generate large intermediate results, ...
  • POIESIS: A tool for quality-aware ETL process redesign 

    Theodorou, Vasileios; Abelló Gamazo, Alberto; Thiele, Maik; Lehner, Wolfgang (2015)
    Conference report
    Open Access
    We present a tool, called POIESIS, for automatic ETL process enhancement. ETL processes are essential data-centric activities in modern business intelligence environments and they need to be examined through a viewpoint ...
  • Quality measures for ETL processes: from goals to implementation 

    Theodorou, Vasileios; Abelló Gamazo, Alberto; Lehner, Wolfgang; Thiele, Maik (2016-10-01)
    Article
    Open Access
    Extraction transformation loading (ETL) processes play an increasingly important role for the support of modern business operations. These business processes are centred around artifacts with high variability and diverse ...
  • Resilient store: a heuristic-based data format selector for intermediate results 

    Munir, Rana Faisal; Romero Moral, Óscar; Abelló Gamazo, Alberto; Bilalli, Besim; Thiele, Maik; Lehner, Wolfgang (2016)
    Journal
    Open Access
    Large-scale data analysis is an important activity in many organizations that typically requires the deployment of data-intensive workflows. As data is processed these workflows generate large intermediate results, which ...
  • Table identification and reconstruction in spreadsheets 

    Koci, Elvis; Thiele, Maik; Romero Moral, Óscar; Lehner, Wolfgang (Springer, 2017)
    Conference report
    Open Access
    Spreadsheets are one of the most successful content generation tools, used in almost every enterprise to perform data transformation, visualization, and analysis. The high degree of freedom provided by these tools results ...
  • Table recognition in spreadsheets via a graph representation 

    Koci, Elvis; Thiele, Maik; Lehner, Wolfgang; Romero Moral, Óscar (2018)
    Conference report
    Open Access
    Spreadsheet software are very popular data management tools. Their ease of use and abundant functionalities equip novices and professionals alike with the means to generate, transform, analyze, and visualize data. As a ...