    • A cost-based storage format selector for materialized results in big data frameworks 

      Munir, Rana Faisal; Abelló Gamazo, Alberto; Romero Moral, Óscar; Thiele, Maik; Lehner, Wolfgang (2019-05-08)
      Article
      Open Access
      Modern big data frameworks (such as Hadoop and Spark) allow multiple users to do large-scale analysis simultaneously, by deploying data-intensive workflows (DIWs). These DIWs of different users share many common tasks (i.e., ...
    • A framework for user-centered declarative ETL 

      Theodorou, Vasileios; Abelló Gamazo, Alberto; Thiele, Maik; Lehner, Wolfgang (2014)
      Conference report
      Open Access
      As business requirements evolve with increasing information density and velocity, there is a growing need for efficiency and automation of Extract-Transform-Load (ETL) processes. Current approaches for the modeling and ...
    • A machine learning approach for layout inference in spreadsheets 

      Koci, Elvis; Thiele, Maik; Romero Moral, Óscar; Lehner, Wolfgang (SciTePress, 2016)
      Conference report
      Open Access
      Spreadsheet applications are one of the most used tools for content generation and presentation in industry and the Web. In spite of this success, there does not exist a comprehensive approach to automatically extract and ...
    • ATUN-HL: auto tuning of hybrid layouts using workload and data characteristics 

      Munir, Rana Faisal; Abelló Gamazo, Alberto; Romero Moral, Óscar; Thiele, Maik; Lehner, Wolfgang (2018)
      Conference report
      Open Access
      Ad-hoc analysis implies processing data in near real-time. Thus, raw data (i.e., neither normalized nor transformed) is typically dumped into a distributed engine, where it is generally stored into a hybrid layout. Hybrid ...
    • Automatically configuring parallelism for hybrid layouts 

      Munir, Rana Faisal; Abelló Gamazo, Alberto; Romero Moral, Óscar; Thiele, Maik; Lehner, Wolfgang (Springer, 2019)
      Conference lecture
      Open Access
      Distributed processing frameworks process data in parallel by dividing it into multiple partitions and each partition is processed in a separate task. The number of tasks is always created based on the total file size. ...
    • Frequent patterns in ETL workflows: An empirical approach 

      Theodorou, Vasileios; Abelló Gamazo, Alberto; Thiele, Maik; Lehner, Wolfgang (Elsevier, 2017-09-05)
      Article
      Open Access
      The complexity of Business Intelligence activities has driven the proposal of several approaches for the effective modeling of Extract-Transform-Load (ETL) processes, based on the conceptual abstraction of their operations. ...
    • Intermediate results materialization selection and format for data-intensive flows 

      Munir, Rana Faisal; Nadal Francesch, Sergi; Romero Moral, Óscar; Abelló Gamazo, Alberto; Jovanovic, Petar; Thiele, Maik; Lehner, Wolfgang (2018-05-01)
      Article
      Restricted access - publisher's policy
      Data-intensive flows deploy a variety of complex data transformations to build information pipelines from data sources to different end users. As data are processed, these workflows generate large intermediate results, ...
    • POIESIS: A tool for quality-aware ETL process redesign 

      Theodorou, Vasileios; Abelló Gamazo, Alberto; Thiele, Maik; Lehner, Wolfgang (2015)
      Conference report
      Open Access
      We present a tool, called POIESIS, for automatic ETL process enhancement. ETL processes are essential data-centric activities in modern business intelligence environments and they need to be examined through a viewpoint ...
    • Quality measures for ETL processes: from goals to implementation 

      Theodorou, Vasileios; Abelló Gamazo, Alberto; Lehner, Wolfgang; Thiele, Maik (2016-10-01)
      Article
      Open Access
      Extraction transformation loading (ETL) processes play an increasingly important role for the support of modern business operations. These business processes are centred around artifacts with high variability and diverse ...
    • Resilient store: a heuristic-based data format selector for intermediate results 

      Munir, Rana Faisal; Romero Moral, Óscar; Abelló Gamazo, Alberto; Bilalli, Besim; Thiele, Maik; Lehner, Wolfgang (2016)
      Journal issue
      Open Access
      Large-scale data analysis is an important activity in many organizations that typically requires the deployment of data-intensive workflows. As data is processed these workflows generate large intermediate results, which ...
    • Table identification and reconstruction in spreadsheets 

      Koci, Elvis; Thiele, Maik; Romero Moral, Óscar; Lehner, Wolfgang (Springer, 2017)
      Conference report
      Open Access
      Spreadsheets are one of the most successful content generation tools, used in almost every enterprise to perform data transformation, visualization, and analysis. The high degree of freedom provided by these tools results ...
    • Table recognition in spreadsheets via a graph representation 

      Koci, Elvis; Thiele, Maik; Lehner, Wolfgang; Romero Moral, Óscar (2018)
      Conference report
      Open Access
      Spreadsheet software are very popular data management tools. Their ease of use and abundant functionalities equip novices and professionals alike with the means to generate, transform, analyze, and visualize data. As a ...
    • XLIndy: interactive recognition and information extraction in spreadsheets 

      Koci, Elvis; Kuban, Dana; Luetting, Nico; Olwig, Dominik; Thiele, Maik; Gonsior, Julius; Lehner, Wolfgang; Romero Moral, Óscar (Association for Computing Machinery (ACM), 2019)
      Conference report
      Open Access
      Over the years, spreadsheets have established their presence in many domains, including business, government, and science. However, challenges arise due to spreadsheets being partially-structured and carrying implicit ...