Browsing by Author "Thiele, Maik"
-
A cost-based storage format selector for materialized results in big data frameworks
Munir, Rana Faisal; Abelló Gamazo, Alberto; Romero Moral, Óscar; Thiele, Maik; Lehner, Wolfgang (2019-05-08)
Article
Open Access
Modern big data frameworks (such as Hadoop and Spark) allow multiple users to do large-scale analysis simultaneously, by deploying data-intensive workflows (DIWs). These DIWs of different users share many common tasks (i.e., ...
-
A framework for user-centered declarative ETL
Theodorou, Vasileios; Abelló Gamazo, Alberto; Thiele, Maik; Lehner, Wolfgang (2014)
Conference report
Open Access
As business requirements evolve with increasing information density and velocity, there is a growing need for efficiency and automation of Extract-Transform-Load (ETL) processes. Current approaches for the modeling and ...
-
A machine learning approach for layout inference in spreadsheets
Koci, Elvis; Thiele, Maik; Romero Moral, Óscar; Lehner, Wolfgang (SciTePress, 2016)
Conference report
Open Access
Spreadsheet applications are one of the most used tools for content generation and presentation in industry and the Web. In spite of this success, there does not exist a comprehensive approach to automatically extract and ...
-
ATUN-HL: auto tuning of hybrid layouts using workload and data characteristics
Munir, Rana Faisal; Abelló Gamazo, Alberto; Romero Moral, Óscar; Thiele, Maik; Lehner, Wolfgang (2018)
Conference report
Open Access
Ad-hoc analysis implies processing data in near real-time. Thus, raw data (i.e., neither normalized nor transformed) is typically dumped into a distributed engine, where it is generally stored into a hybrid layout. Hybrid ...
-
Automatically configuring parallelism for hybrid layouts
Munir, Rana Faisal; Abelló Gamazo, Alberto; Romero Moral, Óscar; Thiele, Maik; Lehner, Wolfgang (Springer, 2019)
Conference lecture
Open Access
Distributed processing frameworks process data in parallel by dividing it into multiple partitions and each partition is processed in a separate task. The number of tasks is always created based on the total file size. ...
-
Configuring parallelism for hybrid layouts using multi-objective optimization
Munir, Rana Faisal; Abelló Gamazo, Alberto; Romero Moral, Óscar; Thiele, Maik; Lehner, Wolfgang (2020-06-01)
Article
Open Access
Modern organizations typically store their data in a raw format in data lakes. These data are then processed and usually stored under hybrid layouts, because they allow projection and selection operations. Thus, they allow ...
-
Frequent patterns in ETL workflows: An empirical approach
Theodorou, Vasileios; Abelló Gamazo, Alberto; Thiele, Maik; Lehner, Wolfgang (Elsevier, 2017-09-05)
Article
Open Access
The complexity of Business Intelligence activities has driven the proposal of several approaches for the effective modeling of Extract-Transform-Load (ETL) processes, based on the conceptual abstraction of their operations. ...
-
Intermediate results materialization selection and format for data-intensive flows
Munir, Rana Faisal; Nadal Francesch, Sergi; Romero Moral, Óscar; Abelló Gamazo, Alberto; Jovanovic, Petar; Thiele, Maik; Lehner, Wolfgang (2018-05-01)
Article
Restricted access - publisher's policy
Data-intensive flows deploy a variety of complex data transformations to build information pipelines from data sources to different end users. As data are processed, these workflows generate large intermediate results, ...
-
POIESIS: A tool for quality-aware ETL process redesign
Theodorou, Vasileios; Abelló Gamazo, Alberto; Thiele, Maik; Lehner, Wolfgang (2015)
Conference report
Open Access
We present a tool, called POIESIS, for automatic ETL process enhancement. ETL processes are essential data-centric activities in modern business intelligence environments and they need to be examined through a viewpoint ...
-
Quality measures for ETL processes: from goals to implementation
Theodorou, Vasileios; Abelló Gamazo, Alberto; Lehner, Wolfgang; Thiele, Maik (2016-10-01)
Article
Open Access
Extraction transformation loading (ETL) processes play an increasingly important role for the support of modern business operations. These business processes are centred around artifacts with high variability and diverse ...
-
Resilient store: a heuristic-based data format selector for intermediate results
Munir, Rana Faisal; Romero Moral, Óscar; Abelló Gamazo, Alberto; Bilalli, Besim; Thiele, Maik; Lehner, Wolfgang (2016)
Journal issue
Open Access
Large-scale data analysis is an important activity in many organizations that typically requires the deployment of data-intensive workflows. As data is processed these workflows generate large intermediate results, which ...
-
Table identification and reconstruction in spreadsheets
Koci, Elvis; Thiele, Maik; Romero Moral, Óscar; Lehner, Wolfgang (Springer, 2017)
Conference report
Open Access
Spreadsheets are one of the most successful content generation tools, used in almost every enterprise to perform data transformation, visualization, and analysis. The high degree of freedom provided by these tools results ...
-
Table recognition in spreadsheets via a graph representation
Koci, Elvis; Thiele, Maik; Lehner, Wolfgang; Romero Moral, Óscar (2018)
Conference report
Open Access
Spreadsheet software are very popular data management tools. Their ease of use and abundant functionalities equip novices and professionals alike with the means to generate, transform, analyze, and visualize data. As a ...
-
XLIndy: interactive recognition and information extraction in spreadsheets
Koci, Elvis; Kuban, Dana; Luetting, Nico; Olwig, Dominik; Thiele, Maik; Gonsior, Julius; Lehner, Wolfgang; Romero Moral, Óscar (Association for Computing Machinery (ACM), 2019)
Conference report
Open Access
Over the years, spreadsheets have established their presence in many domains, including business, government, and science. However, challenges arise due to spreadsheets being partially-structured and carrying implicit ...