Now showing items 1-9 of 9

    • A cost-based storage format selector for materialized results in big data frameworks 

      Munir, Rana Faisal; Abelló Gamazo, Alberto; Romero Moral, Óscar; Thiele, Maik; Lehner, Wolfgang (2019-05-08)
      Article
      Open Access
      Modern big data frameworks (such as Hadoop and Spark) allow multiple users to do large-scale analysis simultaneously, by deploying data-intensive workflows (DIWs). These DIWs of different users share many common tasks (i.e, ...
    • A framework for assessing the peer review duration of journals: case study in computer science 

      Bilalli, Besim; Munir, Rana Faisal; Abelló Gamazo, Alberto (Springer Nature, 2021-01)
      Article
      Restricted access - publisher's policy
      In various fields, scientific article publication is a measure of productivity and in many occasions it is used as a critical factor for evaluating researchers. Therefore, a lot of time is dedicated to writing articles ...
    • ATUN-HL: auto tuning of hybrid layouts using workload and data characteristics 

      Munir, Rana Faisal; Abelló Gamazo, Alberto; Romero Moral, Óscar; Thiele, Maik; Lehner, Wolfgang (2018)
      Conference report
      Open Access
      Ad-hoc analysis implies processing data in near real-time. Thus, raw data (i.e., neither normalized nor transformed) is typically dumped into a distributed engine, where it is generally stored into a hybrid layout. Hybrid ...
    • Automatically configuring parallelism for hybrid layouts 

      Munir, Rana Faisal; Abelló Gamazo, Alberto; Romero Moral, Óscar; Thiele, Maik; Lehner, Wolfgang (Springer, 2019)
      Conference lecture
      Open Access
      Distributed processing frameworks process data in parallel by dividing it into multiple partitions and each partition is processed in a separate task. The number of tasks is always created based on the total file size. ...
    • Configuring parallelism for hybrid layouts using multi-objective optimization 

      Munir, Rana Faisal; Abelló Gamazo, Alberto; Romero Moral, Óscar; Thiele, Maik; Lehner, Wolfgang (2020-06-01)
      Article
      Open Access
      Modern organizations typically store their data in a raw format in data lakes. These data are then processed and usually stored under hybrid layouts, because they allow projection and selection operations. Thus, they allow ...
    • Intermediate results materialization selection and format for data-intensive flows 

      Munir, Rana Faisal; Nadal Francesch, Sergi; Romero Moral, Óscar; Abelló Gamazo, Alberto; Jovanovic, Petar; Thiele, Maik; Lehner, Wolfgang (2018-05-01)
      Article
      Restricted access - publisher's policy
      Data-intensive flows deploy a variety of complex data transformations to build information pipelines from data sources to different end users. As data are processed, these workflows generate large intermediate results, ...
    • PRESISTANT : data pre-processing assistant 

      Bilalli, Besim; Abelló Gamazo, Alberto; Aluja Banet, Tomàs; Munir, Rana Faisal; Wrembel, Robert (Springer, 2019)
      Conference lecture
      Open Access
      A concrete classification algorithm may perform differently on datasets with different characteristics, e.g., it might perform better on a dataset with continuous attributes rather than with categorical attributes, or the ...
    • Resilient store: a heuristic-based data format selector for intermediate results 

      Munir, Rana Faisal; Romero Moral, Óscar; Abelló Gamazo, Alberto; Bilalli, Besim; Thiele, Maik; Lehner, Wolfgang (2016)
      Número de revista
      Open Access
      Large-scale data analysis is an important activity in many organizations that typically requires the deployment of data-intensive workflows. As data is processed these workflows generate large intermediate results, which ...
    • Storage format selection and optimization for materialized intermediate results in data-intensive flows 

      Munir, Rana Faisal (Universitat Politècnica de Catalunya, 2019-12-05)
      Doctoral thesis
      Open Access
      Modern organizations produce and collect large volumes of data, that need to be processed repeatedly and quickly for gaining business insights. For such processing, typically, Data-intensive Flows (DIFs) are deployed on ...