Recent Submissions

  • Effective and scalable data discovery with NextiaJD 

    Flores Herrera, Javier De Jesús; Nadal Francesch, Sergi; Romero Moral, Óscar (OpenProceedings, 2021)
    Conference lecture
    Open Access
    We present NextiaJD, a data discovery system with high predictive performance and computational efficiency. NextiaJD aids data scientists in the discovery of datasets that can be crossed. To that end, it proposes a ranking ...
  • DocDesign 2.0: Automated database design for document stores with multi-criteria optimization 

    Hewasinghage, Moditha Lakshan Dharmasir; Nadal Francesch, Sergi; Abelló Gamazo, Alberto (OpenProceedings, 2021)
    Conference lecture
    Open Access
    We present DocDesign 2.0, a novel system that supports database design for document stores. DocDesign 2.0 automatically generates a document store design driven by a query workload and a set of optimization objectives. In ...
  • Towards scalable data discovery 

    Flores Herrera, Javier De Jesús; Nadal Francesch, Sergi; Romero Moral, Óscar (OpenProceedings, 2021)
    Conference lecture
    Open Access
    We study the problem of discovering joinable datasets at scale. We approach the problem from a learning perspective relying on profiles. These are succinct representations that capture the underlying characteristics of the ...
  • A framework for assessing the peer review duration of journals: case study in computer science 

    Bilalli, Besim; Munir, Rana Faisal; Abelló Gamazo, Alberto (Springer Nature, 2021-01)
    Article
    Restricted access - publisher's policy
    In various fields, scientific article publication is a measure of productivity and on many occasions it is used as a critical factor for evaluating researchers. Therefore, a lot of time is dedicated to writing articles ...
  • Configuring parallelism for hybrid layouts using multi-objective optimization 

    Munir, Rana Faisal; Abelló Gamazo, Alberto; Romero Moral, Óscar; Thiele, Maik; Lehner, Wolfgang (2020-06-01)
    Article
    Restricted access - publisher's policy
    Modern organizations typically store their data in a raw format in data lakes. These data are then processed and usually stored under hybrid layouts, because they allow projection and selection operations. Thus, they allow ...
  • On the performance impact of using JSON, beyond impedance mismatch 

    Hewasinghage, Moditha Lakshan Dharmasir; Nadal Francesch, Sergi; Abelló Gamazo, Alberto (Springer, 2020)
    Conference report
    Open Access
    NOSQL database management systems adopt semi-structured data models, such as JSON, to easily accommodate schema evolution and overcome the overhead generated from transforming internal structures to tabular data (i.e., ...
  • Quarry: A user-centered big data integration platform 

    Jovanovic, Petar; Nadal Francesch, Sergi; Romero Moral, Óscar; Abelló Gamazo, Alberto; Bilalli, Besim (2020-04-18)
    Article
    Restricted access - publisher's policy
    Obtaining valuable insights and actionable knowledge from data requires cross-analysis of domain data typically coming from various sources. Doing so inevitably imposes burdensome processes of unifying different data ...
  • TopoGraph: an end-to-end framework to build and analyze graph cubes 

    Ghrab, Amine; Romero Moral, Óscar; Skhiri, Sabri; Zimányi, Esteban (2020-03-20)
    Article
    Open Access
    Graphs are a fundamental structure that provides an intuitive abstraction for modeling and analyzing complex and highly interconnected data. Given the potential complexity of such data, some approaches proposed extending ...
  • Keeping the data lake in form: proximity mining for pre-filtering schema matching 

    Al-serafi, Ayman Mounir Mohamed; Abelló Gamazo, Alberto; Romero Moral, Óscar; Calders, Toon (2020-05)
    Article
    Open Access
    Data Lakes (DLs) are large repositories of raw datasets from disparate sources. As more datasets are ingested into a DL, there is an increasing need for efficient techniques to profile them and to detect the relationships ...
  • Multidimensional integration of RDF datasets 

    Behan, Jam Jahanzeb Khan; Romero Moral, Óscar; Zimányi, Esteban (Springer, 2019)
    Conference lecture
    Open Access
    Data providers have been uploading RDF datasets on the web to aid researchers and analysts in finding insights. These datasets, made available by different data providers, contain common characteristics that enable their ...
  • XLIndy: interactive recognition and information extraction in spreadsheets 

    Koci, Elvis; Kuban, Dana; Luetting, Nico; Olwig, Dominik; Thiele, Maik; Gonsior, Julius; Lehner, Wolfgang; Romero Moral, Óscar (Association for Computing Machinery (ACM), 2019)
    Conference report
    Open Access
    Over the years, spreadsheets have established their presence in many domains, including business, government, and science. However, challenges arise due to spreadsheets being partially-structured and carrying implicit ...
  • Automatically configuring parallelism for hybrid layouts 

    Munir, Rana Faisal; Abelló Gamazo, Alberto; Romero Moral, Óscar; Thiele, Maik; Lehner, Wolfgang (Springer, 2019)
    Conference lecture
    Open Access
    Distributed processing frameworks process data in parallel by dividing it into multiple partitions and each partition is processed in a separate task. The number of tasks is always created based on the total file size. ...
