Conference papers/communications
Recent Submissions
-
Effective and scalable data discovery with NextiaJD
(OpenProceedings, 2021)
Conference lecture
Open Access
We present NextiaJD, a data discovery system with high predictive performance and computational efficiency. NextiaJD aids data scientists in the discovery of datasets that can be crossed. To that end, it proposes a ranking ...
-
DocDesign 2.0: Automated database design for document stores with multi-criteria optimization
(OpenProceedings, 2021)
Conference lecture
Open Access
We present DocDesign 2.0, a novel system that supports database design for document stores. DocDesign 2.0 automatically generates a document store design driven by a query workload and a set of optimization objectives. In ...
-
Towards scalable data discovery
(OpenProceedings, 2021)
Conference lecture
Open Access
We study the problem of discovering joinable datasets at scale. We approach the problem from a learning perspective relying on profiles. These are succinct representations that capture the underlying characteristics of the ...
-
On the performance impact of using JSON, beyond impedance mismatch
(Springer, 2020)
Conference report
Open Access
NOSQL database management systems adopt semi-structured data models, such as JSON, to easily accommodate schema evolution and overcome the overhead generated from transforming internal structures to tabular data (i.e., ...
-
Multidimensional integration of RDF datasets
(Springer, 2019)
Conference lecture
Open Access
Data providers have been uploading RDF datasets on the web to aid researchers and analysts in finding insights. These datasets, made available by different data providers, contain common characteristics that enable their ...
-
XLIndy: interactive recognition and information extraction in spreadsheets
(Association for Computing Machinery (ACM), 2019)
Conference report
Open Access
Over the years, spreadsheets have established their presence in many domains, including business, government, and science. However, challenges arise due to spreadsheets being partially-structured and carrying implicit ...
-
Automatically configuring parallelism for hybrid layouts
(Springer, 2019)
Conference lecture
Open Access
Distributed processing frameworks process data in parallel by dividing it into multiple partitions and each partition is processed in a separate task. The number of tasks is always created based on the total file size. ...
-
Keeping the data lake in form: DS-kNN datasets categorization using proximity mining
(Springer, 2019)
Conference report
Open Access
With the growth of the number of datasets stored in data repositories, there has been a trend of using Data Lakes (DLs) to store such data. DLs store datasets in their raw formats without any transformations or preprocessing, ...
-
FAME: supporting continuous requirements elicitation by combining user feedback and monitoring
(Institute of Electrical and Electronics Engineers (IEEE), 2018)
Conference report
Open Access
Context: Software evolution ensures that software systems in use stay up to date and provide value for end-users. However, it is challenging for requirements engineers to continuously elicit needs for systems used by ...
-
ODIN: A dataspace management system
(CEUR-WS.org, 2019)
Conference lecture
Open Access
ODIN is a system that supports the incremental pay-as-you-go integration of data sources into dataspaces and provides user-friendly querying mechanisms on top of them. We describe its main characteristics and underlying ...
-
Graph BI & analytics: current state and future challenges
(Springer, 2018)
Conference report
Open Access
In an increasingly competitive market, making well-informed decisions requires the analysis of a wide range of heterogeneous, large and complex data. This paper focuses on the emerging field of graph warehousing. Graphs ...