Conference reports/lectures
Recent submissions
-
Effective and scalable data discovery with NextiaJD
(OpenProceedings, 2021)
Conference lecture
Open access
We present NextiaJD, a data discovery system with high predictive performance and computational efficiency. NextiaJD aids data scientists in the discovery of datasets that can be crossed. To that end, it proposes a ranking ...
-
DocDesign 2.0: Automated database design for document stores with multi-criteria optimization
(OpenProceedings, 2021)
Conference lecture
Open access
We present DocDesign 2.0, a novel system that supports database design for document stores. DocDesign 2.0 automatically generates a document store design driven by a query workload and a set of optimization objectives. In ...
-
Towards scalable data discovery
(OpenProceedings, 2021)
Conference lecture
Open access
We study the problem of discovering joinable datasets at scale. We approach the problem from a learning perspective relying on profiles. These are succinct representations that capture the underlying characteristics of the ...
-
On the performance impact of using JSON, beyond impedance mismatch
(Springer, 2020)
Conference report
Open access
NOSQL database management systems adopt semi-structured data models, such as JSON, to easily accommodate schema evolution and overcome the overhead generated from transforming internal structures to tabular data (i.e., ...
-
Multidimensional integration of RDF datasets
(Springer, 2019)
Conference lecture
Open access
Data providers have been uploading RDF datasets on the web to aid researchers and analysts in finding insights. These datasets, made available by different data providers, contain common characteristics that enable their ...
-
XLIndy: interactive recognition and information extraction in spreadsheets
(Association for Computing Machinery (ACM), 2019)
Conference report
Open access
Over the years, spreadsheets have established their presence in many domains, including business, government, and science. However, challenges arise due to spreadsheets being partially-structured and carrying implicit ...
-
Automatically configuring parallelism for hybrid layouts
(Springer, 2019)
Conference lecture
Open access
Distributed processing frameworks process data in parallel by dividing it into multiple partitions, and each partition is processed in a separate task. The number of tasks is always created based on the total file size. ...
-
Keeping the data lake in form: DS-kNN datasets categorization using proximity mining
(Springer, 2019)
Conference report
Open access
With the growth of the number of datasets stored in data repositories, there has been a trend of using Data Lakes (DLs) to store such data. DLs store datasets in their raw formats without any transformations or preprocessing, ...
-
FAME: supporting continuous requirements elicitation by combining user feedback and monitoring
(Institute of Electrical and Electronics Engineers (IEEE), 2018)
Conference report
Open access
Context: Software evolution ensures that software systems in use stay up to date and provide value for end-users. However, it is challenging for requirements engineers to continuously elicit needs for systems used by ...
-
ODIN: A dataspace management system
(CEUR-WS.org, 2019)
Conference lecture
Open access
ODIN is a system that supports the incremental pay-as-you-go integration of data sources into dataspaces and provides user-friendly querying mechanisms on top of them. We describe its main characteristics and underlying ...
-
Graph BI & analytics: current state and future challenges
(Springer, 2018)
Conference report
Open access
In an increasingly competitive market, making well-informed decisions requires the analysis of a wide range of heterogeneous, large and complex data. This paper focuses on the emerging field of graph warehousing. Graphs ...