
dc.contributor.author: Flores Herrera, Javier de Jesús
dc.contributor.author: Nadal Francesch, Sergi
dc.contributor.author: Romero Moral, Óscar
dc.contributor.other: Universitat Politècnica de Catalunya. Doctorat en Computació
dc.contributor.other: Universitat Politècnica de Catalunya. Doctorat Erasmus Mundus en Tecnologies de la Informació per a la Intel·ligència Empresarial
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació
dc.date.accessioned: 2021-04-06T10:11:42Z
dc.date.available: 2021-04-06T10:11:42Z
dc.date.issued: 2021
dc.identifier.citation: Flores, J.; Nadal, S.; Romero, O. Effective and scalable data discovery with NextiaJD. In: International Conference on Extending Database Technology. "Advances in Database Technology: EDBT 2021, 24th International Conference on Extending Database Technology: Nicosia, Cyprus, March 23-26, 2021: proceedings". Konstanz: OpenProceedings, 2021, p. 690-693. ISBN 978-3-89318-084-4. DOI 10.5441/002/edbt.2021.85.
dc.identifier.isbn: 978-3-89318-084-4
dc.identifier.uri: http://hdl.handle.net/2117/343152
dc.description.abstract: We present NextiaJD, a data discovery system with high predictive performance and computational efficiency. NextiaJD aids data scientists in the discovery of datasets that can be crossed. To that end, it proposes a ranking of candidate pairs according to their join quality, which is based on a novel similarity measure that considers both containment and cardinality proportions between candidate attributes. To do so, NextiaJD adopts a learning approach relying on profiles. These are succinct and informative representations of the schemata and data values of datasets that capture their underlying characteristics. NextiaJD's features are fully integrated into Apache Spark and benefit from it to parallelize the profiling and discovery processes. The on-site demonstration will showcase how NextiaJD can effectively support large-scale data discovery tasks with a large set of datasets the audience will be able to play with.
dc.description.sponsorship: This work is partly supported by Barcelona’s City Council under grant agreement 20S08704. Javier Flores is supported by contract 2020-DI-027 of the Industrial Doctorate Program of the Government of Catalonia and Consejo Nacional de Ciencia y Tecnología (CONACYT, Mexico).
dc.format.extent: 4 p.
dc.language.iso: eng
dc.publisher: OpenProceedings
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Àrees temàtiques de la UPC::Informàtica::Sistemes d'informació
dc.subject.lcsh: Data sets
dc.subject.lcsh: Big data
dc.subject.lcsh: Data mining
dc.title: Effective and scalable data discovery with NextiaJD
dc.type: Conference lecture
dc.subject.lemac: Conjunts de dades
dc.subject.lemac: Dades massives
dc.subject.lemac: Mineria de dades
dc.contributor.group: Universitat Politècnica de Catalunya. inSSIDE - integrated Software, Service, Information and Data Engineering
dc.identifier.doi: 10.5441/002/edbt.2021.85
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: https://doi.org/10.5441/002/edbt.2021.85
dc.rights.access: Open Access
local.identifier.drac: 30818377
dc.description.version: Postprint (published version)
dc.relation.projectid: info:eu-repo/grantAgreement/Ajuntament de Barcelona/20S08704
dc.relation.projectid: info:eu-repo/grantAgreement/AGAUR/V PRI/2020 DI 027
local.citation.author: Flores, J.; Nadal, S.; Romero, O.
local.citation.contributor: International Conference on Extending Database Technology
local.citation.pubplace: Konstanz
local.citation.publicationName: Advances in Database Technology: EDBT 2021, 24th International Conference on Extending Database Technology: Nicosia, Cyprus, March 23-26, 2021: proceedings
local.citation.startingPage: 690
local.citation.endingPage: 693

