
dc.contributor.author: Giovanelli, Joseph
dc.contributor.author: Bilalli, Besim
dc.contributor.author: Abelló Gamazo, Alberto
dc.contributor.other: Facultat d'Informàtica de Barcelona
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació
dc.date.accessioned: 2021-04-29T09:42:40Z
dc.date.available: 2021-04-29T09:42:40Z
dc.date.issued: 2021
dc.identifier.citation: Giovanelli, J.; Bilalli, B.; Abelló, A. Effective data pre-processing for AutoML. In: International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data. "Proceedings of the 23rd International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data (DOLAP): co-located with the 24th International Conference on Extending Database Technology and the 24th International Conference on Database Theory (EDBT/ICDT 2021): Nicosia, Cyprus, March 23, 2021". CEUR-WS.org, 2021, p. 1-10. ISSN 1613-0073.
dc.identifier.isbn: 1613-0073
dc.identifier.uri: http://hdl.handle.net/2117/344761
dc.description.abstract: Data pre-processing plays a key role in a data analytics process (e.g., supervised learning). It encompasses a broad range of activities that span from correcting errors to selecting the most relevant features for the analysis phase. There is no clear evidence, nor are there defined rules, on how pre-processing transformations (e.g., normalization and discretization) impact the final results of the analysis. The problem is exacerbated when transformations are combined into pre-processing pipeline prototypes. Data scientists cannot easily foresee the impact of pipeline prototypes and hence require a method to discriminate between them and find the most relevant ones (e.g., those with the highest positive impact) for their study at hand. Once found, these pipelines can be optimized using AutoML in order to generate executable pipelines (i.e., with parametrized operators for each transformation). In this work, we study the impact of transformations in general, and the impact of transformations when combined into pipelines. We develop a generic method for finding effective pipeline prototypes. Evaluated using Scikit-learn, our effective pipeline prototypes, when optimized, reach 90% of the optimal predictive accuracy in the median, at a cost that is 24 times smaller.
dc.description.sponsorship: This work was supported by the GENESIS project, funded by the Spanish Ministerio de Ciencia e Innovación under project TIN2016-79269-R. We thank the University of Bologna for issuing a grant for the author’s research stay at Universitat Politècnica de Catalunya.
dc.format.extent: 10 p.
dc.language.iso: eng
dc.publisher: CEUR-WS.org
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: Àrees temàtiques de la UPC::Informàtica::Sistemes d'informació
dc.subject.lcsh: Big data
dc.subject.lcsh: Data mining
dc.subject.lcsh: Decision-making
dc.title: Effective data pre-processing for AutoML
dc.type: Conference report
dc.subject.lemac: Dades massives
dc.subject.lemac: Mineria de dades
dc.subject.lemac: Decisió, Presa de
dc.contributor.group: Universitat Politècnica de Catalunya. inSSIDE - integrated Software, Service, Information and Data Engineering
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: http://ceur-ws.org/Vol-2840/paper1.pdf
dc.rights.access: Open Access
local.identifier.drac: 31247855
dc.description.version: Postprint (published version)
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO/1PE/TIN2016-79269-R
local.citation.author: Giovanelli, J.; Bilalli, B.; Abelló, A.
local.citation.contributor: International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data
local.citation.publicationName: Proceedings of the 23rd International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data (DOLAP): co-located with the 24th International Conference on Extending Database Technology and the 24th International Conference on Database Theory (EDBT/ICDT 2021): Nicosia, Cyprus, March 23, 2021
local.citation.startingPage: 1
local.citation.endingPage: 10
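
The abstract distinguishes a pipeline prototype (an ordered sequence of transformations) from an executable pipeline (the same sequence with a concrete, parametrized operator chosen for each transformation). The sketch below only illustrates that distinction and is not the authors' method. The names `SEARCH_SPACE`, `configurations`, and `executable_pipelines` are hypothetical, the operator and parameter names are borrowed from Scikit-learn as plain strings (nothing is imported from it), and exhaustive enumeration stands in for whatever optimizer an AutoML tool would actually use.

```python
from itertools import product

# Hypothetical search space: each transformation maps to candidate operators,
# each with a small grid of parameter values (chosen only for illustration).
SEARCH_SPACE = {
    "normalization": {
        "StandardScaler": {"with_mean": [True, False]},
        "MinMaxScaler": {},
    },
    "discretization": {
        "KBinsDiscretizer": {"n_bins": [3, 5, 7]},
    },
}


def configurations(grid):
    """Yield every parameter assignment in a grid (one empty dict if the grid is empty)."""
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def executable_pipelines(prototype):
    """Enumerate executable pipelines for a prototype (an ordered tuple of transformation names)."""
    per_step = []
    for transformation in prototype:
        options = [
            (operator, params)
            for operator, grid in SEARCH_SPACE[transformation].items()
            for params in configurations(grid)
        ]
        per_step.append(options)
    # One executable pipeline = one (operator, params) choice per step, in prototype order.
    return [list(steps) for steps in product(*per_step)]


# Two prototypes over the same transformations that differ only in their order.
prototype_a = ("normalization", "discretization")
prototype_b = ("discretization", "normalization")

pipelines_a = executable_pipelines(prototype_a)
pipelines_b = executable_pipelines(prototype_b)
print(len(pipelines_a), pipelines_a[0])
```

Each prototype here expands into nine executable pipelines. Selecting promising prototypes before optimization, as the abstract describes, would shrink the space an optimizer has to explore.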



Except where otherwise noted, content on this work is licensed under a Creative Commons license: Attribution 4.0 International