PRESISTANT : data pre-processing assistant

Cite as:
hdl:2117/127984
Document type: Conference lecture
Defense date: 2019
Publisher: Springer
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public
communication or transformation of this work are prohibited without permission of the copyright holder
Abstract
A given classification algorithm may perform differently on datasets with different characteristics; for example, it might perform better on a dataset with continuous attributes than on one with categorical attributes, or the other way around. Typically, datasets need to be pre-processed to improve the results. Given all the possible pre-processing operators, the number of alternatives is staggeringly large, and inexperienced users become overwhelmed. Trial and error is not feasible with large amounts of data. We developed a method and tool, PRESISTANT, to answer the need for user assistance during data pre-processing. Leveraging ideas from meta-learning, PRESISTANT assists the user by recommending pre-processing operators that ultimately improve classification performance. The user selects a classification algorithm from those considered, and PRESISTANT then proposes candidate transformations to improve the result of the analysis. In the demonstration, participants will experience firsthand how PRESISTANT easily and effectively ranks the pre-processing operators.
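The ranking idea in the abstract can be illustrated with a minimal sketch. It is not PRESISTANT's actual implementation: the meta-features, the operator names, the stored gain values, and the nearest-neighbour predictor are all assumptions chosen for illustration, and the sketch fixes a single (unnamed) classification algorithm. It only shows the general meta-learning pattern of describing a dataset by meta-features, predicting the effect of each operator from previously observed datasets, and ranking the operators by predicted gain.

```python
import math

# Hypothetical meta-knowledge base (all values invented for illustration).
# For each operator: (meta-features of a past dataset, observed accuracy gain).
# Meta-features: (fraction of continuous attributes, fraction of missing values).
META_BASE = {
    "discretize": [((0.9, 0.0), 0.04), ((0.1, 0.0), -0.01), ((0.8, 0.2), 0.03)],
    "normalize": [((0.9, 0.0), 0.01), ((0.1, 0.0), 0.00), ((0.8, 0.2), 0.02)],
    "impute_missing": [((0.9, 0.0), 0.00), ((0.1, 0.3), 0.05), ((0.8, 0.2), 0.06)],
}


def meta_features(rows):
    """Compute two simple meta-features from a list of equal-length rows."""
    n_cols = len(rows[0])
    cells = [value for row in rows for value in row]
    missing = sum(value is None for value in cells) / len(cells)
    continuous = 0
    for c in range(n_cols):
        column = [row[c] for row in rows if row[c] is not None]
        # A column counts as continuous if all its known values are floats.
        if column and all(isinstance(value, float) for value in column):
            continuous += 1
    return (continuous / n_cols, missing)


def predict_gain(features, records, k=2):
    """Average the gain of the k most similar past datasets (Euclidean distance)."""
    nearest = sorted(records, key=lambda rec: math.dist(features, rec[0]))[:k]
    return sum(gain for _, gain in nearest) / len(nearest)


def rank_operators(rows, meta_base=META_BASE):
    """Return (operator, predicted gain) pairs, best predicted gain first."""
    features = meta_features(rows)
    scored = [(op, predict_gain(features, recs)) for op, recs in meta_base.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy dataset: two continuous attributes, one categorical, one missing value.
    sample = [[1.0, 2.5, "a"], [0.5, None, "b"], [2.0, 1.5, "a"]]
    for operator, gain in rank_operators(sample):
        print(f"{operator}: predicted gain {gain:+.3f}")
```

In this sketch the user-facing step is the single call to `rank_operators`; a fuller version would keep one meta-knowledge base per classification algorithm and select it according to the user's choice.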
Citation: Bilalli, B. [et al.]. PRESISTANT : data pre-processing assistant. In: International Conference on Advanced Information Systems Engineering. "Information Systems in the Big Data Era: CAiSE Forum 2018, Tallinn, Estonia, June 11-15, 2018: proceedings". Berlin: Springer, 2019, p. 57-65.
ISBN: 978-3-319-92900-2
Publisher version: https://link.springer.com/chapter/10.1007%2F978-3-319-92901-9_6
Collections
- GESSI - Grup d'Enginyeria del Software i dels Serveis - Ponències/Comunicacions de congressos [197]
- inSSIDE - integrated Software, Service, Information and Data Engineering - Ponències/Comunicacions de congressos [332]
- LIAM - Laboratori de Modelització i Anàlisi de la Informació - Ponències/Comunicacions de congressos [64]
- Departament d'Enginyeria de Serveis i Sistemes d'Informació - Ponències/Comunicacions de congressos [497]
- Departament d'Estadística i Investigació Operativa - Ponències/Comunicacions de congressos [246]
Files | Size |
---|---|
CAISE_2018_Bilalli.pdf | 952,1Kb |