Intelligent assistance for data pre-processing

Cite as: hdl:2117/113239
Document type: Article
Defense date: 2017-06-03
Publisher: Elsevier
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.
Abstract
A data mining algorithm may perform differently on datasets with different characteristics; e.g., it might perform better on a dataset with continuous attributes than on one with categorical attributes, or the other way around. Typically, a dataset needs to be pre-processed before being mined. Taking into account all the possible pre-processing operators, there is a staggeringly large number of alternatives. As a consequence, inexperienced users are overwhelmed by the pre-processing alternatives. In this paper, we show that the problem can be addressed by automating pre-processing with the support of meta-learning. To this end, we analyzed a wide range of data pre-processing techniques and a set of classification algorithms. For each classification algorithm that we consider and a given dataset, we are able to automatically suggest the transformations that improve the quality of the algorithm's results on that dataset. Our approach will help non-expert users identify more effectively the transformations appropriate to their applications, and hence achieve improved results.
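The abstract describes the meta-learning idea only at a high level. The following is a minimal illustrative sketch, not the authors' method: it assumes a hypothetical meta-knowledge base of past experiments, each recording a dataset's meta-features, the classifier used, the pre-processing transformation applied, and the resulting gain in accuracy over the untransformed data. For a new dataset and a chosen algorithm, it predicts each transformation's gain as the mean gain on the k most similar past datasets (a simple k-nearest-neighbour meta-learner, swapped in here for illustration) and suggests only transformations with a positive predicted gain. The names `Experiment`, `recommend`, the meta-features and the gain definition are all assumptions.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean


@dataclass(frozen=True)
class Experiment:
    """One past run in a hypothetical meta-knowledge base (illustrative only)."""
    meta_features: tuple[float, ...]  # assumed dataset descriptors, e.g. share of continuous attributes
    algorithm: str                    # classifier the transformation was evaluated with
    transformation: str               # pre-processing operator applied before mining
    gain: float                       # assumed: accuracy after transformation minus accuracy on raw data


def distance(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Euclidean distance between two meta-feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend(meta_base: list[Experiment], new_meta: tuple[float, ...],
              algorithm: str, k: int = 3) -> list[tuple[str, float]]:
    """Rank transformations by predicted gain for `algorithm` on a new dataset.

    The predicted gain of a transformation is the mean gain observed on the k
    past datasets whose meta-features are closest to `new_meta`. Only
    transformations with a positive predicted gain are returned, best first.
    """
    runs_for_algo = [e for e in meta_base if e.algorithm == algorithm]
    predictions: dict[str, float] = {}
    for t in {e.transformation for e in runs_for_algo}:
        runs = [e for e in runs_for_algo if e.transformation == t]
        nearest = sorted(runs, key=lambda e: distance(e.meta_features, new_meta))[:k]
        predictions[t] = mean(e.gain for e in nearest)
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    return [(t, g) for t, g in ranked if g > 0]
```

In use, one would call `recommend(meta_base, new_meta, "NaiveBayes", k=2)` with meta-features computed for the new dataset in the same way as for the stored experiments. Nearest-neighbour averaging is chosen only because it is short and transparent; any regressor or classifier trained on the meta-knowledge base could play the same role.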
Citation: Bilalli, B., Abello, A., Aluja, T., Wrembel, R. Intelligent assistance for data pre-processing. "Computer standards & interfaces", 3 June 2017, vol. 57, p. 101-109.
ISSN: 0920-5489
Collections
- inSSIDE - integrated Software, Service, Information and Data Engineering - Journal articles [113]
- LIAM - Laboratori de Modelització i Anàlisi de la Informació - Journal articles [50]
- Departament d'Enginyeria de Serveis i Sistemes d'Informació - Journal articles [239]
- GESSI - Grup d'Enginyeria del Software i dels Serveis - Journal articles [56]
- Departament d'Estadística i Investigació Operativa - Journal articles [771]
File | Size | Format |
---|---|---|
1.B.Bilalli_CSI2017-1.pdf | 267.1 KB | PDF |