dc.contributor.author: Bilalli, Besim
dc.contributor.author: Abelló Gamazo, Alberto
dc.contributor.author: Aluja Banet, Tomàs
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Estadística i Investigació Operativa
dc.date.accessioned: 2018-01-26T08:33:16Z
dc.date.available: 2018-01-26T08:33:16Z
dc.date.issued: 2017-12-20
dc.identifier.citation: Bilalli, B., Abello, A., Aluja, T. On the predictive power of meta-features in OpenML. "International journal of applied mathematics and computer science", 20 December 2017, vol. 27, no. 4, p. 697-712.
dc.identifier.issn: 1641-876X
dc.identifier.uri: http://hdl.handle.net/2117/113229
dc.description.abstract: The demand for performing data analysis is steadily rising. As a consequence, people of different profiles (i.e., non-experienced users) have started to analyze their data. However, this is challenging for them. A key step that poses difficulties and determines the success of the analysis is data mining (model/algorithm selection problem). Meta-learning is a technique used for assisting non-expert users in this step. The effectiveness of meta-learning is, however, largely dependent on the description/characterization of datasets (i.e., meta-features used for meta-learning). There is a need for improving the effectiveness of meta-learning by identifying and designing more predictive meta-features. In this work, we use a method from exploratory factor analysis to study the predictive power of different meta-features collected in OpenML, which is a collaborative machine learning platform that is designed to store and organize meta-data about datasets, data mining algorithms, models and their evaluations. We first use the method to extract latent features, which are abstract concepts that group together meta-features with common characteristics. Then, we study and visualize the relationship of the latent features with three different performance measures of four classification algorithms on hundreds of datasets available in OpenML, and we select the latent features with the highest predictive power. Finally, we use the selected latent features to perform meta-learning and we show that our method improves the meta-learning process. Furthermore, we design an easy-to-use application for retrieving different meta-data from OpenML as the biggest source of data in this domain.
dc.format.extent: 16 p.
dc.language.iso: eng
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject: Àrees temàtiques de la UPC::Informàtica::Aplicacions de la informàtica::Aplicacions informàtiques a la física i l'enginyeria
dc.subject.lcsh: OpenMP (Application program interface)
dc.subject.other: feature extraction
dc.subject.other: feature selection
dc.subject.other: meta-learning
dc.title: On the predictive power of meta-features in OpenML
dc.type: Article
dc.subject.lemac: Interfícies de programació d'aplicacions (Programari)
dc.contributor.group: Universitat Politècnica de Catalunya. inSSIDE - integrated Software, Service, Information and Data Engineering
dc.contributor.group: Universitat Politècnica de Catalunya. LIAM - Laboratori de Modelització i Anàlisi de la Informació
dc.identifier.doi: 10.1515/amcs-2017-0048
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: https://www.degruyter.com/view/j/amcs.2017.27.issue-4/amcs-2017-0048/amcs-2017-0048.xml
dc.rights.access: Open Access
local.identifier.drac: 21872765
dc.description.version: Postprint (published version)
local.citation.author: Bilalli, B.; Abello, A.; Aluja, T.
local.citation.publicationName: International journal of applied mathematics and computer science
local.citation.volume: 27
local.citation.number: 4
local.citation.startingPage: 697
local.citation.endingPage: 712


Except where otherwise noted, content on this work is licensed under a Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain