Enabling interpretation of the outcome of a human obesity prediction machine learning analysis from genomic data
Fitxers
Títol de la revista
ISSN de la revista
Títol del volum
Col·laborador
Editor
Tribunal avaluador
Realitzat a/amb
Tipus de document
Data publicació
Editor
Condicions d'accés
item.page.rightslicense
Publicacions relacionades
Datasets relacionats
Projecte CCD
Projecte
Abstract
In this brief paper, we address the medical problem of human obesity prediction from genomic data. Genomic datasets may contain a huge number of features and they often have to be analyzed within the realm of Big Data technologies. As a medical problem, obesity prediction would welcome interpretables outcomes. Therefore, the analyst would benefit from appraches in which the problem of very high data dimensionality could be eased as much as possible. Feature selection can be an essential part of such approaches. In this context, though, traditional machine learning methods may struggle. Here, we propose a pipeline to address this problem using partitioning strategies: both vertical, by dividing the data based on gender, and horizontal, by splitting each of the analyzed chromosomes into 5,000-instances subsets. For each, Minimum Redundancy and Maximum Relevance feature selection is used to find rankings of the single nucleotide polymorphisms most relevant for classification in the medical dataset.

