Enabling interpretation of the outcome of a human obesity prediction machine learning analysis from genomic data
| dc.contributor.author | Bilal, Ahsan |
| dc.contributor.author | Vellido Alcacena, Alfredo |
| dc.contributor.author | Ribas Ripoll, Vicent |
| dc.contributor.group | Universitat Politècnica de Catalunya. SOCO - Soft Computing |
| dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Ciències de la Computació |
| dc.date.accessioned | 2020-03-10T11:09:01Z |
| dc.date.available | 2020-03-10T11:09:01Z |
| dc.date.issued | 2018 |
| dc.description.abstract | In this brief paper, we address the medical problem of human obesity prediction from genomic data. Genomic datasets may contain a huge number of features and they often have to be analyzed within the realm of Big Data technologies. As a medical problem, obesity prediction would welcome interpretable outcomes. Therefore, the analyst would benefit from approaches in which the problem of very high data dimensionality could be eased as much as possible. Feature selection can be an essential part of such approaches. In this context, though, traditional machine learning methods may struggle. Here, we propose a pipeline to address this problem using partitioning strategies: both vertical, by dividing the data based on gender, and horizontal, by splitting each of the analyzed chromosomes into 5,000-instance subsets. For each, Minimum Redundancy and Maximum Relevance feature selection is used to find rankings of the single nucleotide polymorphisms most relevant for classification in the medical dataset. |
| dc.description.version | Preprint |
| dc.format.extent | 6 p. |
| dc.identifier.citation | Bilal, H.; Vellido, A.; Ribas, V. "Enabling interpretation of the outcome of a human obesity prediction machine learning analysis from genomic data". 2018. |
| dc.identifier.uri | https://hdl.handle.net/2117/179493 |
| dc.language.iso | eng |
| dc.relation.projectid | info:eu-repo/grantAgreement/MINECO/1PE/TIN2016-79576-R |
| dc.rights.access | Open Access |
| dc.subject | UPC subject areas::Computer science::Computer applications::Bioinformatics |
| dc.subject | UPC subject areas::Computer science::Artificial intelligence::Machine learning |
| dc.subject.lcsh | Big data |
| dc.subject.lcsh | Machine learning |
| dc.subject.lcsh | Genomics |
| dc.subject.lcsh | Obesity |
| dc.subject.lemac | Macrodades |
| dc.subject.lemac | Aprenentatge automàtic |
| dc.subject.lemac | Genòmica |
| dc.subject.lemac | Obesitat |
| dc.subject.other | Feature selection |
| dc.subject.other | Minimum redundancy and maximum relevance |
| dc.subject.other | SNP |
| dc.subject.other | Apache Spark |
| dc.title | Enabling interpretation of the outcome of a human obesity prediction machine learning analysis from genomic data |
| dc.type | External research report |
| dspace.entity.type | Publication |
| local.citation.author | Bilal, H.; Vellido, A.; Ribas, V. |
| local.identifier.drac | 27266972 |
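
The abstract describes ranking single nucleotide polymorphisms (SNPs) with Minimum Redundancy and Maximum Relevance (mRMR) feature selection inside each data partition. The following is a minimal sketch of that idea in plain Python, not the authors' implementation (the record lists Apache Spark as a keyword, which is not used here). It assumes discrete genotype codes, uses the difference form of mRMR (relevance minus mean redundancy), and treats a "subset" as a block of SNP columns, which is one possible reading of the abstract. All names (`mutual_information`, `mrmr_rank`, `chunks`, `snp_a`, and so on) and the toy data are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(features, labels, k):
    """Greedy mRMR ranking: relevance to the labels minus mean redundancy
    with the features already selected.

    features: dict mapping a feature name to its list of discrete values.
    Returns the names of up to k selected features, best first.
    """
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    redundancy_sum = {name: 0.0 for name in features}
    selected, remaining = [], set(features)
    while remaining and len(selected) < k:
        def score(name):
            penalty = redundancy_sum[name] / len(selected) if selected else 0.0
            return relevance[name] - penalty
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
        for name in remaining:
            redundancy_sum[name] += mutual_information(features[name], features[best])
    return selected


def chunks(names, size):
    """Split a list of feature names into consecutive blocks of at most `size`."""
    return [names[i:i + size] for i in range(0, len(names), size)]


# Toy data (made up for illustration): 8 samples, binary class label,
# genotypes coded 0/1/2.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "snp_a": [0, 0, 0, 1, 2, 2, 2, 2],       # strongly related to the label
    "snp_a_copy": [0, 0, 0, 1, 2, 2, 2, 2],  # exact duplicate of snp_a
    "snp_c": [0, 0, 1, 1, 1, 1, 1, 0],       # weakly related, less redundant
}

if __name__ == "__main__":
    # In the partitioned setting, each block would be ranked independently.
    for block in chunks(sorted(features), 5000):
        subset = {name: features[name] for name in block}
        print(mrmr_rank(subset, labels, k=2))  # prints ['snp_a', 'snp_c']
```

In this toy case the duplicate SNP is as relevant as `snp_a`, but it is ranked below `snp_c` because it adds no new information once `snp_a` has been selected. That redundancy penalty is what distinguishes mRMR from a plain relevance ranking.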

