Gene expression data classification combining hierarchical representation and efficient feature selection
Cite as: hdl:2117/18425
Document type: Article
Publication date: 2012-12
Access conditions: Restricted access per publisher policy
Except where otherwise noted, the contents of this work are subject to the Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain
Abstract
A general framework for microarray data classification is proposed in this paper. It
produces precise and reliable classifiers through a two-step approach. First, the original
feature set is enhanced by a new set of features called metagenes. These new features
are obtained through a hierarchical clustering process on the original data. Two different
metagene generation rules have been analyzed, called Treelets clustering and Euclidean
clustering. Metagene creation is attractive for several reasons. First, metagenes can
improve the classification, since they broaden the available feature space and capture the
common behavior of similar genes, reducing the residual measurement noise. Furthermore,
by analyzing some of the metagenes chosen for classification with gene set enrichment
analysis algorithms, it is shown how metagenes can summarize the behavior of
functionally related probe sets. Additionally, metagenes can point out still undocumented,
highly discriminant probe sets that are numerically related to other probes endowed with
prior biological information, thereby contributing to the knowledge discovery process.
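
To make the first step concrete, the following is a minimal sketch of metagene generation by greedy agglomerative merging. It is not the authors' implementation: the abstract does not state how a metagene is computed from the features it merges, so the element-wise mean used here, the Euclidean distance as merging criterion, and the names `build_metagenes`, `euclidean` and `meta1`, `meta2`, ... are all assumptions made for illustration. The Treelets rule is not shown.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_metagenes(genes):
    """Greedy agglomerative merging of expression profiles.

    `genes` maps a feature name to its profile (one value per sample).
    At each step the two closest active profiles are merged into a
    metagene, taken here as their element-wise mean (an assumption).
    Returns the metagenes in creation order as (name, profile, children).
    """
    active = dict(genes)
    metagenes = []
    while len(active) > 1:
        names = sorted(active)
        # Find the closest pair among the currently active nodes.
        a, b = min(
            ((p, q) for i, p in enumerate(names) for q in names[i + 1:]),
            key=lambda pair: euclidean(active[pair[0]], active[pair[1]]),
        )
        profile = [(x + y) / 2 for x, y in zip(active[a], active[b])]
        name = f"meta{len(metagenes) + 1}"  # hypothetical naming scheme
        metagenes.append((name, profile, (a, b)))
        # The merged pair is replaced by its metagene in the active set.
        del active[a], active[b]
        active[name] = profile
    return metagenes

# Toy data: three genes measured on four samples.
genes = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.2, 2.1, 2.9, 4.2],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
metagenes = build_metagenes(genes)
# Enhanced feature set: the original genes plus every metagene.
features = {**genes, **{name: profile for name, profile, _ in metagenes}}
```

In this toy example the two similar genes `g1` and `g2` are merged first, so the first metagene averages out part of their individual noise, and the enhanced feature set holds the three genes plus two metagenes.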
The second step of the framework is feature selection, which applies the Improved
Sequential Floating Forward Selection algorithm (IFFS) to choose, for classification, a
subset of the available feature set composed of genes and metagenes. Considering
the microarray sample scarcity problem, a reliability measure is introduced alongside
the classical error rate to improve the feature selection process. Different scoring schemes
are studied to choose the best one using both error rate and reliability. The Linear
Discriminant Analysis classifier (LDA) has been used throughout this work, due to its
good characteristics, but the proposed framework can be used with almost any classifier.
The potential of the proposed framework has been evaluated by analyzing all the publicly
available datasets offered by the MicroArray Quality Control Study, phase II (MAQC).
The comparative results showed that the proposed framework can compete with a wide
variety of state-of-the-art alternatives and that it can obtain the best mean performance
if a particular setup is chosen. A Monte Carlo simulation confirmed that the proposed
framework obtains stable and repeatable results.
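
The feature selection step can be illustrated with a simplified sketch. Several substitutions are made and should be read as assumptions rather than as the paper's method: plain Sequential Floating Forward Selection is used instead of the Improved variant (IFFS), a nearest-centroid classifier stands in for LDA so that the example needs only the standard library, and the score is the leave-one-out error rate alone, because the abstract does not define the reliability measure. The function names `loo_error` and `sffs` and the toy data are hypothetical.

```python
def loo_error(samples, labels, subset):
    """Leave-one-out error rate of a nearest-centroid classifier
    restricted to the feature indices in `subset`."""
    errors = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        centroids = {}
        for label in set(labels):
            # Training rows of this class, excluding the held-out sample.
            rows = [s for j, (s, l) in enumerate(zip(samples, labels))
                    if j != i and l == label]
            centroids[label] = [sum(r[f] for r in rows) / len(rows)
                                for f in subset]
        predicted = min(
            sorted(centroids),
            key=lambda c: sum((x[f] - m) ** 2
                              for f, m in zip(subset, centroids[c])),
        )
        errors += predicted != y
    return errors / len(samples)

def sffs(samples, labels, n_features, target_size):
    """Plain sequential floating forward selection (not IFFS).
    Returns {subset size: (error, feature indices)} for the best subsets seen."""
    selected, best = [], {}
    while len(selected) < target_size:
        # Forward step: add the feature that gives the lowest error.
        candidates = [f for f in range(n_features) if f not in selected]
        add = min(candidates,
                  key=lambda f: loo_error(samples, labels, selected + [f]))
        selected = selected + [add]
        err = loo_error(samples, labels, selected)
        if len(selected) not in best or err < best[len(selected)][0]:
            best[len(selected)] = (err, list(selected))
        # Floating step: drop a feature while doing so beats the best
        # subset recorded so far for the smaller size.
        while len(selected) > 2:
            drop = min(selected, key=lambda f: loo_error(
                samples, labels, [g for g in selected if g != f]))
            reduced = [g for g in selected if g != drop]
            err = loo_error(samples, labels, reduced)
            if err >= best[len(reduced)][0]:
                break
            selected = reduced
            best[len(reduced)] = (err, list(reduced))
    return best

# Toy data: 6 samples, 3 features; only feature 0 separates the classes well.
samples = [
    [0.1, 5.0, 1.0], [0.2, 1.0, 1.4], [0.0, 3.0, 0.9],
    [1.0, 4.0, 1.2], [0.9, 2.0, 1.6], [1.1, 0.5, 1.1],
]
labels = [0, 0, 0, 1, 1, 1]
best = sffs(samples, labels, n_features=3, target_size=2)
```

With so few samples, many candidate subsets reach the same error rate, which is the tie situation that a second criterion such as the reliability measure mentioned in the abstract is meant to resolve.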
Citation: Bosio, M. [et al.]. Gene expression data classification combining hierarchical representation and efficient feature selection. "Journal of biological systems", December 2012, vol. 20, no. 4, p. 349-375.
ISSN: 0218-3390
Publisher version: http://www.worldscientific.com/doi/pdfplus/10.1142/S0218339012400025
Files | Description | Size | Format | View |
---|---|---|---|---|
Gene expression ... ient feature selection.pdf | Gene expression data classification combining hierarchical representation and efficient feature selection | 580.3 KB | PDF | Restricted access |