dc.contributor.authorBosio, Mattia
dc.contributor.authorBellot Pujalte, Pau
dc.contributor.authorSalembier Clairon, Philippe Jean
dc.contributor.authorOliveras Vergés, Albert
dc.contributor.otherUniversitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.date.accessioned2013-03-19T18:10:53Z
dc.date.created2012-12
dc.date.issued2012-12
dc.identifier.citationBosio, M. [et al.]. Gene expression data classification combining hierarchical representation and efficient feature selection. "Journal of biological systems", December 2012, vol. 20, no. 4, p. 349-375.
dc.identifier.issn0218-3390
dc.identifier.urihttp://hdl.handle.net/2117/18425
dc.description.abstractA general framework for microarray data classification is proposed in this paper. It produces precise and reliable classifiers through a two-step approach. First, the original feature set is enhanced by a new set of features called metagenes. These new features are obtained through a hierarchical clustering process on the original data. Two different metagene generation rules have been analyzed, called Treelets clustering and Euclidean clustering. Metagene creation is attractive for several reasons. First, metagenes can improve classification because they broaden the available feature space and capture the common behavior of similar genes, reducing the residual measurement noise. Furthermore, by analyzing some of the metagenes chosen for classification with gene set enrichment analysis algorithms, it is shown how metagenes can summarize the behavior of functionally related probe sets. Additionally, metagenes can point out highly discriminant, still undocumented probe sets that are numerically related to other probes endowed with prior biological information, thereby contributing to the knowledge discovery process. The second step of the framework is feature selection, which applies the Improved Sequential Floating Forward Selection algorithm (IFFS) to choose, from the available feature set of genes and metagenes, a suitable subset for classification. Given the scarcity of microarray samples, a reliability measure is introduced alongside the classical error rate to improve the feature selection process. Different scoring schemes that use both error rate and reliability are studied in order to choose the best one. The Linear Discriminant Analysis (LDA) classifier has been used throughout this work because of its good characteristics, but the proposed framework can be used with almost any classifier.
The potential of the proposed framework has been evaluated by analyzing all the publicly available datasets offered by the MicroArray Quality Control Study, phase II (MAQC). The comparative results showed that the proposed framework can compete with a wide variety of state-of-the-art alternatives and that it can obtain the best mean performance if a particular setup is chosen. A Monte Carlo simulation confirmed that the proposed framework obtains stable and repeatable results.
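The feature-selection step described in the abstract builds on sequential floating forward selection. The sketch below shows classic SFFS (an inclusion step followed by conditional exclusion), not the IFFS variant used in the paper, which adds a further step for replacing weak features that is not reproduced here. The `score` callable is a hypothetical placeholder for the paper's combination of error rate and reliability, and the `TOY_SCORES` table is invented purely to illustrate why the floating (backtracking) step matters.

```python
from typing import Callable, Dict, Tuple

Subset = Tuple[int, ...]


def sffs(n_features: int, score: Callable[[Subset], float], k: int) -> Dict[int, Tuple[float, Subset]]:
    """Classic Sequential Floating Forward Selection (higher score is better).

    Returns a dict mapping each subset size to (best score, best subset found).
    Subsets are passed to `score` as sorted tuples of feature indices.
    """
    if not 1 <= k <= n_features:
        raise ValueError("k must satisfy 1 <= k <= n_features")
    selected: Subset = ()
    best: Dict[int, Tuple[float, Subset]] = {}
    while len(selected) < k:
        # Inclusion: add the single remaining feature that maximizes the score.
        candidates = [f for f in range(n_features) if f not in selected]
        f_add = max(candidates, key=lambda f: score(tuple(sorted(selected + (f,)))))
        selected = tuple(sorted(selected + (f_add,)))
        s = score(selected)
        if len(selected) not in best or s > best[len(selected)][0]:
            best[len(selected)] = (s, selected)
        # Conditional exclusion: keep dropping the least useful feature while the
        # smaller subset strictly beats the best one recorded at that size.
        while len(selected) > 2:
            reduced = max(
                (tuple(g for g in selected if g != f) for f in selected), key=score
            )
            s_red = score(reduced)
            if s_red > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (s_red, reduced)
            else:
                break
    return best


# Hypothetical toy criterion: features 1 and 2 are only strong together, so
# plain forward selection (no exclusion step) would only reach {0, 1} at size 2.
TOY_SCORES = {
    frozenset({0}): 10, frozenset({1}): 5, frozenset({2}): 4,
    frozenset({0, 1}): 15, frozenset({0, 2}): 14, frozenset({1, 2}): 30,
    frozenset({0, 1, 2}): 25,
}

result = sffs(3, lambda subset: TOY_SCORES[frozenset(subset)], k=3)
best_size = max(result, key=lambda m: result[m][0])
print(best_size, result[best_size])  # 2 (30, (1, 2))
```

Because exclusion is accepted only when it strictly improves the best score recorded at the smaller size, the search cannot cycle. In the framework described above, the candidate indices would cover both the original genes and the generated metagenes, and `score` would wrap a classifier such as LDA.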
dc.format.extent27 p.
dc.language.isoeng
dc.rightsAttribution-NonCommercial-NoDerivs 3.0 Spain
dc.rights.urihttp://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subjectUPC subject areas::Computer science::Computer applications::Bioinformatics
dc.subjectUPC subject areas::Telecommunication engineering
dc.subject.lcshBiology -- Data processing
dc.subject.lcshBioinformatics
dc.subject.otherfeature selection
dc.subject.otherhierarchical representation
dc.subject.otherLDA
dc.subject.othermetagenes
dc.subject.otherMicroarray classification
dc.subject.otherTreelets
dc.titleGene expression data classification combining hierarchical representation and efficient feature selection
dc.typeArticle
dc.subject.lemacBioinformatics
dc.subject.lemacBiology -- Computer science
dc.contributor.groupUniversitat Politècnica de Catalunya. GPI - Grup de Processament d'Imatge i Vídeo
dc.identifier.doi10.1142/S0218339012400025
dc.description.peerreviewedPeer Reviewed
dc.relation.publisherversionhttp://www.worldscientific.com/doi/pdfplus/10.1142/S0218339012400025
dc.rights.accessRestricted access - publisher's policy
local.identifier.drac11757392
dc.description.versionPostprint (published version)
dc.date.lift10000-01-01
local.citation.authorBosio, M.; Bellot, P.; Salembier, P.; Oliveras, A.
local.citation.publicationNameJournal of biological systems
local.citation.volume20
local.citation.number4
local.citation.startingPage349
local.citation.endingPage375