
dc.contributor: Vellido Alcacena, Alfredo
dc.contributor: Romero Merino, Enrique
dc.contributor.author: Terpugova, Ilmira
dc.date.accessioned: 2017-07-21T10:27:02Z
dc.date.available: 2017-07-21T10:27:02Z
dc.date.issued: 2017-04
dc.identifier.uri: http://hdl.handle.net/2117/106701
dc.description: In collaboration with the Universitat de Barcelona (UB) and the Universitat Rovira i Virgili (URV)
dc.description.abstract: Automatic classification of proteins from their primary structures alone plays an important role in modern bioinformatics research, especially for proteins whose 3-D structures are still unknown. One such group of proteins, the focus of this thesis, is class C of the G protein-coupled receptor superfamily. This class is of great interest in pharmacoproteomics, from the point of view of drug design, because of its involvement in signaling pathways in cells of the central nervous system. The automatic classification of protein sequences may improve the understanding of their function and provide a basis for predicting their 3-D structure, which is of interest for drug research. This thesis compares classification results for different versions of the same database, including the most recent ones. This exploration of how classification evolves across versions provides relevant information about its capabilities and limitations. Furthermore, given that several data transformations are investigated, it also provides strong evidence concerning the robustness of these transformations. The other important contribution of the thesis is an investigation aimed at defining approaches for semi-automated database curation, in which advanced machine learning techniques are used to automatically evaluate the changes between database versions. The thesis shows consistent improvements in data quality across three versions of the database, for different classification techniques and different primary structure transformations. It also validates the recently introduced continuous distributed representation for protein sequences, originally developed for natural language processing. This new representation is shown to be adequate and robust for the task of primary structure classification.
dc.language.iso: eng
dc.publisher: Universitat Politècnica de Catalunya
dc.subject: UPC subject areas::Computer science
dc.subject.lcsh: Databases
dc.subject.lcsh: Distributed artificial intelligence
dc.subject.other: protein sequence classification
dc.subject.other: word2vec
dc.subject.other: prot2vec
dc.subject.other: distributed representations
dc.subject.other: G protein-coupled receptors
dc.subject.other: bio-curation
dc.title: Protein classification from primary structures in the context of database biocuration
dc.type: Master thesis
dc.subject.lemac: Bases de dades (Databases)
dc.subject.lemac: Intel·ligència artificial distribuïda (Distributed artificial intelligence)
dc.identifier.slug: 124491
dc.rights.access: Open Access
dc.date.updated: 2017-05-11T04:00:20Z
dc.audience.educationlevel: Master's
dc.audience.mediator: Facultat d'Informàtica de Barcelona
dc.audience.degree: Master's Degree in Artificial Intelligence (2012 curriculum)

