Protein classification from primary structures in the context of database biocuration

Terpugova, Ilmira

dc.contributor	Vellido Alcacena, Alfredo
dc.contributor	Romero Merino, Enrique
dc.contributor.author	Terpugova, Ilmira
dc.date.accessioned	2017-07-21T10:27:02Z
dc.date.available	2017-07-21T10:27:02Z
dc.date.issued	2017-04
dc.identifier.uri	http://hdl.handle.net/2117/106701
dc.description	In collaboration with the Universitat de Barcelona (UB) and the Universitat Rovira i Virgili (URV)
dc.description.abstract	The problem of automatically classifying proteins from their primary structures alone plays an important role in modern bioinformatics research, especially for proteins whose 3-D structures are not yet known. One such group of proteins, at the center of this thesis, is class C of the G protein-coupled receptor (GPCR) superfamily. This class is of great interest in pharmacoproteomics, from the point of view of drug design, because of its involvement in signaling pathways in cells of the central nervous system. The automatic classification of protein sequences may improve the understanding of their function and provide a basis for predicting their 3-D structure, which is information of interest for drug research. This thesis compares classification results for different versions of the same database, including the most recent ones. This exploration of how classification results evolve across versions provides relevant information about the capabilities and limitations of the classification approach. Furthermore, given that several data transformations are investigated, it also provides strong evidence concerning the robustness of these transformations. The other important contribution of the thesis is an investigation towards defining approaches for semi-automated database curation, in which the changes between database versions are evaluated automatically with advanced machine learning techniques. The thesis shows that the improvement in data quality across three versions of the database is consistent across different classification techniques and different primary structure transformations. It also validates the recently introduced continuous distributed representation for protein sequences, adapted from a representation originally developed for natural language processing. This new representation is shown to be adequate and robust for the task of primary structure classification.
dc.language.iso	eng
dc.publisher	Universitat Politècnica de Catalunya
dc.subject	UPC subject areas::Computer science
dc.subject.lcsh	Databases
dc.subject.lcsh	Distributed artificial intelligence
dc.subject.other	protein sequence classification
dc.subject.other	word2vec
dc.subject.other	prot2vec
dc.subject.other	distributed representations
dc.subject.other	G protein-coupled receptors
dc.subject.other	bio-curation
dc.title	Protein classification from primary structures in the context of database biocuration
dc.type	Master thesis
dc.subject.lemac	Bases de dades
dc.subject.lemac	Intel·ligència artificial distribuïda
dc.identifier.slug	124491
dc.rights.access	Open Access
dc.date.updated	2017-05-11T04:00:20Z
dc.audience.educationlevel	Master's
dc.audience.mediator	Facultat d'Informàtica de Barcelona
dc.audience.degree	MASTER'S DEGREE IN ARTIFICIAL INTELLIGENCE (2012 curriculum)

Files in this item

Name: 124491.pdf
Size: 1.978 MB
Format: PDF

This item appears in the following collections

Master in Artificial Intelligence - MAI [278]

UPCommons. UPC open knowledge portal