Protein classification from primary structures in the context of database biocuration



Document type

Official Master's Final Project

Access conditions

Open access

License

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.


Abstract

Automatic classification of proteins from their primary structures alone plays an important role in modern bioinformatics research, especially for proteins whose 3-D structures are still unknown. One such group of proteins, at the center of this thesis, is class C of the G-Protein Coupled Receptor super-family. This class is of great interest in pharmacoproteomics, from the point of view of drug design, because of its involvement in signaling pathways in cells of the central nervous system. The automatic classification of protein sequences may improve the understanding of their function and serve as a basis for predicting their 3-D structure, which is information of interest for drug research. This thesis compares classification results for different versions of the same database, including the most recent ones. This exploration of how classification evolves across versions provides relevant information about its capabilities and limitations. Furthermore, given that several data transformations are investigated, it also provides strong evidence concerning the robustness of these transformations. The other important contribution of the thesis is an investigation oriented towards defining approaches for semi-automated database curation, in which the changes to the database between versions are evaluated automatically with advanced machine learning techniques. The thesis shows that data quality improves consistently across three versions of the database, for different classification techniques and different primary structure transformations. It also validates the recently introduced continuous distributed representation for protein sequences, originally developed for natural language processing. This new representation is shown to be adequate and robust for the task of primary structure classification.

Description

In collaboration with the Universitat de Barcelona (UB) and the Universitat Rovira i Virgili (URV)

Degree

MASTER'S DEGREE IN ARTIFICIAL INTELLIGENCE (2012 Plan)
