Conference papers/communications
http://hdl.handle.net/2117/3689
2016-09-27T10:28:40Z
http://hdl.handle.net/2117/82956
Probability ridges and distortion flows: Visualizing multivariate time series using a variational Bayesian manifold learning method
Tosi, Alessandra; Olier, Iván; Vellido Alcacena, Alfredo
Time-dependent natural phenomena and artificial processes can often be quantitatively expressed as multivariate time series (MTS). As in any other process of knowledge extraction from data, the analyst can benefit from the exploration of the characteristics of MTS through data visualization. This visualization often becomes difficult to interpret when MTS are modelled using nonlinear techniques. Despite their flexibility, nonlinear models can be rendered useless if such interpretability is lacking. In this brief paper, we model MTS using Variational Bayesian Generative Topographic Mapping Through Time (VB-GTM-TT), a variational Bayesian variant of a constrained hidden Markov model of the manifold learning family defined for MTS visualization. We aim to increase its interpretability by taking advantage of two results of the probabilistic definition of the model: the explicit estimation of probabilities of transition between states described in the visualization space and the quantification of the nonlinear mapping distortion.
2016-02-15T15:20:06Z
http://hdl.handle.net/2117/82949
Manifold learning visualization of metabotropic glutamate receptors
Cárdenas Domínguez, Martha Ivón; Vellido Alcacena, Alfredo; Giraldo Arjonilla, Jesús
G-Protein-Coupled Receptors (GPCRs) are cell membrane proteins with a key role in biological processes. GPCRs of class C, in particular, are of great interest in pharmacology. The lack of knowledge about their 3-D structures means they must be investigated through their primary amino acid sequences. Sequence visualization can help to explore the existing receptor sub-groupings at different partition levels. In this paper, we focus on Metabotropic Glutamate Receptors (mGluR), a subtype of class C GPCRs. Different versions of a probabilistic manifold learning model are employed to comparatively sub-group and visualize them through different transformations of their sequences.
2016-02-15T14:19:35Z
http://hdl.handle.net/2117/82944
Metrics for probabilistic geometries
Tosi, Alessandra; Hauberg, Søren; Vellido Alcacena, Alfredo; Lawrence, Neil D.
We investigate the geometrical structure of probabilistic generative dimensionality reduction models using the tools of Riemannian geometry. We explicitly define a distribution over the natural metric given by the models. We provide the necessary algorithms to compute expected metric tensors where the distribution over mappings is given by a Gaussian process. We treat the corresponding latent variable model as a Riemannian manifold and we use the expectation of the metric under the Gaussian process prior to define interpolating paths and measure distance between latent points. We show how distances that respect the expected metric lead to more appropriate generation of new data.
2016-02-15T14:03:37Z
http://hdl.handle.net/2117/82923
A weighted Cramér’s V Index for the assessment of stability in the fuzzy clustering of class C G protein-coupled receptors
Vellido Alcacena, Alfredo; Halka, Christiana; Nebot Castells, M. Àngela
After decades of intensive use, K-Means is still a common choice for crisp data clustering in real-world applications, particularly in biomedicine and bioinformatics. It is well-known that different initializations of the algorithm can lead to different solutions, precluding replicability. It has also been reported that even solutions with very similar errors may widely differ. A criterion for the choice of clustering solutions according to a combination of error and stability measures has recently been suggested. It is based on the use of Cramér’s V index, calculated from contingency tables, which is valid only for crisp clustering. Here, this criterion is extended to fuzzy and probabilistic clustering by first defining weighted contingency tables and a corresponding weighted Cramér’s V index. The proposed method is illustrated using Fuzzy C-Means in a proteomics problem.
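The idea of a weighted contingency table can be illustrated with a minimal sketch. It is not the authors' formulation: it assumes, for illustration only, that the weighted table is built by summing, over samples, the products of the membership degrees from two clustering runs, so that it reduces to the ordinary contingency table when memberships are crisp (one-hot). The function names `weighted_contingency` and `cramers_v` are hypothetical, and every cluster is assumed to have non-zero total membership.

```python
from math import sqrt


def weighted_contingency(u_a, u_b):
    """Soft contingency table between two clustering runs (assumed form).

    u_a and u_b are lists of per-sample membership vectors. Entry (i, j)
    accumulates membership in cluster i of run A times membership in
    cluster j of run B. With one-hot rows this is the usual crisp table.
    """
    rows, cols = len(u_a[0]), len(u_b[0])
    table = [[0.0] * cols for _ in range(rows)]
    for m_a, m_b in zip(u_a, u_b):
        for i in range(rows):
            for j in range(cols):
                table[i][j] += m_a[i] * m_b[j]
    return table


def cramers_v(table):
    """Cramér's V: sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two partitions.
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_tot), len(col_tot)) - 1
    return sqrt(chi2 / (n * k))


# Two crisp runs that agree up to a relabelling of the clusters.
run_a = [[1, 0], [1, 0], [0, 1], [0, 1]]
run_b = [[0, 1], [0, 1], [1, 0], [1, 0]]
# A maximally fuzzy run carrying no cluster information.
run_c = [[0.5, 0.5]] * 4

v_stable = cramers_v(weighted_contingency(run_a, run_b))
v_fuzzy = cramers_v(weighted_contingency(run_a, run_c))
```

Under these assumptions, two runs that differ only by cluster relabelling score the maximum value, while comparing against a completely uninformative fuzzy partition scores zero.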
2016-02-15T12:10:23Z
http://hdl.handle.net/2117/82851
The extracellular N-terminal domain suffices to discriminate class C G Protein-Coupled Receptor subtypes from n-grams of their sequences
König, Caroline; Alquézar Mancho, René; Vellido Alcacena, Alfredo; Giraldo Arjonilla, Jesús
The investigation of protein functionality often relies on knowledge of the crystal 3-D structure. This structure is not always known or easily unravelled, as is the case for eukaryotic cell membrane proteins such as G Protein-Coupled Receptors (GPCRs), and especially for those of class C, which are the target of the current study. In the absence of information about tertiary or quaternary structures, functionality can be investigated from the primary structure, that is, from the amino acid sequence. In previous research, we found that the different subtypes of class C GPCRs could be discriminated with a high level of accuracy from the n-gram transformation of their complete primary sequences, using a method that combined two-stage feature selection with kernel classifiers. This study aims to discover whether subunits of the complete sequence retain such discrimination capabilities. We report experiments that show that the extracellular N-terminal domain of the receptor suffices to retain the classification accuracy of the complete sequence and that it does so using a reduced selection of n-grams whose length of up to five amino acids opens up an avenue for class C GPCR signature motif discovery.
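As a rough illustration of what an n-gram transformation of a primary sequence looks like, the following sketch counts all contiguous subsequences of length 1 up to a maximum length (five here, matching the upper length mentioned above). It is an assumption-laden toy, not the authors' pipeline: the function name `ngram_counts` and the example sequence are hypothetical, and the feature selection and kernel classification steps are not shown.

```python
from collections import Counter


def ngram_counts(sequence, max_n=5):
    """Count every contiguous n-gram of length 1..max_n in a sequence."""
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of width n over the sequence.
        for start in range(len(sequence) - n + 1):
            counts[sequence[start:start + n]] += 1
    return counts


# Hypothetical short amino acid fragment, for illustration only.
fragment = "MKTAYIAKQR"
features = ngram_counts(fragment)

# Small check case that is easy to verify by hand.
small = ngram_counts("AAB", max_n=2)
```

Each sequence is thus mapped to a sparse vector of n-gram frequencies, which a classifier can consume without requiring the sequences to be aligned.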
2016-02-11T12:53:37Z
http://hdl.handle.net/2117/82363
A hierarchical perspective to fuzzy inductive reasoning: an attempt to obtain more understandable fuzzy inductive reasoning rules
Bagherpour, Solmaz; Múgica Álvarez, Francisco; Nebot Castells, M. Àngela
Generalizing hypotheses based on past data in order to predict the future is the essential core of human learning. Various successful methods and techniques have been developed so far that perform some sort of classification of current data in order to predict future unseen cases. Multi-class classification problems are among them as well. In many domains, in spite of these automatic techniques, the involvement of human experts is crucial. In this paper, we propose a hierarchical perspective on the Fuzzy Inductive Reasoning (FIR) method as a classifier, in order to give experts more insight into the predictive model offered by FIR. This method also puts a hierarchical constraint on FIR's generalization, which might be useful in finding and predicting exceptional cases of data that do not follow the general rule offered by the model.
2016-02-01T15:25:10Z
http://hdl.handle.net/2117/82353
A fuzzy inductive approach for rule-based modelling of high level structures in algorithmic composition systems
Múgica Álvarez, Francisco; Paz Ortiz, Iván; Nebot Castells, M. Àngela; Romero Merino, Enrique
Algorithmic composition systems are now widely understood. However, their capacity for producing outputs that consistently show high-level structures is still a field of research. In the present work, the Fuzzy Inductive Reasoning (FIR) methodology and an extension of it, Linguistic Rules in FIR (LR-FIR), are the main tools chosen for modeling such features. FIR/LR-FIR operates over the outputs produced by an algorithmic composition system and, through qualitative user evaluation, is able to extract rules using configurations of low-level characteristics that model high-level features. Subsequently, the rules are used to explore all possible outputs of an algorithmic system, finding a subset of outputs that show the desired property. Finally, the extracted rules are evaluated and discussed in the context of musical knowledge.
2016-02-01T14:55:22Z
http://hdl.handle.net/2117/82054
A flexible fuzzy inductive reasoning approach for load modelling able to cope with missing data
Jurado Gómez, Sergio; Nebot Castells, M. Àngela; Múgica Álvarez, Francisco
Load forecasting in buildings and homes has become a task of increasing importance in recent years. New services and functionalities can be offered in the home environment thanks to these predictions, for instance, the detection of potential demand response programs and of peaks that may increase the energy bill in a dynamic tariff framework. Near-real-time predictions are key for these services, but missing values can dramatically affect the performance of the energy forecasting or distort the prediction significantly. Fuzzy Inductive Reasoning has been shown to model load consumption with high accuracy compared to other typical AI and statistical techniques. Nevertheless, it has several limitations when missing data are present in the training data of the model and during prediction. In this paper, we present an improved version of Fuzzy Inductive Reasoning, called Flexible FIR Prediction, that can cope with missing information in the input pattern as well as with situations where patterns are not found in the behaviour matrix. The new technique has been tested with real data from one building of the Universitat Politècnica de Catalunya (UPC), and the results show that Flexible FIR Prediction is able to generate good predictions with low errors (less than 15%) even when missing data are present in the training and online prediction phases.
2016-01-26T12:48:06Z
http://hdl.handle.net/2117/78467
Exploratory visualization of misclassified GPCRs from their transformed unaligned sequences using manifold learning techniques
Cárdenas Domínguez, Martha Ivón; Vellido Alcacena, Alfredo; König, Caroline; Alquézar Mancho, René; Giraldo Arjonilla, Jesús
Class C G-protein-coupled receptors (GPCRs) are cell membrane proteins of great relevance to biology and pharmacology. Previous research has revealed an upper boundary on the accuracy that can be achieved in their classification into subtypes from the unaligned transformation of their sequences. To investigate this, we focus on sequences that have been misclassified using supervised methods. These are visualized, using a nonlinear dimensionality reduction technique and phylogenetic trees, and then characterized against the rest of the data and, particularly, against the rest of cases of their own subtype. This should help to discriminate between different types of misclassification and to build hypotheses about database quality problems and the extent to which GPCR sequence transformations limit subtype discriminability. The reported experiments provide a proof of concept for the proposed method.
2015-10-29T09:32:24Z
http://hdl.handle.net/2117/78401
Misclassification of class C G-protein-coupled receptors as a label noise problem
König, Caroline; Vellido Alcacena, Alfredo; Alquézar Mancho, René; Giraldo Arjonilla, Jesús
G-Protein-Coupled Receptors (GPCRs) are cell membrane proteins of relevance to biology and pharmacology. Their supervised classification into subtypes is hampered by label noise, which stems from a combination of expert knowledge limitations and a lack of clear correspondence between labels and different representations of the protein primary sequences. In this brief study, we describe a systematic approach to the analysis of GPCR misclassifications using Support Vector Machines, and use it to assist the discovery of database labeling quality problems and to investigate the extent to which GPCR sequence physicochemical transformations reflect GPCR subtype labeling. The proposed approach could enable a filtering solution to the label noise problem.
2015-10-28T10:46:51Z