Conference presentations / papers
http://hdl.handle.net/2117/3689
Sun, 22 Jan 2017 12:28:03 GMT
http://hdl.handle.net/2117/99395
Automated quality control for proton magnetic resonance spectroscopy data using convex non-negative matrix factorization
Mocioiu, Victor; Kyathanahally, Sreenath P.; Arús, Carles; Vellido Alcacena, Alfredo; Julià Sapé, Margarida
Proton Magnetic Resonance Spectroscopy (1H MRS) has proven its diagnostic potential in a variety of conditions. However, MRS is not yet widely used in clinical routine because of the lack of experts in its diagnostic interpretation. Although data-based decision support systems exist to aid diagnosis, they often take for granted that the data are of good quality, which is not always the case in a real application context. Systems based on models built from bad-quality data are likely to underperform in their decision support tasks. In this study, we propose a system to filter out such bad-quality data. It is based on convex Non-Negative Matrix Factorization models, used as a dimensionality reduction procedure, and on the use of several classifiers to discriminate between good- and bad-quality data.
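As an illustration of the dimensionality-reduction stage, the following is a minimal sketch of convex Non-Negative Matrix Factorization using the multiplicative updates of Ding et al., under the assumption of elementwise non-negative magnitude spectra (which makes the simplified updates valid); the function and variable names are illustrative, not the authors' implementation. The rows of G can then serve as low-dimensional features for a good/bad quality classifier.

```python
import numpy as np

def convex_nmf(X, k, n_iter=200, seed=0):
    """Convex NMF: X ~ X @ W @ G.T with W, G >= 0 (Ding et al. updates).

    Assumes X is elementwise non-negative, which simplifies the
    multiplicative updates. X: (d, n) matrix of d spectral points
    by n samples. Returns W (n, k) and G (n, k); each row of G is
    the k-dimensional encoding of one sample.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    K = X.T @ X                                   # (n, n) non-negative Gram matrix
    W = rng.random((n, k)) + 0.1                  # convex-combination weights
    G = rng.random((n, k)) + 0.1                  # sample encodings
    eps = 1e-9
    for _ in range(n_iter):
        # multiplicative updates preserve non-negativity
        G *= np.sqrt((K @ W) / (G @ (W.T @ K @ W) + eps))
        W *= np.sqrt((K @ G) / (K @ W @ (G.T @ G) + eps))
    return W, G
```

Because the factors X @ W are convex-like combinations of actual spectra, they remain interpretable as spectral "sources", which is the usual motivation for the convex variant over plain NMF.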
Tue, 17 Jan 2017 08:51:09 GMT
http://hdl.handle.net/2117/97584
A machine learning pipeline for supporting differentiation of glioblastomas from single brain metastases
Mocioiu, Victor; de Barros, Nuno M. Pedrosa; Ortega Martorell, Sandra; Slotboom, Johannes; Knecht, Urspeter; Arús, Carles; Vellido Alcacena, Alfredo; Julià Sapé, Margarida
Machine learning has provided, over the last decades, tools for knowledge extraction in complex medical domains. Most of these tools, though, are ad hoc solutions and lack the systematic approach that would be required for them to become mainstream in medical practice. In this brief paper, we define a machine learning-based analysis pipeline to help with a difficult problem in the field of neuro-oncology, namely the discrimination of brain glioblastomas from single brain metastases. This pipeline involves source extraction using k-Means-initialized Convex Non-negative Matrix Factorization and a collection of classifiers, including Logistic Regression, Linear Discriminant Analysis, AdaBoost, and Random Forests.
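The classifier-comparison stage of such a pipeline can be sketched with scikit-learn as follows, assuming the cNMF-derived encodings are already available as a feature matrix; `compare_classifiers` and its defaults are illustrative, not the authors' experimental setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

def compare_classifiers(features, labels, cv=5, seed=0):
    """Cross-validated accuracy for the four classifier families
    named in the abstract, on precomputed source-extraction features."""
    models = {
        "logistic": LogisticRegression(max_iter=1000),
        "lda": LinearDiscriminantAnalysis(),
        "adaboost": AdaBoostClassifier(random_state=seed),
        "random_forest": RandomForestClassifier(random_state=seed),
    }
    return {name: cross_val_score(m, features, labels, cv=cv).mean()
            for name, m in models.items()}
```

Running all four under the same cross-validation split is what lets a pipeline like this report a systematic, rather than ad hoc, comparison.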
Thu, 01 Dec 2016 10:29:18 GMT
http://hdl.handle.net/2117/97582
Instance and feature weighted k-nearest-neighbors algorithm
Prat, Gabriel; Belanche Muñoz, Luis Antonio
We present a novel method that aims to provide a more stable selection of feature subsets when variations in the training process occur. This is accomplished by using an instance-weighting process (assigning different degrees of importance to instances) as a preprocessing step to a feature weighting method that is independent of the learner, and then making use of both sets of computed weights in a standard nearest-neighbours classifier. We report extensive experimentation on well-known benchmark datasets as well as on some challenging microarray gene expression problems. Our results show increases in stability for most subset sizes and most problems, without compromising prediction accuracy.
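A minimal sketch of the resulting classifier, assuming the instance and feature weights have already been computed upstream (the function name and signature are illustrative): feature weights rescale the distance metric, while instance weights rescale each neighbour's vote.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, X_test, feat_w, inst_w, k=3):
    """k-NN with feature-weighted distances and instance-weighted voting.

    feat_w: (n_features,) non-negative feature weights.
    inst_w: (n_train,) non-negative instance weights.
    """
    preds = []
    for x in X_test:
        # weighted Euclidean distance to every training instance
        d = np.sqrt(((X_train - x) ** 2 * feat_w).sum(axis=1))
        neighbours = np.argsort(d)[:k]
        # each neighbour votes with its instance weight, not just 1
        votes = {}
        for i in neighbours:
            votes[y_train[i]] = votes.get(y_train[i], 0.0) + inst_w[i]
        preds.append(max(votes, key=votes.get))
    return np.array(preds)
```

Setting all instance weights to 1 and all feature weights to 1 recovers the ordinary unweighted k-NN classifier, which is a convenient sanity check.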
Thu, 01 Dec 2016 10:15:32 GMT
http://hdl.handle.net/2117/97581
Physics and machine learning: Emerging paradigms
Martín Guerrero, José; Lisboa, Paulo J G; Vellido Alcacena, Alfredo
Current research in Machine Learning (ML) combines the study of variations on well-established methods with cutting-edge breakthroughs based on completely new approaches. Among the latter, emerging paradigms from Physics have gained special relevance in recent years. Although still in its initial stages, Quantum Machine Learning (QML) shows promising ways to speed up some of the costly ML calculations with similar or even better performance than existing approaches. Two additional advantages relate to the intrinsically probabilistic approach of QML, since quantum states are genuinely probabilistic, and to the capability of finding the global optimum of a given cost function by means of adiabatic quantum optimization, thus circumventing the usual problem of local minima. Another Physics approach to ML comes from Statistical Physics and is linked to Information Theory in supervised and semi-supervised learning frameworks. Conversely, from the perspective of Physics, ML can provide solutions by extracting knowledge from the huge amounts of data common in many experiments in the field, such as those in High Energy Physics for elementary-particle research and in Observational Astronomy.
Thu, 01 Dec 2016 10:08:53 GMT
http://hdl.handle.net/2117/90715
A proposal for climate change resilience management through fuzzy controllers
González Cárdenas, Rubén; Nebot Castells, M. Àngela; Múgica Álvarez, Francisco
We aim to implement a set of fuzzy controllers capable of automatically estimating the time needed to recover a given resilience level through the non-linear influence of a set of interrelated climate change resilience indicators, constrained by social variables. This set of fuzzy controllers, working together with a Mamdani-type fuzzy inference system, will be able to estimate the adjustments to be made to the system's elements in order to achieve a certain resilience level, while also providing a general estimate of the required costs. The resulting tool can then be used to provide guidelines for strategic vulnerability planning and monitoring, through a clear understanding of the relationship between investments and results, while enabling open evaluation and scrutiny of the applied policies. In this paper, the main strategy for achieving these objectives is presented and discussed.
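As a toy illustration of the kind of Mamdani-type inference involved, the following sketches a hypothetical single-input controller (triangular membership functions, min implication, max aggregation, centroid defuzzification); the rule base, variable ranges, and names are invented for illustration and are not the paper's controller set.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with vertices a <= b <= c."""
    return np.maximum(
        np.minimum((x - a) / (b - a + 1e-12), (c - x) / (c - b + 1e-12)), 0.0)

def mamdani_recovery_time(resilience):
    """Toy Mamdani controller: resilience in [0, 1] -> recovery time in years.

    Rules (illustrative): low resilience -> long recovery,
    medium -> medium, high -> short.
    """
    t = np.linspace(0.0, 10.0, 501)          # output universe (years)
    # fuzzify the input: activation of each rule antecedent
    low = tri(resilience, 0.0, 0.0, 0.5)
    med = tri(resilience, 0.0, 0.5, 1.0)
    high = tri(resilience, 0.5, 1.0, 1.0)
    # Mamdani implication (min) per rule, then max aggregation
    out = np.maximum.reduce([
        np.minimum(low, tri(t, 5.0, 10.0, 10.0)),   # IF low  THEN long
        np.minimum(med, tri(t, 2.0, 5.0, 8.0)),     # IF med  THEN medium
        np.minimum(high, tri(t, 0.0, 0.0, 4.0)),    # IF high THEN short
    ])
    # centroid defuzzification
    return (t * out).sum() / (out.sum() + 1e-12)
```

The real system described in the abstract would chain several such controllers over interrelated resilience indicators; the sketch only shows the inference mechanics on one input.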
Thu, 13 Oct 2016 07:55:07 GMT
http://hdl.handle.net/2117/82956
Probability ridges and distortion flows: Visualizing multivariate time series using a variational Bayesian manifold learning method
Tosi, Alessandra; Olier, Iván; Vellido Alcacena, Alfredo
Time-dependent natural phenomena and artificial processes can often be quantitatively expressed as multivariate time series (MTS). As in any other process of knowledge extraction from data, the analyst can benefit from the exploration of the characteristics of MTS through data visualization. This visualization often becomes difficult to interpret when MTS are modelled using nonlinear techniques. Despite their flexibility, nonlinear models can be rendered useless if such interpretability is lacking. In this brief paper, we model MTS using Variational Bayesian Generative Topographic Mapping Through Time (VB-GTM-TT), a variational Bayesian variant of a constrained hidden Markov model of the manifold learning family defined for MTS visualization. We aim to increase its interpretability by taking advantage of two results of the probabilistic definition of the model: the explicit estimation of probabilities of transition between states described in the visualization space and the quantification of the nonlinear mapping distortion.
Mon, 15 Feb 2016 15:20:06 GMT
http://hdl.handle.net/2117/82949
Manifold learning visualization of metabotropic glutamate receptors
Cárdenas Domínguez, Martha Ivón; Vellido Alcacena, Alfredo; Giraldo Arjonilla, Jesús
G-Protein-Coupled Receptors (GPCRs) are cell membrane proteins with a key role in biological processes. GPCRs of class C, in particular, are of great interest in pharmacology. The lack of knowledge about their 3-D structures means they must be investigated through their primary amino acid sequences. Sequence visualization can help to explore the existing receptor sub-groupings at different partition levels. In this paper, we focus on Metabotropic Glutamate Receptors (mGluR), a subtype of class C GPCRs. Different versions of a probabilistic manifold learning model are employed to comparatively sub-group and visualize them through different transformations of their sequences.
Mon, 15 Feb 2016 14:19:35 GMT
http://hdl.handle.net/2117/82944
Metrics for probabilistic geometries
Tosi, Alessandra; Hauberg, Søren; Vellido Alcacena, Alfredo; Lawrence, Neil D.
We investigate the geometrical structure of probabilistic generative dimensionality reduction models using the tools of Riemannian geometry. We explicitly define a distribution over the natural metric given by the models. We provide the necessary algorithms to compute expected metric tensors where the distribution over mappings is
given by a Gaussian process. We treat the corresponding latent variable model as a Riemannian manifold and we use the expectation of the metric under the Gaussian process prior to define interpolating paths and measure distance between latent points. We show how distances that respect the expected metric lead to more appropriate generation of new data.
Mon, 15 Feb 2016 14:03:37 GMT
http://hdl.handle.net/2117/82923
A weighted Cramér’s V Index for the assessment of stability in the fuzzy clustering of class C G protein-coupled receptors
Vellido Alcacena, Alfredo; Halka, Christiana; Nebot Castells, M. Àngela
After decades of intensive use, K-Means is still a common choice for crisp data clustering in real-world applications, particularly in biomedicine and bioinformatics. It is well-known that different initializations of the algorithm can lead to different solutions, precluding replicability. It has also been reported that even solutions with very similar errors may widely differ. A criterion for the choice of clustering solutions according to a combination of error and stability measures has recently been suggested. It is based on the use of Cramér’s V index, calculated from contingency tables, which is valid only for crisp clustering. Here, this criterion is extended to fuzzy and probabilistic clustering by first defining weighted contingency tables and a corresponding weighted Cramér’s V index. The proposed method is illustrated using Fuzzy C-Means in a proteomics problem.
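A minimal sketch of one plausible formulation, assuming each fuzzy clustering solution is given as a row-stochastic membership matrix (each row holds one sample's membership degrees); when memberships are crisp 0/1 assignments, the weighted table reduces to the ordinary contingency table, as the abstract requires.

```python
import numpy as np

def weighted_cramers_v(U1, U2):
    """Weighted Cramér's V between two fuzzy partitions.

    U1: (n, r) and U2: (n, c) membership matrices, rows summing to 1.
    The weighted contingency table is N[i, j] = sum_n U1[n, i] * U2[n, j].
    """
    N = U1.T @ U2                                 # (r, c) weighted contingency table
    total = N.sum()
    # chi-squared statistic against the independence expectation
    expected = np.outer(N.sum(axis=1), N.sum(axis=0)) / total
    chi2 = ((N - expected) ** 2 / (expected + 1e-12)).sum()
    r, c = N.shape
    return np.sqrt(chi2 / (total * (min(r, c) - 1)))
```

Identical crisp partitions give V = 1 and statistically independent ones give V = 0, so the index can rank pairs of Fuzzy C-Means runs by agreement, which is what the stability criterion needs.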
Mon, 15 Feb 2016 12:10:23 GMT
http://hdl.handle.net/2117/82851
The extracellular N-terminal domain suffices to discriminate class C G Protein-Coupled Receptor subtypes from n-grams of their sequences
König, Caroline; Alquézar Mancho, René; Vellido Alcacena, Alfredo; Giraldo Arjonilla, Jesús
The investigation of protein functionality often relies on knowledge of the crystal 3-D structure. This structure is not always known or easily unravelled, which is the case for eukaryotic cell membrane proteins such as G Protein-Coupled Receptors (GPCRs), and especially those of class C, which are the target of the current study. In the absence of information about tertiary or quaternary structures, functionality can be investigated from the primary structure, that is, from the amino acid sequence. In previous research, we found that the different subtypes of class C GPCRs could be discriminated with a high level of accuracy from the n-gram transformation of their complete primary sequences, using a method that combined two-stage feature selection with kernel classifiers. This study aims to discover whether subunits of the complete sequence retain such discrimination capabilities. We report experiments showing that the extracellular N-terminal domain of the receptor suffices to retain the classification accuracy of the complete sequence, and that it does so using a reduced selection of n-grams of up to five amino acids in length, which opens up an avenue for class C GPCR signature motif discovery.
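The n-gram transformation itself can be sketched as follows (the helper names are illustrative; the two-stage feature selection and kernel classifiers described in the abstract are not reproduced here): each sequence is mapped to counts of its overlapping length-n substrings.

```python
from collections import Counter

def ngram_counts(seq, n):
    """Counts of overlapping length-n substrings of an amino-acid sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def ngram_features(sequences, n_values=(1, 2, 3)):
    """Map each sequence to a combined n-gram count dictionary.

    The resulting dictionaries can be vectorized (one column per distinct
    n-gram) before feature selection and classification.
    """
    feats = []
    for seq in sequences:
        combined = Counter()
        for n in n_values:
            combined.update(ngram_counts(seq, n))
        feats.append(combined)
    return feats
```

Restricting the input to the N-terminal domain, as the study does, simply means passing that subsequence instead of the full primary sequence.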
Thu, 11 Feb 2016 12:53:37 GMT