Conference papers/communications

http://hdl.handle.net/2117/3689
2024-04-25T21:06:52Z

A deep learning-based method for uncovering GPCR ligand-induced conformational states using interpretability techniques
http://hdl.handle.net/2117/380032
Gutiérrez Mondragón, Mario Alberto; König, Caroline; Vellido Alcacena, Alfredo
There is increasing interest in the development of tools for investigating the protein ligand space. Understanding the underlying mechanisms of G protein-coupled receptors (GPCR) in the ligand-binding process is of particular interest due to their role in pharmacoproteomics.
In this work, we propose the study of GPCR ligand-induced conformational variations from Molecular Dynamics (MD) simulations using Deep Learning (DL)-based methods. We devise and train a Convolutional Neural Network (CNN) to classify the states of the β2-adrenergic receptor in both its ligand-free and agonist-bound forms. We also study the transformation of MD data into an interaction network matrix to further improve and facilitate the analyses without significant loss of information. Our method introduces a framework for the study of the effect of ligand-receptor binding activity that includes a novel analysis based on interpretability algorithms, allowing the quantification of the contribution of individual residues to structural re-arrangements.
2023-01-12T08:29:06Z

The importance of interpretability and visualization in ML for medical applications
http://hdl.handle.net/2117/369869
Vellido Alcacena, Alfredo
Many areas of science have made a sharp transition towards data-dependent methods, enabled by simultaneous advances in data acquisition and the development of networked system technologies. This is particularly clear in the life sciences, which can be seen as a perfect scenario for the use of machine learning to address problems in which more traditional data analysis approaches might struggle. But this scenario also poses some serious challenges. One of them is the lack of interpretability and explainability for complex nonlinear models. In medicine and health care, not addressing this challenge might seriously limit the chances of adoption of these methods.
In this summary paper, we pay specific attention to one of the ways in which interpretability and explainability can be addressed in this context: data and model visualization.
2022-07-08T08:13:12Z

The coming of age of interpretable and explainable machine learning models
http://hdl.handle.net/2117/368022
Lisboa, Paulo; Saralajew, Sascha; Vellido Alcacena, Alfredo; Villmann, Thomas
Machine learning-based systems are now part of a wide array of real-world applications seamlessly embedded in the social realm. In the wake of this realisation, strict legal regulations for these systems are currently being developed, addressing some of the risks they may pose. This is the coming of age of the interpretability and explainability problems in machine learning-based data analysis, which can no longer be seen just as an academic research problem.
In this tutorial, associated with the ESANN 2021 special session on “Interpretable Models in Machine Learning and Explainable Artificial Intelligence”, we discuss explainable and interpretable machine learning as post-hoc and ante-hoc strategies to address these problems and highlight several aspects related to them, including their assessment. The contributions accepted for the session are then presented in this context.
2022-06-03T10:05:19Z

Off-the-grid: Fast and effective hyperparameter search for kernel clustering
http://hdl.handle.net/2117/354170
Ordozgoiti Rubio, Bruno; Belanche Muñoz, Luis Antonio
Kernel functions are a powerful tool to enhance the k-means clustering algorithm via the kernel trick. It is known that the parameters of the chosen kernel function can have a dramatic impact on the result. In supervised settings, these can be tuned via cross-validation, but for clustering this is not straightforward and heuristics are usually employed.
In this paper we study the impact of kernel parameters on kernel k-means. In particular, we derive a lower bound, tight up to constant factors, below which the parameter of the RBF kernel will render kernel k-means meaningless. We argue that grid search can be ineffective for hyperparameter search in this context and propose an alternative algorithm for this purpose. In addition, we offer an efficient implementation based on fast approximate exponentiation with provable quality guarantees. Our experimental results demonstrate the ability of our method to efficiently reveal a rich and useful set of hyperparameter values.
2021-10-21T10:12:27Z

Fault detection and identification in a fuel cell system
http://hdl.handle.net/2117/342400
Escobet Canal, Antoni; Nebot Castells, M. Àngela
In this work a fault diagnosis system for non-linear plants based on fuzzy logic, called VisualBlock-FIR, is presented and applied to an energy generation system based on fuel cells. VisualBlock-FIR runs under the Simulink framework and enables early fault detection and identification. During fault detection, the fault diagnosis system should recognize that the system is not working properly. During fault identification, it should conclude which type of failure has occurred. The diagnosis results for some of the most frequent faults in fuel cell systems are presented.
2021-03-24T16:15:58Z

Interpreting response to TMZ therapy in murine GL261 glioblastoma by combining Radiomics, Convex-NMF and feature selection in MRI/MRSI data analysis
http://hdl.handle.net/2117/336149
Nuñez Vivero, Luis Miguel; Julia Sape, Margarida; Romero Merino, Enrique; Arus Caraltó, Carles; Vellido Alcacena, Alfredo; Candiota Silveira, Ana Paula
Machine learning (ML) methods have shown great potential for the analysis of data involved in medical decisions. However, for these methods to be incorporated in the medical pipeline, they must be made interpretable not only to the data analyst, but also to the medical expert.
In this work, we have applied a combination of feature transformation, selection and classification using ML and statistical methods to differentiate between control (untreated) and Temozolomide (TMZ)-treated tumour tissue from a glioblastoma (brain tumour) murine model. As input, we have used T2-weighted magnetic resonance images (MRI) and spectroscopic imaging (MRSI). Radiomics features have been extracted from the MRI dataset, while convex Non-negative Matrix Factorization (Convex-NMF) was used to extract sources from the MRSI dataset. Exhaustive feature selection has revealed parsimonious feature subsets that facilitate the expert interpretation of results while retaining a high discriminatory ability.
2021-01-28T11:23:16Z
Similarity-based heterogeneous neuron models
http://hdl.handle.net/2117/184293
Belanche Muñoz, Luis Antonio
This paper introduces a general class of neuron models, accepting heterogeneous inputs in the form of mixtures of continuous (crisp or fuzzy) numbers, linguistic information, and discrete (either ordinal or nominal) quantities, with provision also for missing information. Their internal stimulation is based on an explicit similarity relation between the input and weight tuples (which are also heterogeneous). The framework is comprehensive and several models can be derived as instances; in particular, two of the commonly used models are shown to compute a specific similarity function provided all inputs are real-valued and complete.
An example family of models defined by composition of a Gower-based similarity with a sigmoid function is shown to lead to network designs (Heterogeneous Neural Networks) capable of learning from non-trivial data sets with a remarkable effectiveness, comparable to that of classical models.
2020-04-22T10:35:07Z

Fuzzy inputs and missing data in similarity-based heterogeneous neural networks
http://hdl.handle.net/2117/184280
Belanche Muñoz, Luis Antonio; Valdés Ramos, Julio José
Fuzzy heterogeneous networks are recently introduced neural network models composed of neurons of a general class whose inputs and weights are mixtures of continuous variables (crisp and/or fuzzy) with discrete quantities, also admitting missing data. These networks have net input functions based on similarity relations between the inputs and the weights of a neuron. They thus accept heterogeneous, possibly missing, inputs, and can be coupled with classical neurons in hybrid network architectures, trained by means of genetic algorithms or other evolutionary methods. This paper compares the effectiveness of the fuzzy heterogeneous model based on similarity with the classical feed-forward one, in the context of an investigation in the field of environmental sciences, namely, the geochemical study of natural waters in the Arctic (Spitzbergen). Classification performance, the effect of working with crisp or fuzzy inputs, the use of traditional scalar product vs. similarity-based functions, and the presence of missing data are studied. The results obtained show that, from these standpoints, fuzzy heterogeneous networks based on similarity perform better than classical feed-forward models. This behaviour is consistent with previous results in other application domains.
2020-04-22T10:12:13Z

On some strategies for missing values in positive semidefinite matrices
http://hdl.handle.net/2117/184213
Belanche Muñoz, Luis Antonio; Vázquez García, Miguel
This article presents our work on missing values in Positive Semi-Definite (PSD) matrices. We show how simple properties of PSD matrices can be used to deal with missing values. We study several situations and investigate their applicability to support vector machine (SVM) problems. In order to illustrate the methods, a set of experiments is presented and discussed.
2020-04-22T07:14:07Z

A thermodynamic algorithm for feature selection
http://hdl.handle.net/2117/184116
Belanche Muñoz, Luis Antonio; González Navarro, Félix Fernando
The main purpose of Feature Selection (FS) is to find a reduced subset of attributes from a data set described by a feature set. This implies a search process in the space of possible solutions, trying to optimize an objective function.
This work introduces TAFS, a Thermodynamic Annealing Feature Selection algorithm. Given a suitable objective function, TAFS uses a special-purpose implementation of simulated annealing to find a good subset of attributes that maximizes this objective function. A distinctive characteristic of TAFS over other search algorithms for feature subset selection is its probabilistic capability to accept momentarily worse solutions. TAFS has been evaluated against one of the most robust and reliable algorithms, the Sequential Forward Floating Search method (SFFS). Our experimental results show that TAFS achieves significant improvements over SFFS in the objective function for classification tasks with a reasonable reduction in subset size.
2020-04-21T11:40:51Z