SOCO - Soft Computing
http://hdl.handle.net/2117/3686
2016-10-28T02:49:22Z
http://hdl.handle.net/2117/91156
Fuzzy inputs and missing data in similarity-based heterogeneous neural networks
Belanche Muñoz, Luis Antonio; Valdés Ramos, Julio José
Fuzzy heterogeneous networks are recently introduced feed-forward
neural network models composed of neurons of a general class, whose
inputs and weights are mixtures of continuous variables (crisp and/or
fuzzy) and discrete quantities, and which also admit missing data. These
networks have net input functions based on similarity relations
between the inputs and the weights of a neuron. They thus accept
heterogeneous (and possibly missing) inputs, and can be coupled with
classical neurons in hybrid network architectures, trained by means of
genetic algorithms or other evolutionary methods.
This report compares the effectiveness of the similarity-based fuzzy
heterogeneous model with that of the classical feed-forward one,
in the context of an investigation in the environmental
sciences, namely the geochemical study of natural waters in the
Arctic (Spitzbergen). Classification accuracy, the effect of working
with crisp or fuzzy inputs, the use of traditional scalar-product
vs. similarity-based functions, and the presence of missing data are
studied.
The results obtained show that, from these standpoints, similarity-based
fuzzy heterogeneous networks perform better than classical
feed-forward models. This behaviour is consistent with previous
results in other application domains.
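As a rough illustration of the idea (not the authors' exact formulation), a similarity-based net input can aggregate per-feature partial similarities over mixed continuous and discrete features, simply skipping missing values. The Gower-style scheme below is an assumed, simplified stand-in for the similarity relations used in the report, with crisp inputs only:

```python
def neuron_similarity(x, w, ranges):
    """Mean per-feature similarity between an input x and a weight vector w.
    Features may be floats (continuous), strings (discrete) or None (missing)."""
    total, used = 0.0, 0
    for xi, wi, r in zip(x, w, ranges):
        if xi is None or wi is None:        # missing value: skip the feature
            continue
        if isinstance(xi, str):             # discrete: exact-match similarity
            s = 1.0 if xi == wi else 0.0
        else:                               # continuous: range-normalised closeness
            s = 1.0 - abs(xi - wi) / r
        total += s
        used += 1
    return total / used if used else 0.0

x = [0.8, "granite", None]                  # heterogeneous, partly missing input
w = [0.5, "granite", 3.1]                   # neuron "weights" of the same kinds
print(neuron_similarity(x, w, ranges=[1.0, None, 10.0]))  # 0.85
```

A classical scalar-product neuron would have to impute or encode the missing and discrete entries first; here they are handled natively by the aggregation.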
2016-10-27T10:37:51Z
http://hdl.handle.net/2117/90901
A weighted Cramér's V index for the assessment of stability in the fuzzy clustering of class C G protein-coupled receptors
Vellido Alcacena, Alfredo; Halka, Christiana; Nebot Castells, M. Àngela
After decades of intensive use, K-Means is still a common choice for crisp data clustering in real-world applications, particularly in biomedicine and bioinformatics. It is well-known that different initializations of the algorithm can lead to different solutions, precluding replicability. It has also been reported that even solutions with very similar errors may widely differ. A criterion for the choice of clustering solutions according to a combination of error and stability measures has recently been suggested. It is based on the use of Cramér’s V index, calculated from contingency tables, which is valid only for crisp clustering. Here, this criterion is extended to fuzzy and probabilistic clustering by first defining weighted contingency tables and a corresponding weighted Cramér’s V index. The proposed method is illustrated using Fuzzy C-Means in a proteomics problem.
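The weighted contingency table described above can be sketched directly: with membership matrices U and V holding each point's fuzzy memberships under two clustering solutions, the crisp count table is replaced by UᵀV, and Cramér's V is computed from it as usual. The code below is an illustrative reconstruction from the abstract, not the paper's exact definition:

```python
import numpy as np

def weighted_cramers_v(U, V):
    """U, V: (n_points x n_clusters) membership matrices of two solutions."""
    N = U.T @ V                               # weighted contingency table
    n = N.sum()
    expected = np.outer(N.sum(axis=1), N.sum(axis=0)) / n
    chi2 = ((N - expected) ** 2 / expected).sum()
    return np.sqrt(chi2 / (n * (min(N.shape) - 1)))

# Two identical hard partitions give perfect association (index = 1)
U = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
print(weighted_cramers_v(U, U))  # 1.0
```

When U and V are crisp 0/1 matrices this reduces to the ordinary contingency table, so the weighted index is a strict generalization of the crisp one.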
2016-10-20T06:48:30Z
http://hdl.handle.net/2117/90715
A proposal for climate change resilience management through fuzzy controllers
González Cárdenas, Rubén; Nebot Castells, M. Àngela; Múgica Álvarez, Francisco
We aim to implement a set of fuzzy controllers capable of automatically estimating the period of time necessary to recover a resilience level through the non-linear influence of a set of interrelated climate-change resilience indicators constrained by social variables. This set of fuzzy controllers, working together with a Mamdani-type fuzzy inference system, will be able to estimate the proper adjustments to be made to the system's elements in order to achieve a certain resilience level, while a general estimate of the required costs is appraised. The final tool can then be used to provide guidelines for strategic vulnerability planning and monitoring through a clear understanding of the relation between investments and results, while allowing open evaluation and scrutiny of the applied policies. In this paper the main strategy to achieve these objectives is presented and discussed.
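A toy example may help fix ideas. The sketch below implements two Mamdani-style rules with triangular memberships and a simplified height defuzzification, for a hypothetical "resilience gap to recovery time" controller; the variables, rules and membership functions are invented for illustration and are not taken from the paper:

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def mamdani_recovery_time(gap):
    """gap in [0, 1]: normalised distance to the target resilience level."""
    w_small = tri(gap, -0.1, 0.0, 0.6)   # IF gap is small THEN recovery short (~2 y)
    w_large = tri(gap, 0.4, 1.0, 1.1)    # IF gap is large THEN recovery long (~10 y)
    if w_small + w_large == 0.0:
        return None
    # height defuzzification over the two rule consequents
    return (w_small * 2.0 + w_large * 10.0) / (w_small + w_large)

print(mamdani_recovery_time(0.5))  # about 6 years: both rules fire equally
```

A full Mamdani system would clip or scale output fuzzy sets and defuzzify by centroid; the height method shown is a common lightweight approximation of that step.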
2016-10-13T07:55:07Z
http://hdl.handle.net/2117/90154
Gene discovery for facioscapulohumeral muscular dystrophy by machine learning techniques
González Navarro, Félix Fernando; Belanche Muñoz, Luis Antonio; Gámez Moreno, María G.; Flores Ríos, Brenda L.; Ibarra Esquer, Jorge E.; López Morteo, Gabriel A.
Facioscapulohumeral muscular dystrophy (FSHD) is a neuromuscular disorder that shows a preference for the facial, shoulder and upper-arm muscles. FSHD affects about one in 20-400,000 people, and no effective therapeutic strategies are known to halt disease progression or reverse muscle weakness or atrophy. Many genes may be incorrectly regulated in affected muscle tissue, but the mechanisms responsible for the progressive muscle weakness remain largely unknown. Although machine learning (ML) has made significant inroads in biomedical disciplines such as cancer research, no reports have yet addressed FSHD analysis using ML techniques. This study explores a specific FSHD data set from an ML perspective. We report results showing a very promising small group of genes that clearly separates FSHD samples from healthy samples. In addition to numerical prediction figures, we show data visualizations and biological evidence illustrating the potential usefulness of these results.
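The kind of univariate gene ranking that typically serves as a first step in such studies can be sketched on synthetic data; the statistic, the data and the planted "gene" below are illustrative assumptions, not the paper's actual method or data set:

```python
import numpy as np

def rank_genes(X, y):
    """X: samples x genes expression matrix; y: 0/1 class labels.
    Returns gene indices sorted by decreasing two-sample t-like score."""
    a, b = X[y == 0], X[y == 1]
    num = np.abs(a.mean(axis=0) - b.mean(axis=0))
    den = np.sqrt(a.var(axis=0) / len(a) + b.var(axis=0) / len(b)) + 1e-12
    return np.argsort(-(num / den))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))       # 20 samples, 100 synthetic "genes"
y = np.array([0] * 10 + [1] * 10)    # 10 healthy vs 10 affected labels
X[y == 1, 7] += 5.0                  # plant one strongly informative gene
print(rank_genes(X, y)[0])           # 7: the planted gene tops the ranking
```

With far more genes than samples, as in real microarray data, such filtering is usually followed by a multivariate selection step and a classifier; this sketch shows only the ranking idea.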
2016-09-23T07:26:33Z
http://hdl.handle.net/2117/88076
Comparing error minimized extreme learning machines and support vector sequential feed-forward neural networks
Romero Merino, Enrique; Alquézar Mancho, René
Recently, error minimized extreme learning machines (EM-ELMs) have been proposed as a simple and efficient approach to build single-hidden-layer feed-forward networks (SLFNs) sequentially. They add random hidden nodes one by one (or group by group) and update the output weights incrementally to minimize the sum-of-squares error on the training set. Other very similar methods that also construct SLFNs sequentially had been reported earlier, with the main difference that their hidden-layer weights are a subset of the data instead of being random. By analogy with the concept of support vectors originating in support vector machines (SVMs), these approaches can be referred to as support vector sequential feed-forward neural networks (SV-SFNNs), and they are a particular case of the Sequential Approximation with Optimal Coefficients and Interacting Frequencies (SAOCIF) method. In this paper, it is first shown that EM-ELMs can also be cast as a particular case of SAOCIF. In particular, EM-ELMs can easily be extended to test a number of random candidates at each step and select the best of them, as SAOCIF does. Moreover, it is demonstrated that the cost of calculating the optimal output-layer weights in the originally proposed EM-ELMs can be improved if it is replaced by the one included in SAOCIF. Secondly, we present the results of an experimental study on 10 benchmark classification and 10 benchmark regression data sets, comparing EM-ELMs and SV-SFNNs under the same conditions for the two models. Although both models have the same (efficient) computational cost, a statistically significant improvement in the generalization performance of SV-SFNNs vs. EM-ELMs was found in 12 out of the 20 benchmark problems.
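The shared sequential scheme can be sketched as follows: hidden units are added one at a time, several candidates are screened at each step (as SAOCIF does), and the linear output weights are refit by least squares. For clarity this sketch recomputes the full least-squares solution at every step; the incremental updates discussed in the paper avoid exactly that cost. Function names and settings are illustrative:

```python
import numpy as np

def train_slfn(X, y, n_hidden=10, n_candidates=5, rng=None):
    """Grow a single-hidden-layer net: at each step screen several random
    hidden units and keep the one whose least-squares refit of the output
    weights gives the lowest training sum-of-squares error."""
    if rng is None:
        rng = np.random.default_rng(0)
    H = np.ones((len(X), 1))                     # start with a bias column
    beta = None
    for _ in range(n_hidden):
        best = None
        for _ in range(n_candidates):            # candidate screening
            w = rng.normal(size=X.shape[1])
            b = rng.normal()
            h = np.tanh(X @ w + b)[:, None]      # candidate hidden unit
            Hc = np.hstack([H, h])
            bc, *_ = np.linalg.lstsq(Hc, y, rcond=None)
            err = np.sum((Hc @ bc - y) ** 2)
            if best is None or err < best[0]:
                best = (err, Hc, bc)
        _, H, beta = best                        # keep the best candidate
    return H, beta

X = np.linspace(-3, 3, 50)[:, None]
y = np.sin(X[:, 0])
H, beta = train_slfn(X, y)
print(np.sum((H @ beta - y) ** 2))  # training SSE; never above the bias-only fit
```

Setting `n_candidates=1` recovers the plain EM-ELM behaviour of accepting each random unit; drawing candidates from the training data instead of at random would give the SV-SFNN variant.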
2016-06-16T08:38:58Z
http://hdl.handle.net/2117/87543
Understanding (dis)similarity measures
Belanche Muñoz, Luis Antonio
Intuitively, similarity is the notion that measures an inexact matching between two entities of the same reference set. The notions of similarity and its close relative, dissimilarity, are widely used in many fields of Artificial Intelligence. Yet they have many different and often partial definitions or properties, usually restricted to one field of application and thus incompatible with other uses. This paper contributes to the design and understanding of similarity and dissimilarity measures for Artificial Intelligence. A formal dual definition for each concept is proposed, together with a set of fundamental properties. The behavior of the properties under several transformations is studied and revealed to be an important matter to bear in mind. We also develop several practical examples that illustrate the proposed approach.
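The dual view can be made concrete with a small example: a dissimilarity d can be turned into a similarity by a strictly decreasing transformation such as s = 1/(1 + d), one common choice among many, and one can then check which properties survive the transformation:

```python
def sim_from_dissim(d):
    """Strictly decreasing map taking d in [0, inf) to s in (0, 1]."""
    return 1.0 / (1.0 + d)

# Reflexivity transfers: d(x, x) = 0 becomes s(x, x) = 1 (the maximum).
print(sim_from_dissim(0.0))   # 1.0
# Larger dissimilarities map to smaller similarities, preserving order.
print(sim_from_dissim(3.0))   # 0.25
```

Order and boundedness are preserved by any such monotone map, whereas metric properties of d, such as the triangle inequality, do not in general carry over to s; this is the kind of behaviour under transformations that the paper studies systematically.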
2016-05-31T12:35:44Z
http://hdl.handle.net/2117/87483
Visual-FIR for ozone modeling and prediction
Nebot Castells, M. Àngela; Múgica, Violeta; Escobet Canal, Antoni
Air pollution is one of the most important environmental problems in urban areas, and is extremely critical in Mexico City. The main air pollution problem that has been identified in the Mexico City metropolitan area is the formation of photochemical smog, primarily ozone. The study and development of modeling methodologies that allow capturing time-series behavior therefore becomes an important task. The present work aims to develop Fuzzy Inductive Reasoning (FIR) models using the Visual-FIR platform. FIR offers a model-based approach to modeling and predicting either univariate or multivariate time series, and Visual-FIR offers a user-friendly environment to perform this task. In this research, long-term prediction of maximum ozone concentration in the centre region of the Mexico City metropolitan area is performed. The data were registered every hour and include missing values. Two modeling perspectives are analyzed, i.e. monthly and seasonal models. The results show that the identified models capture the dynamic behavior of the ozone contaminant in an accurate manner.
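FIR's qualitative recoding of each signal value into a class and a fuzzy membership can be sketched as below; the equal-width three-class partition and the membership shape are illustrative assumptions, since FIR's actual landmarks and membership functions are derived from the data:

```python
def fuzzify(x, lo, hi, n_classes=3):
    """Return (class index, fuzzy membership) for x in [lo, hi] under an
    equal-width partition; membership is 1 at a class centre, 0.5 at its edges."""
    width = (hi - lo) / n_classes
    idx = min(int((x - lo) / width), n_classes - 1)
    center = lo + (idx + 0.5) * width
    membership = 1.0 - 0.5 * abs(x - center) / (width / 2.0)
    return idx, membership

# An hourly ozone reading at a quarter of the observed range falls in the
# lowest class, three quarters of the way to full membership.
print(fuzzify(0.25, 0.0, 1.0))  # class 0, membership 0.75
```

FIR then mines the resulting qualitative episodes for the input mask that best predicts the output class, which is why the number of classes and the membership semantics matter so much for its performance.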
2016-05-30T11:28:07Z
http://hdl.handle.net/2117/87412
Modeling of local ozone concentrations in the Central Zone of the Mexico City Metropolitan Area
Acosta, Jesús; Nebot Castells, M. Àngela; Fuertes Armengol, José Mª
Air pollution is the environmental problem that receives most attention in urban areas, because it affects people's health, especially that of children. For that reason, the construction of ozone models that capture the behaviour of this gas in the atmosphere as precisely as possible is of major interest not only to the scientific community but also to government agencies. In this research, models of ozone concentrations are identified for the Eastern Austrian Region by means of a Soft Computing methodology called Fuzzy Inductive Reasoning (FIR), a very useful tool for modelling and simulating systems for which little or no prior knowledge is available. It is known that variations in the membership functions affect the efficiency of fuzzy rule-based systems, and the FIR methodology is no exception. The efficiency of FIR's qualitative model identification and prediction processes is strongly influenced by the discretization parameters of the system variables, i.e. the number of classes of each variable and the membership functions that define their semantics. This work therefore presents a hybrid methodology, new Genetic Fuzzy Systems (GFSs) in the context of the FIR methodology, which automatically suggests suitable discretization parameters.
This paper describes in detail the main components of these methods.
2016-05-27T08:07:13Z
http://hdl.handle.net/2117/86323
A variational formulation for GTM through time: Theoretical foundations
Olier Caparroso, Iván; Vellido Alcacena, Alfredo
Generative Topographic Mapping (GTM) is a latent variable model that, in its standard version, was conceived to provide clustering and visualization of multivariate, real-valued, i.i.d. data. It was also extended to deal with non-i.i.d. data such as multivariate time series in a variant called GTM Through Time (GTM-TT), defined as a constrained Hidden Markov Model (HMM). In this technical report, we provide the theoretical foundations of the reformulation of GTM-TT within the Variational Bayesian framework. In application, this approach should naturally handle the presence of noise in the time series, helping to avert the problem of overfitting the data.
2016-04-28T09:20:38Z
http://hdl.handle.net/2117/86314
A variational Bayesian formulation for GTM: Theoretical foundations
Olier Caparroso, Iván; Vellido Alcacena, Alfredo
Generative Topographic Mapping (GTM) is a non-linear latent variable model of the manifold learning family that provides simultaneous visualization and clustering of high-dimensional data. It was originally formulated as a constrained mixture of Gaussian distributions, whose adaptive parameters were determined by Maximum Likelihood (ML) using the Expectation-Maximization (EM) algorithm. In this paper, we define an alternative variational formulation of GTM that provides a full Bayesian treatment of a Gaussian Process (GP)-based variant of the model.
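The E-step quantity underlying both GTM reports, the responsibility of each latent-grid centre for each data point under the constrained Gaussian mixture, can be sketched as follows (a bare-bones maximum-likelihood computation; the variational treatment replaces these point estimates with distributions over the parameters):

```python
import numpy as np

def responsibilities(T, Y, beta):
    """T: data (N x D); Y: images of the K latent grid points (K x D);
    beta: shared inverse variance. Returns R (K x N), columns sum to 1."""
    d2 = ((Y[:, None, :] - T[None, :, :]) ** 2).sum(axis=-1)  # K x N sq. distances
    logp = -0.5 * beta * d2
    logp -= logp.max(axis=0)        # subtract per-column max for stability
    R = np.exp(logp)
    return R / R.sum(axis=0)

T = np.array([[0.0, 0.0]])                       # one data point
Y = np.array([[0.0, 0.0], [1.0, 1.0]])           # two mapped grid centres
R = responsibilities(T, Y, beta=1.0)
print(R[:, 0])  # the nearer centre takes most of the responsibility
```

Because the mixture is constrained (all components share beta and the centres Y are images of a low-dimensional grid), these responsibilities both drive the parameter updates and provide the posterior-mode projection used for visualization.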
2016-04-28T08:12:40Z