Book chapters
http://hdl.handle.net/2117/3946
2017-08-23T17:44:50Z
http://hdl.handle.net/2117/105912
Marshall-Olkin extended Zipf distribution
Pérez Casany, Marta; Duarte López, Ariel; Prat Pérez, Arnau
Being able to generate large synthetic graphs resembling those found in the real world is of high importance for the design of new graph algorithms and benchmarks. In this paper, we first compare several probability models in terms of goodness of fit when used to model the degree distribution of real graphs. Second, after confirming that the MOEZipf model is the one that gives the best fits, we present a method to generate MOEZipf distributions. The method is shown to work well in practice when implemented in a scalable synthetic graph generator.
2017-06-28T07:17:33Z
http://hdl.handle.net/2117/105744
Using the Marshall-Olkin extended Zipf distribution in graph generation
Duarte López, Ariel; Prat Pérez, Arnau; Pérez Casany, Marta
Being able to generate large synthetic graphs resembling those found in the real world is of high importance for the design of new graph algorithms and benchmarks. In this paper, we first compare several probability models in terms of goodness of fit when used to model the degree distribution of real graphs. Second, after confirming that the MOEZipf model is the one that gives the best fits, we present a method to generate MOEZipf distributions. The method is shown to work well in practice when implemented in a scalable synthetic graph generator.
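The abstract does not spell out the generation method. As a rough illustration only, the sketch below draws MOEZipf variates by inverse-transform sampling, assuming the usual Marshall-Olkin construction in which the Zipf survival function $\bar{F}(x)$ is mapped to $\beta \bar{F}(x) / (1 - (1-\beta)\bar{F}(x))$. The truncation of the support at `max_value`, the function names and the parameter values are assumptions made for the example; this is not the authors' generator.

```python
import bisect
import random


def moezipf_cdf_table(alpha, beta, max_value):
    """Tabulate the CDF of a MOEZipf(alpha, beta) variable on 1..max_value.

    Assumption: the Zipf base distribution is truncated at max_value
    (for a degree distribution, max_value could be the number of nodes minus one).
    """
    weights = [k ** -alpha for k in range(1, max_value + 1)]
    total = sum(weights)
    cdf = []
    acc = 0.0
    for w in weights:
        acc += w
        # Survival function of the (truncated) Zipf distribution at this value.
        zipf_survival = max(0.0, 1.0 - acc / total)
        # Marshall-Olkin transformation of the survival function.
        mo_survival = beta * zipf_survival / (1.0 - (1.0 - beta) * zipf_survival)
        cdf.append(1.0 - mo_survival)
    cdf[-1] = 1.0  # guard against rounding at the end of the support
    return cdf


def sample_moezipf(cdf, rng):
    """Inverse-transform sampling: smallest x whose CDF value is >= u."""
    u = rng.random()
    return bisect.bisect_left(cdf, u) + 1


if __name__ == "__main__":
    rng = random.Random(42)
    table = moezipf_cdf_table(alpha=2.5, beta=3.0, max_value=1000)
    degrees = [sample_moezipf(table, rng) for _ in range(10)]
    print(degrees)
```

With `beta = 1` the transformation is the identity, so the sketch reduces to sampling from the truncated Zipf distribution; values of `beta` above 1 move probability mass away from the value 1.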
2017-06-23T06:39:52Z
http://hdl.handle.net/2117/105724
Overview on agent-based social modelling and the use of formal languages
Montañola Sales, Cristina; Cela Espín, José M.; Rubio Campillo, Xavier; Casanovas Garcia, Josep; Kaplan Marcusan, Adriana
Transdisciplinary Models and Applications investigates a variety of programming languages used in validating and verifying models in order to assist in their eventual implementation. This book will explore different methods of evaluating and formalizing simulation models, enabling computer and industrial engineers, mathematicians, and students working with computer simulations to thoroughly understand the progression from simulation to product, improving the overall effectiveness of modeling systems.
2017-06-22T09:10:12Z
http://hdl.handle.net/2117/99661
Large-scale social simulation, dealing with complexity challenges in high performance environments
Montañola Sales, Cristina; Rubio Campillo, Xavier; Casanovas Garcia, Josep; Cela Espín, José M.; Kaplan Marcusan, Adriana
Advances in information technology in the past decades have provided new tools to assist scientists in the study of social and natural phenomena. Agent-based modeling techniques have flourished recently, encouraging the introduction of computer simulations to examine behavioral patterns in complex human and biological systems. Real-world social dynamics are very complex, containing billions of interacting individuals and a large amount of data (both spatial and social). Dealing with large-scale agent-based models is not an easy task and poses several challenges. The design of strategies to overcome these challenges represents an opportunity for high performance parallel and distributed implementation. This chapter examines the most relevant aspects of dealing with large-scale agent-based simulations in the social sciences and reviews the developments made to confront technological issues.
2017-01-19T09:35:58Z
http://hdl.handle.net/2117/99193
Identifying nutritional patterns through integrative multiview clustering
Sevilla-Villanueva, Beatriz; Gibert, Karina; Sànchez-Marrè, Miquel
The main goal of this work is to develop a methodology for finding nutritional patterns based on a variety of subject characteristics, which can contribute to a better understanding of the interactions between nutrition and health, given that classical approaches perform poorly on a phenomenon of this complexity. An innovative methodology based on advanced clustering techniques is proposed in order to find more compact patterns or clusters. Integrative Multiview Clustering (IMC) combines the Multiview Clustering approach with crossing operations over the several partitions obtained. A comparison with other classical clustering techniques is provided to assess the performance of our approach. The Dunn-like cluster validity index proposed by Bezdek & Pal is used for the comparison from a structural point of view, as it is more robust than the original Dunn index. According to the Dunn-like index, the IMC method performs better than other popular clustering techniques. Our findings suggest that Integrative Multiview Clustering provides more compact and better separated clusters. In addition, IMC helps to reduce the high dimensionality of the data through the multiview division of attributes, and the resulting partition is easier to interpret. Using the Integrative Multiview Clustering approach, a good partition is obtained from a structural point of view. Also, the interpretation of the resulting partition is clearer than the one obtained by classical approaches.
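For readers unfamiliar with this family of validity indices, the following minimal sketch computes the original Dunn index (smallest distance between points of different clusters divided by the largest cluster diameter). It is not the generalized Dunn-like index of Bezdek & Pal used in the chapter, whose exact variant the abstract does not specify; the function name and the toy data are assumptions for illustration.

```python
from itertools import combinations
from math import dist


def dunn_index(clusters):
    """Original Dunn index for a partition given as a list of clusters.

    Each cluster is a list of points (tuples of coordinates). Assumes at
    least two clusters and at least one cluster with two distinct points.
    Higher values indicate more compact, better separated clusters.
    """
    # Smallest distance between two points lying in different clusters.
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    # Largest distance between two points lying in the same cluster.
    max_diameter = max(
        dist(p, q)
        for cluster in clusters
        for p, q in combinations(cluster, 2)
    )
    return min_separation / max_diameter


if __name__ == "__main__":
    partition = [[(0, 0), (0, 1)], [(3, 0), (3, 1)]]
    print(dunn_index(partition))
```

Because the original index depends on single extreme pairs of points, one outlier can change it sharply, which is the kind of sensitivity that generalized variants aim to reduce.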
2017-01-13T09:28:14Z
http://hdl.handle.net/2117/98196
An application of the isometric log-ratio transformation in relatedness research
Graffelman, Jan; Galván Femenía, Iván
Genetic marker data contains information on the degree of relatedness of a pair of individuals. Relatedness investigations are usually based on the extent to which alleles of a pair of individuals match over a set of markers for which their genotype has been determined. A distinction is usually drawn between alleles that are identical by state (IBS) and alleles that are identical by descent (IBD). Since any pair of individuals can only share 0, 1, or 2 alleles IBS or IBD for any marker, 3-way compositions can be computed that consist of the fractions of markers sharing 0, 1, or 2 alleles IBS (or IBD) for each pair. For any given standard relationship (e.g., parent–offspring, sister–brother, etc.) the probabilities $k_0$, $k_1$, and $k_2$ of sharing 0, 1, or 2 IBD alleles are easily deduced and are usually referred to as Cotterman’s coefficients. Marker data can be used to estimate these coefficients by maximum likelihood. This maximization problem has the 2-simplex as its domain. If there is no inbreeding, then the maximum must occur in a subset of the 2-simplex. The maximization problem is then subject to an additional nonlinear constraint ($k_1^2 \geq 4 k_0 k_2$). Special optimization routines are needed that respect all constraints of the problem. A reparametrization of the likelihood in terms of isometric log-ratio (ilr) coordinates greatly simplifies the maximization problem. In isometric log-ratio coordinates the domain turns out to be rectangular, and maximization can be carried out by standard general-purpose maximization routines. We illustrate this point with some examples using data from the HapMap project.
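To see why the domain becomes rectangular, consider one possible ilr basis. The basis below is an assumption chosen for illustration (the abstract does not state which basis the authors use), and the no-inbreeding constraint is taken in its inequality form $k_1^2 \geq 4 k_0 k_2$:

```latex
% One possible ilr basis for the composition (k_0, k_1, k_2); an assumed choice.
z_1 = \frac{1}{\sqrt{2}} \ln\frac{k_0}{k_2}, \qquad
z_2 = \frac{1}{\sqrt{6}} \ln\frac{k_0 k_2}{k_1^2}

% In these coordinates the nonlinear constraint is a simple bound on z_2:
k_1^2 \ge 4 k_0 k_2
\;\Longleftrightarrow\; \frac{k_0 k_2}{k_1^2} \le \frac{1}{4}
\;\Longleftrightarrow\; z_2 \le -\frac{2 \ln 2}{\sqrt{6}},
\qquad z_1 \in \mathbb{R}
```

Under this choice the feasible region is the half-plane $z_2 \leq -2\ln 2/\sqrt{6}$ with $z_1$ unrestricted, so a general-purpose optimizer that supports simple bound constraints is sufficient.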
2016-12-14T09:58:39Z
http://hdl.handle.net/2117/98194
A Compositional Approach to Allele Sharing Analysis
Galván Femenía, Iván; Graffelman, Jan
Relatedness is of great interest in population-based genetic association studies. These studies search for genetic factors related to disease. Many statistical methods used in population-based genetic association studies (such as standard regression models, t-tests, and logistic regression) assume that the observations (individuals) are independent. These techniques can fail if independence is not satisfied. Allele sharing is a powerful data analysis technique for analyzing the degree of dependence in diploid species. Two individuals can share 0, 1, or 2 alleles for any genetic marker. This sharing may be assessed for alleles identical by state (IBS) or identical by descent (IBD). Starting from IBS alleles, it is possible to detect the type of relationship of a pair of individuals by using graphical methods. Typical allele sharing analysis consists of plotting the fraction of loci sharing 2 IBS alleles versus the fraction of loci sharing 0 IBS alleles. Compositional data analysis can be applied to allele sharing analysis because the proportions of sharing 0, 1, or 2 IBS alleles (denoted by $p_0$, $p_1$, and $p_2$) form a 3-part composition. This chapter provides a graphical method to detect family relationships by plotting the isometric log-ratio transformation of $p_0$, $p_1$, and $p_2$. On the other hand, the probabilities of sharing 0, 1, or 2 IBD alleles (denoted by $k_0$, $k_1$, $k_2$), which are termed Cotterman’s coefficients, depend on the relatedness: monozygotic twins, full siblings, parent-offspring, avuncular, first cousins, etc. It is possible to infer the type of family relationship of a pair of individuals by using maximum likelihood methods. As a result, the estimated vector $\mathbf{k} = (k_0, k_1, k_2)$ for each pair of individuals forms a 3-part composition and can be plotted in a ternary diagram to identify the degree of relatedness. An R package has been developed for the study of genetic relatedness based on genetic markers such as microsatellites and single nucleotide polymorphisms from human populations, and is used for the computations and graphics of this contribution.
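The IBS part of this workflow can be sketched in a few lines. The example below is a simplified illustration, not the authors' R package: it assumes biallelic markers coded as 0, 1, or 2 copies of a reference allele, for which the number of alleles shared IBS at a marker is $2 - |g_1 - g_2|$, and it uses one assumed ilr basis among several valid ones. Function names are hypothetical.

```python
from math import log, sqrt


def ibs_proportions(genotypes_a, genotypes_b):
    """Return (p0, p1, p2): fractions of markers sharing 0, 1, 2 IBS alleles.

    Assumption: biallelic markers coded as 0, 1 or 2 copies of one allele,
    so the IBS count at a marker is 2 - |a - b|.
    """
    counts = [0, 0, 0]
    for a, b in zip(genotypes_a, genotypes_b):
        counts[2 - abs(a - b)] += 1
    n = sum(counts)
    return tuple(c / n for c in counts)


def ilr3(x0, x1, x2):
    """ilr coordinates of a 3-part composition (assumed basis choice)."""
    if min(x0, x1, x2) <= 0:
        # Log-ratios are undefined for zero parts; zeros need special handling.
        raise ValueError("all parts must be strictly positive")
    z1 = log(x0 / x2) / sqrt(2)
    z2 = log(x0 * x2 / x1 ** 2) / sqrt(6)
    return z1, z2


if __name__ == "__main__":
    person_a = [0, 1, 2, 2]
    person_b = [2, 1, 2, 1]
    p = ibs_proportions(person_a, person_b)
    print(p, ilr3(*p))
```

Plotting the two ilr coordinates for every pair of individuals gives a scatter plot in which pairs with the same type of relationship tend to group together; pairs with a zero proportion need a replacement strategy before the transformation can be applied.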
2016-12-14T09:43:27Z
http://hdl.handle.net/2117/95947
A methodology for maintaining consistency between conceptual interpretations of nested partitions
Sevilla-Villanueva, Beatriz; Gibert, Karina; Sànchez-Marrè, Miquel
The relationship between interpretations of nested partitions is analyzed in this work, since there are multiple situations where a refinement of the original partition arises. As a result, a new methodology, NCI-IMS, is proposed in order to maintain consistency between interpretations of nested partitions. This methodology extends a previous methodology that obtains class descriptors by determining the robustness of the characteristics' significance. NCI-IMS then takes advantage of the robustness of the descriptors to obtain a deeper analysis of the relations between the descriptors of a superclass and those of its subclasses.
2016-11-09T12:25:13Z
http://hdl.handle.net/2117/82657
Fixed-charge facility location problems in location science
Fernández Aréizaga, Elena; Landete, Mercedes
Fixed-Charge Facility Location Problems are among the core problems in Location Science. There is a finite set of users with demand for service and a finite set of potential locations for the facilities that will offer service to users. Two types of decisions must be made: location decisions determine where to establish the facilities, whereas allocation decisions dictate how to satisfy the users' demand from the established facilities. Potential applications of various types arise in many different contexts. We provide an overview of the main elements that may intervene in the modeling and the solution process of Fixed-Charge Facility Location Problems, namely, modeling hypotheses and their implications, characteristics of formulations and their relation to other formulations, properties of the domains, and appropriate solution techniques.
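The two kinds of decisions can be made concrete with a toy instance. The sketch below assumes the uncapacitated variant, in which each open facility can serve any number of users and each user is simply allocated to its cheapest open facility; it enumerates every non-empty set of open facilities, which is only feasible for a handful of locations. It is an illustration of the problem statement, not one of the solution techniques surveyed in the chapter, and the function name and data are hypothetical.

```python
from itertools import combinations


def solve_uflp_brute_force(fixed_costs, service_costs):
    """Exhaustively solve a tiny uncapacitated facility location instance.

    fixed_costs[i]      : cost of opening facility i (location decision)
    service_costs[i][j] : cost of serving user j from facility i (allocation)
    Returns (best_total_cost, tuple_of_open_facilities).
    """
    n_facilities = len(fixed_costs)
    n_users = len(service_costs[0])
    best = None
    for size in range(1, n_facilities + 1):
        for open_set in combinations(range(n_facilities), size):
            # Fixed charges for every facility that is opened.
            total = sum(fixed_costs[i] for i in open_set)
            # Each user is allocated to the cheapest open facility.
            total += sum(
                min(service_costs[i][j] for i in open_set)
                for j in range(n_users)
            )
            if best is None or total < best[0]:
                best = (total, open_set)
    return best


if __name__ == "__main__":
    fixed = [3, 4]
    service = [[1, 5, 6], [6, 2, 1]]
    print(solve_uflp_brute_force(fixed, service))
```

Raising the fixed charges in this instance (for example to 10 each) makes opening a single facility cheaper than opening both, which shows the trade-off between fixed and allocation costs that gives the problem its name.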
2016-02-08T10:02:41Z
http://hdl.handle.net/2117/81529
SDL - The IoT Language
Sherratt, Edel; Ober, Ileana; Gaudin, Emmanuel; Fonseca Casas, Pau; Kristoffersen, Finn
Interconnected smart devices constitute a large and rapidly growing element of the contemporary Internet. A smart thing can be as simple as a web-enabled device that collects and transmits sensor data to a repository for analysis, or as complex as a web-enabled system to monitor and manage a smart home. Smart things present marvellous opportunities, but when they participate in complex systems, they challenge our ability to manage risk and ensure reliability. SDL, the ITU Standard Specification and Description Language, provides many advantages for modelling and simulating communicating agents – such as smart things – before they are deployed. The potential for SDL to enhance reliability and safety is explored with respect to existing smart things below. But SDL must advance if it is to become the language of choice for developing the next generation of smart things. In particular, it must target emerging IoT platforms, it must support simulation of interactions between pre-existing smart things and new smart things, and it must facilitate deployment of large numbers of similar things. Moreover, awareness of the potential benefits of SDL must be raised if those benefits are to be realized in the current and future Internet of Things.
2016-01-15T13:17:32Z