Book chapters
http://hdl.handle.net/2117/98195
2023-02-08T01:55:57Z
http://hdl.handle.net/2117/101920
Relationship between the popularity of key words in the Google browser and the evolution of worldwide financial indices
Ortells Sesé, Robert; Egozcue Rubí, Juan José; Ortego Martínez, María Isabel; Garola Crespo, Àlvar
The authoritative contributions gathered in this volume reflect the state of the art in compositional data analysis (CoDa). The respective chapters cover all aspects of CoDa, ranging from mathematical theory, statistical methods and techniques to its broad range of applications in geochemistry, the life sciences and other disciplines. The selected and peer-reviewed papers were originally presented at the 6th International Workshop on Compositional Data Analysis, CoDaWork 2015, held in L’Escala (Girona), Spain.
Compositional data are defined as vectors of positive components with a constant sum and, more generally, as any vectors representing parts of a whole that carry only relative information. Examples of compositional data can be found in many different fields such as geology, chemistry, economics, medicine, ecology and sociology. Because most classical statistical techniques are incoherent when applied to compositions, John Aitchison proposed the log-ratio approach to CoDa in the 1980s. This became the foundation of modern CoDa, which is now based on a specific geometric structure for the simplex, an appropriate representation of the sample space of compositional data.
The International Workshops on Compositional Data Analysis offer a vital discussion forum for researchers and practitioners concerned with the statistical treatment and modelling of compositional data or other constrained data sets and the interpretation of models and their applications. The goal of the workshops is to summarize and share recent developments, and to identify important lines of future research.
2017-03-03T15:22:14Z
http://hdl.handle.net/2117/98196
An application of the isometric log-ratio transformation in relatedness research
Graffelman, Jan; Galván Femenía, Iván
Genetic marker data contains information on the degree of relatedness of a pair of individuals. Relatedness investigations are usually based on the extent to which alleles of a pair of individuals match over a set of markers for which their genotype has been determined. A distinction is usually drawn between alleles that are identical by state (IBS) and alleles that are identical by descent (IBD). Since any pair of individuals can only share 0, 1, or 2 alleles IBS or IBD for any marker, 3-way compositions can be computed that consist of the fractions of markers sharing 0, 1, or 2 alleles IBS (or IBD) for each pair. For any given standard relationship (e.g., parent–offspring, sister–brother, etc.) the probabilities $k_0$, $k_1$ and $k_2$ of sharing 0, 1 or 2 IBD alleles are easily deduced and are usually referred to as Cotterman’s coefficients. Marker data can be used to estimate these coefficients by maximum likelihood. This maximization problem has the 2-simplex as its domain. If there is no inbreeding, then the maximum must occur in a subset of the 2-simplex. The maximization problem is then subject to an additional nonlinear constraint ($k_1^2 = 4 k_0 k_2$). Special optimization routines are needed that do respect all constraints of the problem. A reparametrization of the likelihood in terms of isometric log-ratio (ilr) coordinates greatly simplifies the maximization problem. In isometric log-ratio coordinates the domain turns out to be rectangular, and maximization can be carried out by standard general-purpose maximization routines. We illustrate this point with some examples using data from the HapMap project.
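The reparametrization described in this abstract can be illustrated with a minimal Python sketch of an ilr transformation for a 3-part composition and its inverse. The orthonormal basis used here is one common choice and is an assumption, since the abstract does not say which basis the chapter uses; the function names `closure`, `ilr3` and `ilr3_inv` are hypothetical. The sketch only shows that every pair of real coordinates maps back to a valid point of the 2-simplex; it does not encode the additional no-inbreeding constraint.

```python
import math


def closure(x):
    """Rescale positive parts so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]


def ilr3(x):
    """ilr coordinates of a 3-part composition with strictly positive parts.

    Assumed basis: e1 = (1, -1, 0)/sqrt(2), e2 = (1, 1, -2)/sqrt(6).
    """
    x0, x1, x2 = closure(x)
    z1 = math.log(x0 / x1) / math.sqrt(2.0)
    z2 = math.sqrt(2.0 / 3.0) * math.log(math.sqrt(x0 * x1) / x2)
    return z1, z2


def ilr3_inv(z):
    """Map ilr coordinates back to the simplex (inverse of ilr3)."""
    z1, z2 = z
    # Centered log-ratio coordinates: z1 * e1 + z2 * e2.
    c0 = z1 / math.sqrt(2.0) + z2 / math.sqrt(6.0)
    c1 = -z1 / math.sqrt(2.0) + z2 / math.sqrt(6.0)
    c2 = -2.0 * z2 / math.sqrt(6.0)
    return closure([math.exp(c) for c in (c0, c1, c2)])


# Cotterman coefficients of full siblings, (k0, k1, k2) = (1/4, 1/2, 1/4).
k_full_sibs = [0.25, 0.5, 0.25]
z = ilr3(k_full_sibs)
k_back = ilr3_inv(z)
```

Because `ilr3_inv` accepts any pair of real numbers, a general-purpose optimizer could search over `z` freely and convert the optimum back to $(k_0, k_1, k_2)$ afterwards. Compositions with a zero part, such as parent–offspring with $k_0 = 0$, have no finite ilr coordinates and would need separate handling.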
2016-12-14T09:58:39Z
http://hdl.handle.net/2117/98194
A Compositional Approach to Allele Sharing Analysis
Galván Femenía, Iván; Graffelman, Jan
Relatedness is of great interest in population-based genetic association studies. These studies search for genetic factors related to disease. Many statistical methods used in population-based genetic association studies (such as standard regression models, t-tests, and logistic regression) assume that the observations (individuals) are independent. These techniques can fail if independence is not satisfied. Allele sharing is a powerful data analysis technique for analyzing the degree of dependence in diploid species. Two individuals can share 0, 1, or 2 alleles for any genetic marker. This sharing may be assessed for alleles identical by state (IBS) or identical by descent (IBD). Starting from IBS alleles, it is possible to detect the type of relationship of a pair of individuals by using graphical methods. Typical allele sharing analysis consists of plotting the fraction of loci sharing 2 IBS alleles versus the fraction of loci sharing 0 IBS alleles. Compositional data analysis can be applied to allele sharing analysis because the proportions of loci sharing 0, 1 or 2 IBS alleles (denoted by $p_0$, $p_1$, and $p_2$) form a 3-part composition. This chapter provides a graphical method to detect family relationships by plotting the isometric log-ratio transformation of $p_0$, $p_1$, and $p_2$. On the other hand, the probabilities of sharing 0, 1, or 2 IBD alleles (denoted by $k_0$, $k_1$, $k_2$), which are termed Cotterman’s coefficients, depend on the relatedness: monozygotic twins, full-siblings, parent-offspring, avuncular, first cousins, etc. It is possible to infer the type of family relationship of a pair of individuals by using maximum likelihood methods. As a result, the estimated vector $\mathbf{k} = (k_0, k_1, k_2)$ for each pair of individuals forms a 3-part composition and can be plotted in a ternary diagram to identify the degree of relatedness. An R package has been developed for the study of genetic relatedness based on genetic markers such as microsatellites and single nucleotide polymorphisms from human populations, and is used for the computations and graphics of this contribution.
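As a rough illustration of how the IBS composition $(p_0, p_1, p_2)$ for one pair of individuals might be obtained, the following Python sketch counts shared alleles marker by marker. It is not the R package mentioned in the abstract; the function names `ibs_shared` and `ibs_composition`, the genotype encoding as unordered allele pairs, and the toy genotypes are all assumptions made for this example.

```python
from collections import Counter


def ibs_shared(g1, g2):
    """Number of alleles (0, 1 or 2) that two diploid genotypes share by state."""
    # Multiset intersection: ("A", "A") vs ("A", "G") shares one allele.
    return sum((Counter(g1) & Counter(g2)).values())


def ibs_composition(ind1, ind2):
    """Fractions of markers at which a pair shares 0, 1 or 2 IBS alleles."""
    counts = [0, 0, 0]
    for g1, g2 in zip(ind1, ind2):
        counts[ibs_shared(g1, g2)] += 1
    n = sum(counts)
    return tuple(c / n for c in counts)


# Toy genotypes at four markers for two hypothetical individuals.
ind_a = [("A", "A"), ("A", "G"), ("C", "T"), ("G", "G")]
ind_b = [("A", "G"), ("A", "G"), ("C", "C"), ("T", "T")]

p0, p1, p2 = ibs_composition(ind_a, ind_b)
```

With real data the composition would be computed over many markers for every pair, and pairs with a zero part (for example $p_0 = 0$) would need some form of zero handling before any log-ratio transformation is applied.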
2016-12-14T09:43:27Z