Journal articles
http://hdl.handle.net/2117/3942
2016-02-14T21:21:05Z

Exact inference for Hardy-Weinberg proportions with missing genotypes: single and multiple imputation
http://hdl.handle.net/2117/82666
Graffelman, Jan; Nelson, S.; Gogarten, S.M.; Weir, B.S.
This paper addresses the issue of exact-test-based statistical inference for Hardy-Weinberg equilibrium in the presence of missing genotype data. Missing genotypes are often discarded when markers are tested for Hardy-Weinberg equilibrium, which can lead to bias in the statistical inference about equilibrium. Single and multiple imputation can improve inference on equilibrium. We develop tests for equilibrium in the presence of missingness by using both inbreeding coefficients (or, equivalently, χ² statistics) and exact p-values. The analysis of a set of markers with a high missing rate from the GENEVA project on prematurity shows that exact inference on equilibrium can be altered considerably when missingness is taken into account. For markers with a high missing rate (>5%), we found that both single and multiple imputation tend to diminish evidence for Hardy-Weinberg disequilibrium. Depending on the imputation method used, 6-13% of the test results changed qualitatively at the 5% level.
2016-02-08T11:50:14Z
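The exact test referred to in this abstract can be illustrated for a single biallelic marker with complete genotypes. The sketch below is not the authors' procedure for missing data; it only shows the classical exact test, which enumerates every heterozygote count compatible with the observed allele counts and sums the probabilities of outcomes no more likely than the observed one:

```python
import math

def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Exact test for Hardy-Weinberg equilibrium at a biallelic marker.

    Uses the conditional distribution of the heterozygote count given
    the allele counts, and returns the sum of the probabilities of all
    outcomes at most as probable as the observed genotype counts.
    """
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab            # count of A alleles
    n_b = 2 * n - n_a                # count of B alleles

    def log_prob(h):
        # log P(heterozygotes = h | n, n_a) under equilibrium
        aa = (n_a - h) // 2
        bb = (n_b - h) // 2
        return (h * math.log(2)
                + math.lgamma(n + 1) - math.lgamma(aa + 1)
                - math.lgamma(h + 1) - math.lgamma(bb + 1)
                + math.lgamma(n_a + 1) + math.lgamma(n_b + 1)
                - math.lgamma(2 * n + 1))

    # feasible heterozygote counts share the parity of n_a
    hets = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = {h: math.exp(log_prob(h)) for h in hets}
    p_obs = probs[n_ab]
    return sum(p for p in probs.values() if p <= p_obs * (1 + 1e-12))
```

For markers with missing genotypes, the paper's approach would apply this test to each (singly or multiply) imputed data set rather than to the complete cases only.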

Evaluation of project-based learning in the area of manufacturing and statistics in the degree of industrial technology
http://hdl.handle.net/2117/82540
Buj Corral, Irene; Marco Almagro, Lluís; Riba, Alexandre; Vivancos Calvet, Joan; TortMartorell Llabrés, Xavier
In the subject Project I, taught in the second year of the Degree in Industrial Technology Engineering at the School of Industrial Engineering of Barcelona (ETSEIB), groups of 3-4 students develop a project over a semester. Results of two projects related to manufacturing, the measurement of parts and the statistical treatment of data are presented, placing emphasis on cross-curricular issues, the recording of oral presentations and how this helped improve their quality, as well as the evaluation of the subject by the students by means of questionnaires and open-ended questions.
2016-02-04T12:07:10Z

Correspondence analysis on generalised aggregated lexical tables (CA-GALT) in the FactoMineR package
http://hdl.handle.net/2117/82168
Kostov, Belchin Adriyanov; Bécue Bertaut, Mónica María; Husson, François
Correspondence analysis on generalised aggregated lexical tables (CA-GALT) is a method that generalizes classical CA-ALT to the case of several quantitative, categorical and mixed variables. It aims to establish a typology of the external variables and a typology of the events from their mutual relationships. To do so, the influence of the external variables on the lexical choices is disentangled by cancelling the associations among them; to avoid the instability arising from multicollinearity, the variables are replaced by their principal components. The CaGalt function, implemented in the FactoMineR package, provides numerous numerical and graphical outputs. Confidence ellipses are also provided to validate and improve the representation of words and variables. Although this methodology was developed mainly to address the problem of analyzing open-ended questions, it can be applied to any kind of frequency/contingency table with external variables.
2016-01-27T18:55:38Z
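CA-GALT extends plain correspondence analysis (CA), whose core computation is a singular value decomposition of the standardized residuals of a contingency table. As a point of reference, here is a minimal CA sketch; the actual CaGalt function is an R implementation in FactoMineR with many more outputs, so the function name and scaling below are purely illustrative:

```python
import numpy as np

def correspondence_analysis(N):
    """Basic correspondence analysis of a contingency table N.

    Returns principal row and column coordinates and the principal
    inertias (squared singular values); their sum equals the table's
    chi-square statistic divided by the grand total.
    """
    N = np.asarray(N, dtype=float)
    total = N.sum()
    P = N / total
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    # standardized residuals: (observed - expected) / sqrt(expected)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * s) / np.sqrt(r)[:, None]  # principal row coordinates
    cols = (Vt.T * s) / np.sqrt(c)[:, None]
    return rows, cols, s ** 2
```

CA-GALT adds the external (contextual) variables on top of this machinery, projecting lexical choices onto their principal components.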

A robust framework for the estimation of dynamic OD trip matrices for reliable traffic management
http://hdl.handle.net/2117/82164
Barceló Bugeda, Jaime; Montero Mercadé, Lídia
Origin-Destination (OD) trip matrices describe the patterns of traffic behavior across the network and play a key role as primary data input to many traffic models. OD matrices are a critical requirement in both static and dynamic models for traffic assignment. However, OD matrices are not directly observable; thus, the current practice consists of adjusting an initial or a priori matrix from link flow counts, speeds, travel times and other aggregate demand data. This information is provided by an existing layout of traffic counting stations, such as traditional loop detectors. The availability of new traffic measurements provided by ICT applications offers the possibility to formulate and develop more efficient algorithms, especially suited for real-time applications. However, their efficiency strongly depends, among other factors, on the quality of the seed matrix. This paper proposes an integrated computational framework in which an offline procedure generates the time-sliced OD matrices, which are the input to an online estimator. The paper also analyzes the sensitivity of the online estimator with respect to the available traffic measurements.
2016-01-27T18:22:37Z
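The adjustment of a seed OD matrix from link counts can be illustrated, under strong simplifying assumptions, as a regularized least-squares problem. This is a generic textbook-style sketch, not the framework of the paper: the link-path assignment matrix `A`, the weight `lam` and the function name are all hypothetical, and real estimators must handle congestion-dependent assignment and time slicing:

```python
import numpy as np

def adjust_od_matrix(A, counts, seed, lam=1e-3):
    """Adjust a seed OD vector so that assigned flows match link counts.

    Solves min ||A x - counts||^2 + lam * ||x - seed||^2 by stacking
    both terms into a single least-squares system. A[i, j] = 1 if OD
    pair j's route traverses link i (a fixed all-or-nothing assignment,
    a simplification of the mappings used in practice).
    """
    A = np.asarray(A, dtype=float)
    counts = np.asarray(counts, dtype=float)
    seed = np.asarray(seed, dtype=float)
    n = A.shape[1]
    stacked = np.vstack([A, np.sqrt(lam) * np.eye(n)])
    rhs = np.concatenate([counts, np.sqrt(lam) * seed])
    x, *_ = np.linalg.lstsq(stacked, rhs, rcond=None)
    return x
```

With a small `lam`, the estimate reproduces the link counts almost exactly; a larger `lam` keeps the solution close to the a priori matrix, which is one way to express the trade-off the abstract attributes to seed-matrix quality.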

On the collaboration uncapacitated arc routing problem
http://hdl.handle.net/2117/81939
Fernández Aréizaga, Elena; Fontana, Dario; Speranza, M. Grazia
© 2015 Elsevier Ltd. All rights reserved.
This paper introduces a new arc routing problem for the optimization of a collaboration scheme among carriers. This leads to the study of a profitable uncapacitated arc routing problem with multiple depots, in which carriers collaborate to improve the profit gained. In the first model, the goal is the maximization of the total profit of the coalition of carriers, independently of the individual profit of each carrier. Then, a lower bound on the individual profit of each carrier is included. This lower bound may represent the profit of the carrier in the case where no collaboration is implemented. The models are formulated as integer linear programs and solved through a branch-and-cut algorithm. Theoretical results concerning the computational complexity, the impact of collaboration on profit and a game-theoretical perspective are provided. The models are tested on a set of 971 instances generated from 118 benchmark instances for the Privatized Rural Postman Problem, with up to 102 vertices. All 971 instances are solved to optimality within a few seconds.
2016-01-25T09:33:18Z

Correspondence analysis of textual data involving contextual information: CA-GALT on principal components
http://hdl.handle.net/2117/81756
Bécue Bertaut, Mónica María; Pages, Jerome
Correspondence analysis on an aggregated lexical table is a typical practice in textual analysis in which a contextual categorical variable is used to aggregate documents, depending on the categories to which they belong. This work generalises this approach and considers several quantitative, categorical or mixed contextual variables. The result is a new method that we have called 'correspondence analysis on a generalised aggregated lexical table'. A favoured application derives from surveys by questionnaire, including both open-ended and closed questions. The free-text answers are encoded into a respondents-by-words frequency table called a lexical table. The closed questions, either quantitative or categorical, form the contextual variables. The primary objective is to establish a typology of the variables and a typology of the words from their mutual relationships as grasped from jointly analysing the textual and contextual tables. Validation tests are offered, particularly in the form of confidence ellipses. The comprehensive and numerous properties of the method, similar to correspondence analysis properties, are detailed. Promising results are obtained, as indicated by an application to a marketing survey conducted among 1,000 respondents.
2016-01-20T15:37:58Z

The HLA-C*04:01/KIR2DS4 gene combination and human leukocyte antigen alleles with high population frequency drive rate of HIV disease progression
http://hdl.handle.net/2117/81169
Olvera, Alex; Pérez Álvarez, Susana; Ibarrondo, Javier; Ganoza, Carmela; Lama, Javier R.; Lucchetti, Aldo; Cate, Steven; H. Hildebrand, William; Bernard, Nicole; Gómez Melis, Guadalupe; Sánchez, Jorge; Brander, Christian
Objective: The objective of this study is to identify human leukocyte antigen (HLA) class I and killer-cell immunoglobulin-like receptor (KIR) genotypes associated with different risks for HIV acquisition and HIV disease progression. Design: A cross-sectional study of a cohort of 468 high-risk individuals (246 HIV-positive and 222 HIV-negative) from outpatient clinics in Lima (Peru). Methods: The cohort was high-resolution HLA- and KIR-typed and analysed for potential differences in single-allele frequencies and allele combinations between HIV-positive and HIV-negative individuals, and for associations with HIV viral load and CD4+ cell counts in infected individuals. Results: HLA class I alleles associated with a lack of viral control had a significantly higher population frequency than relatively protective alleles (P = 0.0093), in line with a rare-allele advantage. HLA-A*02:01 and HLA-C*04:01 were both associated with high viral loads (P = 0.0313 and 0.0001, respectively) and low CD4+ cell counts (P = 0.0008 and 0.0087, respectively). Importantly, the association between HLA-C*04:01 and poor viral control was not due to its linkage disequilibrium with other HLA alleles. Rather, the co-expression of its putative KIR ligand KIR2DS4f was critically linked to elevated viral loads. Conclusion: These results highlight the impact of population allele frequency on viral control and identify a novel association between HLA-C*04:01 in combination with KIR2DS4f and uncontrolled HIV infection. Our data further support the importance of the interplay of markers of the adaptive and innate immune systems in viral control. Copyright (C) 2015 Wolters Kluwer Health, Inc. All rights reserved.
2016-01-08T17:34:34Z

Mathematical programming approaches for classes of random network problems
http://hdl.handle.net/2117/81111
Castro Pérez, Jordi; Nasini, Stefano
Random simulations from complicated combinatorial sets are often needed in many classes of stochastic problems. This is particularly true in the analysis of complex networks, where researchers are usually interested in assessing whether an observed network feature is expected to be found within families of networks under some hypothesis (named conditional random networks, i.e., networks satisfying some linear constraints). This work presents procedures to generate networks with specified structural properties which rely on the solution of classes of integer optimization problems. We show that, for many of them, the constraint matrices are totally unimodular, allowing the efficient generation of conditional random networks by specialized interior-point methods. The computational results suggest that the proposed methods can represent a general framework for the efficient generation of random networks even beyond the models analyzed in this paper. This work also opens the possibility for other applications of mathematical programming in the analysis of complex networks. (C) 2015 Elsevier B.V. All rights reserved.
2016-01-07T16:32:22Z
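The paper's interior-point and integer-programming machinery is not reproduced here, but the notion of a conditional random network can be illustrated with a simpler, widely used alternative: degree-preserving double edge swaps, which sample simple graphs whose degree sequence (a set of linear constraints) is held fixed. The function below is a hedged sketch; the name and the attempt cap are arbitrary choices:

```python
import random

def degree_preserving_swaps(edges, n_swaps, seed=0):
    """Randomize a simple undirected graph while keeping every vertex degree.

    Repeatedly picks two edges (a, b) and (c, d) and rewires them to
    (a, d) and (c, b), rejecting any swap that would create a self-loop
    or a parallel edge. Each accepted swap leaves all degrees unchanged.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edges}
    done = 0
    for _ in range(100 * n_swaps):       # attempt cap guarantees termination
        if done >= n_swaps:
            break
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:           # try both pairings of the endpoints
            c, d = d, c
        if len({a, b, c, d}) < 4:        # would create a self-loop
            continue
        if frozenset((a, d)) in edge_set or frozenset((c, b)) in edge_set:
            continue                     # would create a parallel edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges
```

For richer constraint families (the conditional random networks of the abstract), this local-move approach becomes hard to design and analyze, which is precisely the gap the paper's optimization-based generation addresses.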

The role of significance tests in consistent interpretation of nested partitions
http://hdl.handle.net/2117/80906
Gibert Oliveras, Karina; Sevilla Villanueva, Beatriz; Sánchez Marrè, Miquel
Cluster interpretation is an important step for a proper understanding of a set of classes, independently of whether they have been automatically discovered or are expert-based. An understanding of classes is crucial for their further use as the basis of a decision-making process. The abundant work on cluster validity found in the literature is mainly focused on the validation of clusters from the structural point of view. However, structural validation does not ensure that the clustering is useful, since meaningfulness is the key to guaranteeing that classes can support further decisions. In previous works, special significance tests taken from the field of multivariate analysis were introduced into an interpretation methodology for automatically assessing relevant variables in particular classes. In this paper, we present the interpretation of nested partitions and study the relationships between the two interpretations. In particular, the inconsistencies produced in interpretation when a second partition refines the first one at a higher level of granularity are studied and diagnosed, and a modification of the original methodology is provided to guarantee consistency in these cases. The relevant characteristics detected in a parent class must also be inherited by its subclasses, or at least by some of them. The proposal is evaluated using a real data set on baseline health conditions and dietary habits of a sample of the general population. (C) 2015 Elsevier B.V. All rights reserved.
2015-12-18T12:37:00Z
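One common family of significance tests for class interpretation compares the mean of a variable inside a class with its global mean via a v-test statistic (used, for instance, by FactoMineR's catdes). The sketch below illustrates that general technique only; it is not the specific tests or the nested-partition methodology of the paper:

```python
import math
import statistics

def v_test(values, in_class):
    """v-test: standardized gap between a class mean and the global mean.

    Under random assignment of n_k of the n individuals to the class,
    the class mean has variance (s^2 / n_k) * (n - n_k) / (n - 1);
    a large |v| flags the variable as characteristic of the class.
    """
    n = len(values)
    cls = [v for v, member in zip(values, in_class) if member]
    n_k = len(cls)
    global_mean = statistics.fmean(values)
    var = statistics.pvariance(values)          # population variance s^2
    se = math.sqrt(var / n_k * (n - n_k) / (n - 1))
    return (statistics.fmean(cls) - global_mean) / se
```

The consistency issue the paper studies arises when such tests flag a characteristic in a parent class that fails to reappear in any of its refined subclasses.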

Transforming classic discrete event system specification models to specification and description language
http://hdl.handle.net/2117/80899
Fonseca Casas, Pau
Discrete Event System Specification (DEVS) is one of the most widely used formal languages to represent simulation models, while Specification and Description Language (SDL) is a graphical ITU-T standard language commonly used in the telecommunication and engineering areas. In this paper, we present an algorithm, and a simulation infrastructure that implements it, to transform a simulation model represented using the DEVS formalism into the SDL standard language. The algorithm can be viewed as a mechanism to graphically represent DEVS models. In addition, thanks to the transformation, one can use SDL tools to implement DEVS models. To implement the algorithm, we propose an Extensible Markup Language (XML) representation for the DEVS and SDL models. For practical application, the algorithm is implemented in a simulation infrastructure named the Specification and Description Language Parallel Simulator, which allows models to be defined with both formalisms.
2015-12-18T11:34:14Z