SORT (Statistics and Operations Research Transactions)
http://hdl.handle.net/2099/3717
2020-02-19T14:10:52Z
http://hdl.handle.net/2117/112845
A quadtree approach based on European geographic grids: reconciling data privacy and accuracy
Lagonigro, Raymond; Oller, Ramon; Martori, Joan Carles
Methods to preserve confidentiality when publishing geographic information conflict with the need to publish accurate data. The goal of this paper is to create a European geographic grid framework to disseminate statistical data over maps. We propose a methodology based on quadtree hierarchical geographic data structures. We create a varying size grid adapted to local area densities. Highly populated zones are disaggregated into small squares to allow dissemination of accurate data. Conversely, information on sparsely populated zones is published in big squares to avoid identification of individual data. The methodology has been applied to the 2014 population register data in Catalonia.
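The varying-size grid described in this abstract can be illustrated with a minimal sketch. The abstract does not state the authors' splitting rule, so the rule used here is an assumption: a square is split into four only if every resulting quadrant is either empty or holds at least `min_count` points. The function name `build_grid`, its parameters, and the toy coordinates are hypothetical.

```python
def build_grid(points, x0, y0, size, min_count, min_size):
    """Return leaf cells as (x0, y0, size, count) tuples.

    Assumed rule (not taken from the paper): split a non-empty square
    only if each quadrant is empty or holds at least `min_count` points,
    and never go below `min_size`.
    """
    inside = [(x, y) for x, y in points
              if x0 <= x < x0 + size and y0 <= y < y0 + size]
    half = size / 2
    if inside and half >= min_size:
        quadrants = [(x0, y0), (x0 + half, y0),
                     (x0, y0 + half), (x0 + half, y0 + half)]
        counts = [sum(1 for x, y in inside
                      if qx <= x < qx + half and qy <= y < qy + half)
                  for qx, qy in quadrants]
        # Splitting is allowed only when no quadrant would expose a small count.
        if all(c == 0 or c >= min_count for c in counts):
            cells = []
            for qx, qy in quadrants:
                cells.extend(build_grid(inside, qx, qy, half, min_count, min_size))
            return cells
    return [(x0, y0, size, len(inside))]


# Toy data: three points clustered near the origin of a 4 x 4 square.
points = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.3)]
cells = build_grid(points, 0, 0, 4, min_count=3, min_size=1)
```

With this rule the dense corner is refined down to the smallest allowed square, while a single isolated point elsewhere in the square would block the split and keep the whole area aggregated.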
2018-01-16T14:31:26Z
http://hdl.handle.net/2117/112758
Statistical modeling of warm-spell duration series using hurdle models
Rydén, Jesper
Regression models for counts can be applied in the earth sciences, for instance when studying trends in extremes of climatological quantities. Hurdle models are modified count models which can be regarded as mixtures of distributions. In this paper, hurdle models are applied to model the sums of lengths of periods of high temperatures. We present a modification of the common versions found in the literature, since the problem requires left truncation as well as a particular treatment of zeros. The outcome of the model is compared to those of simpler count models.
2018-01-12T15:53:39Z
http://hdl.handle.net/2117/112757
A Bayesian stochastic SIRS model with a vaccination strategy for the analysis of respiratory syncytial virus
Jornet-Sanz, Marc; Corberán-Vallet, Ana; Santonja, Francisco; Villanueva, Rafael
Our objective in this paper is to model the dynamics of respiratory syncytial virus in the region of Valencia (Spain) and analyse the effect of vaccination strategies from a health-economic point of view. Compartmental mathematical models based on differential equations are commonly used in epidemiology both to understand the underlying mechanisms that influence disease transmission and to analyse the impact of vaccination programs. However, a recently proposed Bayesian stochastic susceptible-infected-recovered-susceptible model in discrete time provided an improved and more natural description of disease dynamics. In this work, we propose an extension of that stochastic model that allows us to simulate and assess the effect of a vaccination strategy that consists of vaccinating a proportion of newborns.
2018-01-12T15:52:25Z
http://hdl.handle.net/2117/112756
Goodness-of-fit test for randomly censored data based on maximum correlation
Strzalkowska-Kominiak, Ewa; Grané, Aurea
In this paper we study a goodness-of-fit test based on the maximum correlation coefficient, in the context of randomly censored data. We construct a new test statistic under general right-censoring and prove its asymptotic properties. Additionally, we study a special case, when the censoring mechanism follows the well-known Koziol-Green model. We present an extensive simulation study on the empirical power of these two versions of the test statistic, showing their advantages over the widely used Pearson-type test. Finally, we apply our test to the head-and-neck cancer data.
2018-01-12T15:49:26Z
http://hdl.handle.net/2117/112755
Corrigendum to "Transmuted geometric distribution with applications in modelling and regression analysis of count data"
Chakraborty, Subrata; Bhati, Deepesh
2018-01-12T15:47:00Z
http://hdl.handle.net/2117/112754
Bayesian correlated models for assessing the prevalence of viruses in organic and non-organic agroecosystems
Lázaro, Elena; Armero, Carmen; Rubio, Luis
Cultivation of horticultural species under organic management has increased in importance in recent years. However, the sustainability of this new production method needs to be supported by scientific research, especially in the field of virology. We studied the prevalence of three important virus diseases in agroecosystems with regard to its management system: organic
2018-01-12T15:45:39Z
http://hdl.handle.net/2117/112753
Comparison of two discrimination indexes in the categorisation of continuous predictors in time-to-event studies
Barrio, Irantzu; Rodríguez-Álvarez, María Xosé; Meira-Machado, Luis; Esteban, Cristóbal; Arostegui, Inmaculada
The Cox proportional hazards model is the most widely used survival prediction model for analysing time-to-event data. To measure the discrimination ability of a survival model the concordance probability index is widely used. In this work we studied and compared the performance of two different estimators of the concordance probability when a continuous predictor variable is categorised in a Cox proportional hazards regression model. In particular, we compared the c-index and the concordance probability estimator. We evaluated the empirical performance of both estimators through simulations. To categorise the predictor variable we propose a methodology which considers the maximal discrimination attained for the categorical variable. We applied this methodology to a cohort of patients with chronic obstructive pulmonary disease, in particular, we categorised the predictor variable forced expiratory volume in one second in percentage
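As an illustration of one of the two estimators named in this abstract, the sketch below computes a pairwise c-index for right-censored data. It assumes the usual Harrell-type definition: a pair is comparable when the subject with the shorter time had an observed event, and tied scores (which arise naturally once a predictor is categorised) count as one half. The function name `c_index` and the toy data are hypothetical and are not taken from the paper.

```python
def c_index(times, events, scores):
    """Pairwise concordance index for right-censored data.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    scores : risk scores (higher score = higher predicted risk)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable only if the shorter time ends in an event.
            if times[i] < times[j] and events[i]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count as one half
    return concordant / comparable


# Toy data: four subjects, the third one censored.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
continuous = c_index(times, events, [4, 3, 2, 1])   # perfectly ordered scores
categorised = c_index(times, events, [1, 1, 0, 0])  # same predictor cut in two
```

In this toy example the dichotomised predictor scores lower than the continuous one because the tie within the high-risk group contributes only one half.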
2018-01-12T15:44:36Z
http://hdl.handle.net/2117/112752
On a property of Lorenz curves with monotone elasticity and its application to the study of inequality by using tax data
Sordo, Miguel A.; Berihuete, Angel; Ramos, Carmen Dolores; Ramos, Héctor M.
The Lorenz curve is the most widely used graphical tool for describing and comparing inequality of income distributions. In this paper, we show that the elasticity of this curve is an indicator of the effect, in terms of inequality, of a truncation of the income distribution. As an application, we consider tax returns as equivalent to the truncation from below of a hypothetical income distribution. Then, we replace this hypothetical distribution by the income distribution obtained from a general household survey and use the dual Lorenz curve to anticipate this effect.
2018-01-12T15:43:36Z
http://hdl.handle.net/2117/112751
Thirty years of progeny from Chao’s inequality: Estimating and comparing richness with incidence data and incomplete sampling
Chao, Anne; Colwell, Robert K.
In the context of capture-recapture studies, Chao (1987) derived an inequality among capture frequency counts to obtain a lower bound for the size of a population based on individuals’ capture/non-capture records for multiple capture occasions. The inequality has been applied to obtain a non-parametric lower bound of species richness of an assemblage based on species incidence (detection/non-detection) data in multiple sampling units. The inequality implies that the number of undetected species can be inferred from the species incidence frequency counts of the uniques (species detected in only one sampling unit) and duplicates (species detected in exactly two sampling units). In their pioneering paper, Colwell and Coddington (1994) gave the name “Chao2” to the estimator for the resulting species richness. (The “Chao1” estimator refers to a similar type of estimator based on species abundance data). Since then, the Chao2 estimator has been applied to many research fields and led to fruitful generalizations. Here, we first review Chao’s inequality under various models and discuss some related statistical inference questions: (1) Under what conditions is the Chao2 estimator an unbiased point estimator? (2) How many additional sampling units are needed to detect any arbitrary proportion (including 100%) of the Chao2 estimate of asymptotic species richness? (3) Can other incidence frequency counts be used to obtain similar lower bounds? We then show how the Chao2 estimator can be also used to guide a non-asymptotic analysis in which species richness estimators can be compared for equally-large or equally-complete samples via sample-size-based and coverage-based rarefaction and extrapolation. We also review the generalization of Chao’s inequality to estimate species richness under other sampling-without-replacement schemes (e.g. a set of quadrats, each surveyed only once), to obtain a lower bound of undetected species shared between two or multiple assemblages, and to allow inferences about undetected phylogenetic richness (the total length of undetected branches of a phylogenetic tree connecting all species), with associated rarefaction and extrapolation. A small empirical dataset for Australian birds is used for illustration, using online software SpadeR, iNEXT, and PhD
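The role of uniques and duplicates described in this abstract can be sketched in a few lines. The abstract itself does not print the formula; the code assumes the commonly cited form of the Chao2 lower bound, S_obs + ((T - 1) / T) * Q1^2 / (2 * Q2), with the variant Q1 * (Q1 - 1) / 2 when there are no duplicates. The function name `chao2` and the toy incidence data are hypothetical, and the sketch does not reproduce the SpadeR or iNEXT interfaces.

```python
def chao2(units):
    """Chao2 lower bound of species richness from incidence data.

    units : list of sampling units, each an iterable of detected species.
    """
    T = len(units)
    freq = {}  # species -> number of sampling units in which it was detected
    for unit in units:
        for species in set(unit):
            freq[species] = freq.get(species, 0) + 1
    s_obs = len(freq)                                 # observed richness
    q1 = sum(1 for f in freq.values() if f == 1)      # uniques
    q2 = sum(1 for f in freq.values() if f == 2)      # duplicates
    k = (T - 1) / T
    if q2 > 0:
        return s_obs + k * q1 * q1 / (2 * q2)
    # Assumed variant for the case with no duplicates.
    return s_obs + k * q1 * (q1 - 1) / 2


# Toy data: four sampling units; "c" and "d" are uniques, "b" is a duplicate.
units = [{"a", "b"}, {"a", "c"}, {"a", "b", "d"}, {"a"}]
estimate = chao2(units)
```

Here four species are observed, with two uniques and one duplicate, so the estimated number of undetected species is 0.75 * 4 / 2 = 1.5.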
2018-01-12T15:42:46Z
http://hdl.handle.net/2117/112750
Smoothed landmark estimators of the transition probabilities
Meira-Machado, Luís
2018-01-12T15:32:03Z