Use of survey weights for the analysis of compositional data: some simulation results
Cite as:
hdl:2117/366826
Document type: Conference proceedings text
Publication date: 2011
Publisher: CIMNE
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to existing legal exemptions, its reproduction, distribution,
public communication or transformation without the authorization of the rights holder is prohibited.
Abstract
The compositional space can be seen as a vector space, where the vector addition corresponds to
perturbation and the multiplication by a scalar corresponds to powering (Aitchison, 1986; Pawlowsky-Glahn and Egozcue, 2001). Whereas perturbation is a widely used operation in applications of compositional analysis, powering is somewhat neglected. Survey data analysis on the other hand is a domain
of applied statistics where the use of weights is predominant. The reason for introducing weights in
survey data analysis is threefold: 1. the use of complex survey designs with unequal inclusion probabilities, 2. the correction of non-response, and 3. calibration procedures. We shall introduce briefly
the rationale for weights in survey analysis and then discuss the connection between survey weights
and the powering operation. Several examples will be given.
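The two vector-space operations can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names, the example compositions and the weights are assumptions. The abstract does not spell out how weights relate to powering; the weighted geometric mean shown here, which perturbs together compositions powered by their normalized weights, is one plausible reading.

```python
import math

def closure(x):
    """Rescale positive parts so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def perturb(x, y):
    """Perturbation: closed component-wise product (vector addition)."""
    return closure([a * b for a, b in zip(x, y)])

def power(x, alpha):
    """Powering: closed component-wise power (multiplication by a scalar)."""
    return closure([a ** alpha for a in x])

def weighted_geometric_mean(comps, weights):
    """Perturb together the compositions, each powered by its normalized weight."""
    wsum = sum(weights)
    result = closure([1.0] * len(comps[0]))  # neutral element of perturbation
    for x, w in zip(comps, weights):
        result = perturb(result, power(x, w / wsum))
    return result

# Hypothetical three-part compositions and survey weights (assumed values).
comps = [[0.2, 0.3, 0.5], [0.6, 0.3, 0.1]]
weights = [10.0, 30.0]
center = weighted_geometric_mean(comps, weights)
print(center)
```

Under this reading, a unit with weight 30 pulls the center three times as hard (in log-ratio terms) as a unit with weight 10.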
Surveys are essentially built to optimize the estimation of totals in population subgroups for a
number of variables. Practically, a key variable is chosen and the design is optimized for this variable,
the trade-off being between cost and precision. Totals are estimated by weighted sums of the sampled
values. The weights are extrapolation factors that depend on the survey design. It is an important
aspect of the data quality to inform the user on the measurement error of the published figures. Survey
design and estimation are described e.g. in Särndal, Swensson and Wretman (1992).
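As a small illustration of totals estimated by weighted sums, the sketch below uses the inverse of the inclusion probability as the extrapolation factor, in the manner of the Horvitz-Thompson estimator. The abstract does not name a specific estimator, so this choice, the function name `ht_total` and the sample values are assumptions.

```python
def ht_total(values, inclusion_probs):
    """Estimate a population total as a weighted sum of sampled values.

    Each weight is the inverse of the unit's inclusion probability
    (assumed design weight; no non-response or calibration adjustment).
    """
    weights = [1.0 / p for p in inclusion_probs]
    return sum(w * y for w, y in zip(weights, values))

# Hypothetical sample of three units with unequal inclusion probabilities.
sample_values = [12.0, 7.5, 30.0]
inclusion_probs = [0.1, 0.25, 0.5]
estimate = ht_total(sample_values, inclusion_probs)
print(estimate)  # 12*10 + 7.5*4 + 30*2, approximately 210
```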
In a survey context, interest lies in totals or means across cases, but in a compositional
context, totals have no meaning. So if we want to average cases, we have to go back to the original
measurement scale and then apply the closure operation. For the geometric mean composition, by
contrast, the result is the same whether the amounts are averaged first and an average composition
is then computed, or whether the geometric mean of the compositions is computed directly and then closed.
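The difference between the two kinds of averaging can be checked numerically. The sketch below uses two hypothetical cases with three parts each (the amounts are made up for illustration). The arithmetic mean gives different compositions depending on whether closure is applied before or after averaging, while the geometric mean gives the same closed result both ways.

```python
import math

def closure(x):
    """Rescale positive parts so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def arithmetic_mean(rows):
    """Part-wise arithmetic mean across cases."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def geometric_mean(rows):
    """Part-wise geometric mean across cases."""
    n = len(rows)
    return [math.exp(sum(math.log(v) for v in col) / n) for col in zip(*rows)]

# Hypothetical amounts on the original measurement scale, one row per case.
amounts = [[2.0, 6.0, 12.0], [30.0, 10.0, 10.0]]
compositions = [closure(row) for row in amounts]

arith_from_amounts = closure(arithmetic_mean(amounts))
arith_from_comps = closure(arithmetic_mean(compositions))
geo_from_amounts = closure(geometric_mean(amounts))
geo_from_comps = closure(geometric_mean(compositions))

print(arith_from_amounts, arith_from_comps)  # these two differ
print(geo_from_amounts, geo_from_comps)      # these two agree
```

The agreement holds because closing a case divides all of its parts by the same constant, and a constant factor common to all parts of the geometric mean cancels in the final closure.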
The design-based approach does not make any assumptions on the distribution of compositions.
This opens the way to parametrization by general partitions (Aitchison, 1986, section 2.7) without
the drawback of ad hoc assumptions on multivariate normality (Aitchison, 1986, definition 6.7). In
household expenditure surveys, for instance, commodities form a hierarchy in which broad categories are subdivided into more detailed goods. A general partition can follow this organization and may be a more
convenient way to convey the information on the surveyed units. The joint probability distribution
of transforms of this general partition is derived from the distribution of the sample inclusion indicator.
After a brief review of survey methodology, we apply the design-based principles to the estimation
of compositions, of compositional transforms and of their covariance matrix on a small population.
The properties of the estimators will be investigated by simulation. The talk will end with a discussion.
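A simulation of this kind could be set up along the following lines. This is only a sketch under stated assumptions and does not reproduce the author's study: the synthetic population, the Poisson sampling design with inclusion probabilities proportional to a size variable, the weighted geometric mean as estimator of the population center, and the Aitchison distance as error measure are all choices made here for illustration.

```python
import math
import random

def closure(x):
    """Rescale positive parts so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def weighted_center(comps, weights):
    """Closed weighted geometric mean of compositions (weights are normalized)."""
    wsum = sum(weights)
    parts = range(len(comps[0]))
    logs = [sum(w * math.log(x[j]) for x, w in zip(comps, weights)) / wsum
            for j in parts]
    return closure([math.exp(v) for v in logs])

def aitchison_distance(x, y):
    """Euclidean distance between centered log-ratio transforms."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    return math.sqrt(sum(((a - mx) - (b - my)) ** 2 for a, b in zip(lx, ly)))

rng = random.Random(2011)  # fixed seed so the run is reproducible

# Synthetic population (assumed): composition depends on a size variable,
# and so does the inclusion probability, which makes weighting matter.
N = 300
population, probs = [], []
for _ in range(N):
    size = rng.lognormvariate(0.0, 0.5)
    raw = [rng.lognormvariate(0.5 * math.log(size), 0.3),
           rng.lognormvariate(0.0, 0.3),
           rng.lognormvariate(-0.5 * math.log(size), 0.3)]
    population.append(closure(raw))
    probs.append(min(1.0, 0.2 * size))

true_center = weighted_center(population, [1.0] * N)

def one_replicate():
    """Draw one Poisson sample; return errors of weighted and unweighted centers."""
    sample = [(x, p) for x, p in zip(population, probs) if rng.random() < p]
    if not sample:  # extremely unlikely, but keep the replicate well defined
        return None
    comps = [x for x, _ in sample]
    weighted = weighted_center(comps, [1.0 / p for _, p in sample])
    unweighted = weighted_center(comps, [1.0] * len(comps))
    return (aitchison_distance(weighted, true_center),
            aitchison_distance(unweighted, true_center))

results = [r for r in (one_replicate() for _ in range(500)) if r is not None]
mean_err_weighted = sum(r[0] for r in results) / len(results)
mean_err_unweighted = sum(r[1] for r in results) / len(results)
print(mean_err_weighted, mean_err_unweighted)
```

In this setup the unweighted center is expected to be biased, because large units are both over-represented in the sample and compositionally different; comparing the two mean errors gives a rough picture of what the design weights buy.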
Citation: Graf, M. Use of survey weights for the analysis of compositional data: some simulation results. In: CODAWORK 2011. "Proceedings of CoDaWork'11: 4th international workshop on Compositional Data Analysis, Egozcue, J.J., Tolosana-Delgado, R. and Ortego, M.I. (eds.) 2011". Barcelona: CIMNE, 2011, ISBN 978-84-87867-76-7.
ISBN: 978-84-87867-76-7
Files | Description | Size | Format | View |
---|---|---|---|---|
p35-CoDaWork2011.pdf | | 582.7 KB | | View/Open |