WASH your data off: navigating statistical uncertainty in compositional data analysis
Document type: Conference report
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder
International monitoring of access to drinking water, sanitation and hygiene (WASH) is essential to inform policy planning, implementation and delivery of services. The Joint Monitoring Programme for Water Supply and Sanitation (JMP) is the recognized mechanism for tracking access and progress, and it is based on household surveys and linear regression modelling over time. However, the methods employed have two substantial limitations: they address neither the compositional nature of the data nor its statistical uncertainty (Ezbakhe & Pérez-Foguet 2018). While the first issue has been tackled previously in the literature (Pérez-Foguet et al. 2017), the effect of non-uniform sampling errors on the regressions remains ignored. This article aims to address these shortcomings in order to produce a more truthful interpretation of JMP data. The main challenge we try to overcome is how to translate the sampling errors provided in household surveys to the space of compositional data. Household survey estimates are commonly assumed to follow a Normal distribution, characterized by a mean and its standard deviation. However, when working with binary data on households - the proportions of households that have access to WASH services - the errors cannot follow a Normal distribution, because proportions are restricted to the range 0 to 1. Thus, the Beta distribution seems a better option to characterize the uncertainty around mean access coverage. Yet, as the Beta distribution is defined on the [0,1] interval, the zero values must be dealt with before the isometric log-ratio (ilr) transformation designed for compositional data can be applied. In this article, we investigate the use of two probability distributions (Pearson Type I and Truncated Normal) and Monte Carlo simulations to reinterpret the error in the JMP data so that compositional data analysis is possible.
With a specific focus on the WASH sector, our article shows the importance of accounting for the survey errors of the data - and its compositional nature - when using this information to support evidence-based policy-making. Indeed, given the current levels of statistical uncertainty in WASH, data may lead to misleading results if errors are not acknowledged (or minimized).
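The idea of carrying a survey's mean and standard deviation into ilr space by simulation can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a two-part composition (access, no access), uses a standard Beta distribution matched to the reported mean and standard deviation by the method of moments as a stand-in for the Pearson Type I and Truncated Normal options named in the abstract, and handles boundary values by simple clipping with a hypothetical tolerance `eps`. All function names are illustrative.

```python
import numpy as np


def beta_params_from_moments(mean, sd):
    """Method-of-moments Beta(a, b) matching a given mean and standard deviation."""
    var = sd ** 2
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if var >= mean * (1.0 - mean):
        raise ValueError("sd is too large for a Beta distribution with this mean")
    nu = mean * (1.0 - mean) / var - 1.0
    return mean * nu, (1.0 - mean) * nu


def ilr_two_part(p):
    """ilr coordinate of the two-part composition (p, 1 - p)."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p)) / np.sqrt(2.0)


def simulate_ilr(mean, sd, n_draws=10_000, eps=1e-6, seed=0):
    """Monte Carlo draws of coverage, mapped to ilr space.

    Assumption: draws are clipped to [eps, 1 - eps] so the log-ratio stays
    finite; this is a simplification, not the zero treatment of the paper.
    """
    a, b = beta_params_from_moments(mean, sd)
    rng = np.random.default_rng(seed)
    draws = rng.beta(a, b, size=n_draws)
    draws = np.clip(draws, eps, 1.0 - eps)
    return ilr_two_part(draws)


if __name__ == "__main__":
    # Hypothetical survey estimate: 85% coverage with a standard error of 3%.
    z = simulate_ilr(mean=0.85, sd=0.03)
    print("ilr mean:", z.mean(), "ilr sd:", z.std(ddof=1))
```

The spread of the simulated ilr values gives a per-survey uncertainty in the transformed space, which could then serve, for example, as a weight in a regression over time; that last step is only a suggestion and is not described in the abstract.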
Citation: Ezbakhe, F.; Pérez-Foguet, A. WASH your data off: navigating statistical uncertainty in compositional data analysis. A: International Workshop on Compositional Data Analysis. "Proceedings of the 8th International Workshop on Compositional Data Analysis (CoDaWork2019): Terrassa, 3-8 June, 2019". 2019, p. 57-62.