Big Data and Data Analytics applied to the Monitoring of Water Distribution Networks
Author's e-mail: mireya.mubgmail.com
Tutor / director / evaluator: Puig Cayuela, Vicenç
Document type: Official Master's Final Project
Access conditions: Open access
The project associated with this master's thesis was carried out in collaboration with the Water Technological Centre CETAQUA, located in Cornellà de Llobregat. The aim of the thesis is to apply data analytics and machine learning models to demand monitoring in water networks, using data collected through Automatic Meter Reading (AMR). Within this scope, the project also analyzes the influence of external variables on the consumption pattern. Starting from a review of the state of the art in existing Big Data techniques and tools, the most suitable ones for monitoring water distribution networks are chosen. Three datasets were processed for the study, two corresponding to the city of Tarragona and the third to the city of Torremolinos; databases containing the variables of the consumer accounts associated with the meters were also used. All datasets and databases were provided by the company. The study consisted of two stages. In the first, the datasets were split by season and analyzed separately to evaluate the features present in each one. To characterize the representative behavior of each city, the clustering labels were analyzed to find the groups of sensors that share the same behavior pattern. In the second stage, three different models were applied to the data to find the relation between the demand patterns and the meter account variables. The results reveal a symbolic number of groups of sensors that follow the same behavior in the seasonal analysis; outlier activity associated with high consumers and non-domestic use was also detected. The results of the second stage suggest a forced input-output relation between the meter account variables and the clustered patterns; these results improve when the account variables are combined with features associated with the demand pattern.
Some of the drawbacks encountered during the execution of the project were the unreliability of some predictors and the loss of information due to outlier removal or missing data.
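
As an illustration of the first stage described in the abstract (seasonal split followed by clustering of demand patterns), the following is a minimal sketch and not the method used in the thesis. The abstract does not name the clustering algorithm, so plain k-means with a deterministic farthest-first initialisation is assumed here. The meter identifiers, the synthetic hourly profiles, the month-to-season mapping and all function names are hypothetical. The second stage (the three models relating account variables to the clustered patterns) is not sketched, since the abstract does not specify those models.

```python
import random
from collections import defaultdict

# Assumed month-to-season convention (meteorological seasons); hypothetical.
SEASONS = {12: "winter", 1: "winter", 2: "winter", 3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer", 9: "autumn", 10: "autumn", 11: "autumn"}

def normalize(profile):
    """Scale a daily profile to unit sum so that shape, not volume, drives the clustering."""
    total = sum(profile)
    return [v / total for v in profile] if total > 0 else list(profile)

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Plain k-means with farthest-first initialisation (assumed algorithm)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_by_season(readings, k=2):
    """readings: iterable of (meter_id, month, 24 hourly volumes).

    Returns {season: {meter_id: cluster_label}}; each season needs at least k meters.
    """
    by_season = defaultdict(lambda: defaultdict(list))
    for meter_id, month, profile in readings:
        by_season[SEASONS[month]][meter_id].append(normalize(profile))
    result = {}
    for season, meters in by_season.items():
        ids = sorted(meters)
        # Mean normalised daily profile per meter within the season.
        points = [[sum(col) / len(meters[m]) for col in zip(*meters[m])] for m in ids]
        result[season] = dict(zip(ids, kmeans(points, k)))
    return result

# Synthetic data: two meters with a morning peak, two with an evening peak,
# at very different consumption volumes.
rng = random.Random(0)

def synthetic_day(peak_hour, scale):
    return [scale * ((5.0 if abs(h - peak_hour) <= 1 else 1.0) + 0.2 * rng.random())
            for h in range(24)]

readings = []
for month in (1, 7):
    for meter_id, peak, scale in [("M1", 8, 10), ("M2", 8, 300), ("M3", 20, 15), ("M4", 20, 120)]:
        for _day in range(3):
            readings.append((meter_id, month, synthetic_day(peak, scale)))

labels = cluster_by_season(readings)
for season, assignment in labels.items():
    print(season, assignment)
```

Normalising each profile before clustering is a design choice of this sketch: it groups meters by the shape of their daily demand rather than by total volume, so a high consumer and a small one with the same morning peak fall into the same group.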