Prediction of tropospheric ozone concentration at urban locations using machine learning algorithms. Application to Barcelona, Spain

Author's e-mail: sergio.lc0603@gmail.com

Document type: Master thesis
Date: 2021-09-09
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public
communication or transformation of this work are prohibited without permission of the copyright holder.
Abstract
In recent decades, interest in predicting tropospheric ozone (O₃) levels has increased because of ozone's detrimental effects on human health and vegetation. Although some factors, such as solar radiation, are well known to influence ozone levels, the effect of other variables is less clear. In this study, several regression models based on the Random Forest (RF) algorithm are developed to predict, one day ahead, the daily maximum hourly ozone concentration (1hO₃) and the daily maximum 8-hour average ozone concentration (8hO₃) in Barcelona, using air quality data, meteorological data and time variables as inputs. Two versions of the model are considered: one using data from the whole year and one restricted to the summer months (May to September). In addition, classification models are built for both outputs, with categories based on thresholds inspired by current regulations. The RF regression models capture the variation of tropospheric ozone throughout the year and produce accurate estimates, with acceptable deviations between observations and predictions. In general, the 1hO₃ classification models show acceptable error rates, lower than those of the 8hO₃ models. However, accuracy is high only for the categories that contain most of the ozone values and low for the sparsely populated categories. Consequently, these classification models are not useful as a tool for alerting the population to a specific ozone event. Analysis of the RF models shows that the ozone level on the day before the prediction (1hO₃ or 8hO₃, depending on the model) has the strongest association with the output. The importance of the other inputs varies between models: after O₃, solar radiation and day of the year are the main variables in the whole-year models, whereas relative humidity, average dew-point deficit and weekday are also relevant in the summer models.
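The two prediction targets, the one-day-ahead setup and the threshold-based categories can be made concrete with a short sketch. It assumes a gap-free hourly O₃ series in µg/m³ that starts at midnight, and it assigns each 8-hour running mean to the day on which its last hour falls (loosely following the EU convention). As example category boundaries it uses the EU information (180 µg/m³) and alert (240 µg/m³) thresholds for hourly O₃ and the 120 µg/m³ target value for the 8-hour mean; the thesis's own categories, gap handling and input set may differ. All function and constant names are hypothetical. The Random Forest itself is not reproduced: in Python the lagged rows would typically be passed to a library model such as scikit-learn's `RandomForestRegressor`. The persistence baseline (tomorrow equals today) is included only as a simple reference point, because the previous day's O₃ turned out to be the strongest predictor; it is not a method taken from the thesis.

```python
from bisect import bisect_left
from statistics import mean

# Example category boundaries in µg/m³ (assumption: EU information/alert
# thresholds for hourly O3 and the 8-hour target value; the thesis's own
# categories may differ).
ONE_HOUR_THRESHOLDS = (180.0, 240.0)
EIGHT_HOUR_THRESHOLDS = (120.0,)


def daily_targets(hourly):
    """Return (max_1h, max_8h), one value per day, from a gap-free hourly series.

    Assumes hourly[0] is the first hour of day 0 and the series covers whole days.
    Each 8-hour running mean is assigned to the day of its last hour; windows
    that would start before the first available hour are skipped.
    """
    if len(hourly) % 24 != 0:
        raise ValueError("hourly series must cover whole days")
    n_days = len(hourly) // 24
    max_1h, max_8h = [], []
    for d in range(n_days):
        # Daily maximum hourly value (1hO3).
        max_1h.append(max(hourly[d * 24:(d + 1) * 24]))
        # Running means of 8 consecutive hours whose last hour lies in day d.
        means = [mean(hourly[end - 7:end + 1])
                 for end in range(d * 24, (d + 1) * 24) if end >= 7]
        # Daily maximum 8-hour mean (8hO3).
        max_8h.append(max(means))
    return max_1h, max_8h


def lagged_pairs(target, extra_features):
    """Pair day d's inputs (target on day d plus other features) with day d+1's target."""
    rows = []
    for d in range(len(target) - 1):
        x = [target[d]] + list(extra_features[d])
        rows.append((x, target[d + 1]))
    return rows


def categorize(value, thresholds):
    """Return how many (sorted) thresholds the value strictly exceeds; 0 is the lowest class."""
    return bisect_left(thresholds, value)


def persistence_mae(target):
    """Mean absolute error of the naive forecast 'tomorrow equals today'."""
    return mean(abs(target[d + 1] - target[d]) for d in range(len(target) - 1))


if __name__ == "__main__":
    # Three synthetic days with constant hourly levels.
    hourly = [50.0] * 24 + [100.0] * 24 + [200.0] * 24
    max_1h, max_8h = daily_targets(hourly)
    print(max_1h)  # [50.0, 100.0, 200.0]
    print(max_8h)  # [50.0, 100.0, 200.0]
    print(categorize(max_1h[2], ONE_HOUR_THRESHOLDS))  # 1
    print(persistence_mae(max_1h))  # 75.0
```

Using `bisect_left` keeps a value exactly equal to a threshold in the lower category, which matches the "exceeded" wording such regulatory thresholds usually carry.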
Degree: Erasmus Mundus University Master's Degree in Hydroinformatics and Water Management (2009 curriculum)
Files | Size | Format |
---|---|---|
Sergio_Lopez_Chacon_Master_Thesis.pdf | 3.501 MB | PDF |