COVID-19 Flow-Maps: An open geographic information system on COVID-19 and human mobility for Spain

COVID-19 is an infectious disease caused by the SARS-CoV-2 virus, which has spread all over the world, leading to a global pandemic. The fast progression of COVID-19 has been mainly related to the high contagion rate of the virus and the worldwide mobility of humans. In the absence of pharmacological therapies, governments from different countries have introduced several non-pharmaceutical interventions to reduce human mobility and social contact. Several studies based on anonymised mobile phone data have been published analysing the relationship between human mobility and the spread of coronavirus. However, to our knowledge, none of these data-sets integrates cross-referenced geo-localised data on human mobility and COVID-19 cases into one all-inclusive open resource. Herein we present COVID-19 Flow-Maps, a cross-referenced Geographic Information System that integrates regularly updated time-series accounting for population mobility and daily reports of COVID-19 cases in Spain at different temporal and spatial resolutions. This integrated and up-to-date data-set can be used to analyse human dynamics in order to guide and support the design of more effective non-pharmaceutical interventions.

for a transfer at a bus/train station). The result of this step is the sequence of activities and trips made on the analysed days. The information associated with each activity includes the location, the start time and the end time. As associated information, each trip includes its origin (location of the activity immediately prior to the trip),

The extrapolation of the sample is carried out by taking as the sampling reference the population residing in the country, according to the data from the Population Register provided by the National Institute of Statistics. Standard sample extrapolation procedures are used (similar to those used, for example, in a household mobility survey), applying expansion factors by area of residence based on the sample/population ratio for each age and gender segment in each district. Finally, the information obtained in the previous steps is presented with the required spatial and temporal resolution and the desired segmentation to generate the origin-destination (OD) matrices and the rest of the mobility indicators previously described. The mobility indicators have been generated using specialised software developed for this purpose, and all these processes are carried out within the mobile operator's infrastructure, so that the information generated and delivered to our source, the MITMA official site for data release, is already aggregated and anonymised.

The output obtained from the processing steps is then used to generate two mobility indicators: the hourly OD matrices, referred to as Maestra 1, and the daily population mobility descriptions, referred to as Maestra 2 (see Mobility data in the Data Records section). Both indicators are geo-referenced to a custom layer referred to as the MITMA mobility layer (see Geographical Layers in the Data Records section).
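The expansion step described above can be illustrated with a minimal Python sketch. It assumes a simple ratio estimator in which each sampled user receives a weight equal to the number of registered residents in their (district, age band, gender) segment divided by the number of sampled users in that segment. The function names and data layout are hypothetical and do not reproduce the operator's or MITMA's software.

```python
from collections import Counter, defaultdict

# Hypothetical segment key: (district of residence, age band, gender)
Segment = tuple[str, str, str]


def expansion_factors(sample_users: dict[str, Segment],
                      census: dict[Segment, int]) -> dict[Segment, float]:
    """Expansion factor per segment: registered residents / sampled users."""
    sample_counts = Counter(sample_users.values())
    return {seg: census[seg] / n
            for seg, n in sample_counts.items() if seg in census}


def expand_trips(trips: list[tuple[str, str, str]],
                 sample_users: dict[str, Segment],
                 census: dict[Segment, int]) -> dict[tuple[str, str], float]:
    """Extrapolate sampled trips (user, origin, destination) to
    population-level origin-destination counts."""
    factors = expansion_factors(sample_users, census)
    od: dict[tuple[str, str], float] = defaultdict(float)
    for user, origin, dest in trips:
        # Each sampled trip counts as many trips as its user's segment weight;
        # users whose segment is missing from the census contribute nothing.
        od[(origin, dest)] += factors.get(sample_users[user], 0.0)
    return dict(od)
```

For example, two sampled users in a segment with 1,000 registered residents each receive a weight of 500, so their two trips from one district to another expand to 1,000 trips.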
Further details on the analysis and processing of mobility data are provided on the official page of the study (https://www.mitma.gob.es/ministerio/covid-19/evolucion-movilidad-big-data). Mobility indicators are further processed to obtain more aggregated mobility metrics. Hourly OD matrices (Maestra 1) are aggregated to obtain daily mobility matrices. Furthermore, the daily population mobility indicators (Maestra 2), which provide the number of people who made zero, one, two, or more than two trips, are aggregated within each mobility area on a given date to estimate the total population of each mobility area on a daily basis. The total population computed in this way is stored to be used in the calculation of other descriptors (e.g. daily incidence by total population). Because the population inferred from Maestra 2 is estimated daily, it captures the fluctuations in the population due to net mobility over the year (e.g. mobility during summer vacations).
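The two aggregation steps can be sketched in Python as follows. The record layouts (date, hour, origin, destination and trips for Maestra 1; date, zone, trip-count category and number of people for Maestra 2) are simplifying assumptions for illustration, not the actual MITMA file schema.

```python
from collections import defaultdict


def daily_od(hourly_records):
    """Aggregate hourly OD records (Maestra 1 style) into daily OD matrices.

    Each record is assumed to be (date, hour, origin, destination, trips);
    the result maps (date, origin, destination) -> total trips that day.
    """
    daily = defaultdict(float)
    for date, _hour, origin, dest, trips in hourly_records:
        daily[(date, origin, dest)] += trips
    return dict(daily)


def daily_population(trip_count_records):
    """Estimate the population of each mobility area per day (Maestra 2 style).

    Each record is assumed to be (date, zone, category, people), where the
    category is the number of trips made ("0", "1", "2" or "2+"). Summing the
    people over all categories gives the zone's total population on that date.
    """
    population = defaultdict(float)
    for date, zone, _category, people in trip_count_records:
        population[(date, zone)] += people
    return dict(population)
```

Summing over every trip-count category, including people who made no trips, is what turns a mobility indicator into a daily population estimate.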

The database has all the information needed to identify the origin of the data, all the processing carried out, the original files retrieved, and the timestamp of the last update. Furthermore, copies of all the data obtained from the different sources are kept in their original formats, together with their source URLs (if available). All this information can also be queried through the REST-API (see REST-API in the Usage Notes section).

Data projection using geographical layers overlay
In order to combine COVID-19 daily incidence and population mobility data, both data records must be projected onto the same geographical layer. In some cases, each polygon in one layer is covered by a single polygon from another layer, i.e. the overlap is exclusive (e.g. municipalities are included in provinces). In other cases, the overlap between the two layers is not exclusive, which means that polygons in one layer can be covered by more than one polygon from another layer. For instance, COVID-19 daily incidences and mobility data are geo-referenced into different geographical layers that cannot be combined directly. To overcome this issue, we have implemented a general approach to project data among different geographical layers. The approach is based on linear interpolation over the overlapping areas between the polygons from the two layers. We call this process "projecting" data from a source layer (e.g. municipalities) into the target layer (e.g. BHAs).

medRxiv preprint doi: https://doi.org/10.1101/2021.06.23.21259395; this version posted July 2, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

In general, a geographical layer is composed of several different polygons. For example, in the case of the province layer, each individual province corresponds to a polygon that defines its geographic frontier or

Population-based overlay matrix
Using spatial-based overlays implicitly assumes that the population is uniformly distributed over the territory. However, this assumption is unlikely to hold in most cases, and therefore some authors have proposed

Data projection using the overlay matrix
Given a row vector $V_A$ of data on layer A (e.g. the COVID-19 incidence value for each polygon of layer A), these data can be projected into layer B by simply multiplying $V_A$ by the overlay matrix:

The same approach is also used to project an OD mobility matrix between geographical layers. Projecting

matrices for all the combinations of geographical layers have been computed and stored in the database, enabling a fast projection of any data-set between the different geographical layers.
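The projection machinery can be illustrated with a minimal, dependency-free Python sketch. To keep it self-contained, polygons are replaced by axis-aligned rectangles (a real implementation would intersect actual polygons with a GIS library). The overlay matrix $W$ is assumed to hold, in row a and column b, the fraction of source polygon a that falls in target polygon b, measured by area (spatial-based) or, if per-intersection population counts are passed to `overlay_matrix` instead of areas, by population (population-based). The OD projection $M_B = W^T M_A W$, which redistributes origins and destinations independently, is our assumption; all names are hypothetical.

```python
# Hypothetical stand-in for a polygon: (xmin, ymin, xmax, ymax)
Rect = tuple[float, float, float, float]


def rect_overlap(r1: Rect, r2: Rect) -> float:
    """Area of the intersection of two axis-aligned rectangles."""
    width = min(r1[2], r2[2]) - max(r1[0], r2[0])
    height = min(r1[3], r2[3]) - max(r1[1], r2[1])
    return max(width, 0.0) * max(height, 0.0)


def overlay_matrix(shared: list[list[float]]) -> list[list[float]]:
    """Row-normalise shared measures (area or population) so that row a gives
    the fraction of source polygon a assigned to each target polygon."""
    matrix = []
    for row in shared:
        total = sum(row)
        matrix.append([x / total if total else 0.0 for x in row])
    return matrix


def spatial_overlay(layer_a: list[Rect], layer_b: list[Rect]) -> list[list[float]]:
    """Spatial-based overlay: weights proportional to intersection areas."""
    return overlay_matrix([[rect_overlap(a, b) for b in layer_b] for a in layer_a])


def project_vector(v: list[float], w: list[list[float]]) -> list[float]:
    """Row vector times overlay matrix: v_B = v_A W."""
    return [sum(v[a] * w[a][b] for a in range(len(v))) for b in range(len(w[0]))]


def project_od(m: list[list[float]], w: list[list[float]]) -> list[list[float]]:
    """Project an OD matrix: M_B = W^T M_A W (origins and destinations
    are each redistributed with the same overlay matrix)."""
    n_a, n_b = len(w), len(w[0])
    return [[sum(w[a][c] * m[a][b] * w[b][d] for a in range(n_a) for b in range(n_a))
             for d in range(n_b)]
            for c in range(n_b)]
```

With two unit squares as layer A and a target layer split at x = 1.5, the overlay matrix is [[1, 0], [0.5, 0.5]], so 10 and 4 cases project to 12 and 2 cases. Totals are preserved whenever every row of $W$ sums to 1.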

To assess the effect of population mobility on the spread of COVID-19, we have developed a risk score named Mobility Associated Risk (MAR). The MAR score integrates daily cases with population flows between different geographical areas (i.e. OD matrices), e.g. provinces or BHAs, to estimate how likely it is that cases spread as a consequence of human mobility (see Figure 3). Herein, we use the incidence accumulated over two weeks as an estimator of the number of active cases. Then, for a given geographical layer L with n zones j = 1, ..., n, and a given date t, we refer to the cases accumulated over two weeks as $I^{14}_j(t)$. This estimator is then normalised to the total population reported in zone j at the same date t:

$$i^{14}_j(t) = \frac{I^{14}_j(t)}{N_j(t)},$$

where $N_j(t)$ is the total population in zone j at date t and $i^{14}_j(t)$ is the estimator of the active cases per inhabitant. We then combine the $i^{14}_j(t)$ of each zone j with a daily OD mobility matrix for the date t as follows:

$$R_{j,k}(t) = i^{14}_j(t)\, M_{j,k}(t),$$

where $M_{j,k}(t)$ is the number of trips from j to k, with both values reported at date t, and $R_{j,k}(t)$ is an estimate of the expected number of infected subjects also moving from zone j to k on day t. In general, when the daily COVID-19 cases and the mobility are reported in the same geographical layer L, the risk $R_{j,k}(t)$ can be calculated for all pairs of zones j and k by the element-wise (Hadamard) product between the n×1 vector $i(t) = [i^{14}_1(t) \cdots i^{14}_n(t)]^T$ of case densities, broadcast across the columns, and the n×n mobility matrix $M(t)$:

$$R(t) = i(t) \circ M(t) = \operatorname{diag}(i(t))\, M(t).$$

The matrix $R(t)$ is thus a directed weighted network where the nodes correspond to the n zones of layer L and the flow $R_{j,k}(t)$ between any pair of nodes (j, k) is the estimated number of infected subjects moving from the source j to the target k at t.
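The MAR computation can be sketched in a few lines of Python. The sketch assumes that the two-week window covers day t and the 13 preceding days, and that cases, populations and trips for a given date are plain lists indexed by zone; these conventions and the function names are illustrative assumptions.

```python
def cases_14d(daily_cases: list[float], t: int) -> float:
    """I^14_j(t): cases accumulated over the 14 days ending on day t (inclusive).

    Days before the start of the series are simply ignored."""
    return sum(daily_cases[max(0, t - 13): t + 1])


def mar_network(i14_counts: list[float], population: list[float],
                trips: list[list[float]]) -> list[list[float]]:
    """R(t) = diag(i(t)) M(t), with i_j(t) = I^14_j(t) / N_j(t).

    trips[j][k] is the number of trips from zone j to zone k on day t;
    the result R[j][k] estimates the infected subjects moving from j to k."""
    density = [c / n for c, n in zip(i14_counts, population)]
    n_zones = len(trips)
    return [[density[j] * trips[j][k] for k in range(n_zones)]
            for j in range(n_zones)]
```

For two zones with 16 and 4 accumulated cases over populations of 64 and 32, the case densities are 0.25 and 0.125, so 8 trips leaving the first zone and 40 trips leaving the second carry an estimated 2 and 5 infected subjects, respectively.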
The network-based structure of the mobility associated risk allows the definition of the total MAR incoming to zone k by summing the risk over all possible sources, i.e. summing the kth column of $R(t)$: $\mathrm{MAR}^{\mathrm{in}}_k(t) = \sum_{j=1}^{n} R_{j,k}(t)$. Similarly, the outgoing MAR for a given zone k corresponds to all the weighted edges having k as the source node, i.e. summing the kth row of $R(t)$: $\mathrm{MAR}^{\mathrm{out}}_k(t) = \sum_{j=1}^{n} R_{k,j}(t)$.
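Given $R(t)$ as a nested list (rows are sources, columns are targets), the incoming and outgoing MAR of every zone reduce to column sums and row sums. A minimal sketch, with hypothetical function names:

```python
def incoming_mar(r: list[list[float]]) -> list[float]:
    """Incoming MAR for every zone k: sum of the kth column of R(t)."""
    n = len(r)
    return [sum(r[j][k] for j in range(n)) for k in range(n)]


def outgoing_mar(r: list[list[float]]) -> list[float]:
    """Outgoing MAR for every zone k: sum of the kth row of R(t)."""
    return [sum(row) for row in r]
```

As in the definition above, both sums include the diagonal term $R_{k,k}(t)$, i.e. trips that start and end in the same zone.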

The risk network can be represented on a map to analyse the sources of incoming MAR to any zone of interest.
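To analyse where the incoming MAR of a zone of interest comes from, one can rank the entries of the corresponding column of $R(t)$, for example to decide which flows to draw on a map. The helper below is a hypothetical sketch of that step, not part of the published tools; it excludes the target zone itself, since those trips are internal rather than imported.

```python
def top_sources(r: list[list[float]], target: int, n: int = 3) -> list[tuple[int, float]]:
    """Return up to n (source zone, risk) pairs with the largest R[j][target],
    excluding the target zone itself and zero-risk sources."""
    contributions = [(j, row[target]) for j, row in enumerate(r)
                     if j != target and row[target] > 0]
    # Sort by decreasing risk; Python's stable sort keeps zone order on ties.
    contributions.sort(key=lambda pair: pair[1], reverse=True)
    return contributions[:n]
```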

For instance, the risk network $R(t)$ can be calculated between provinces by combining daily incidence reported at province level with mobility data aggregated at the same level.

Finally, we would like to stress that although we are aware that the MAR score is a rough approximation (e.g. infected cases in quarantine are not expected to travel with the same frequency as healthy people do), we find that it can be used to evaluate the risk of outbreaks in different zones due to imported cases.

the mobility data-set, the events are geo-referenced to mobility zones from a custom layer that is provided

(daily incidence) as well as the number of cases broken down by test type, i.e. PCR, antigen, antibody, ELISA or unknown. Additionally, information on the total daily hospitalisations and admissions to intensive care units, reported by province, is also provided.

This data record also includes COVID-19 cases at a higher spatial resolution (e.g. municipalities) reported by several autonomous communities. Currently, eight out of the nineteen autonomous communities publish reports with local daily COVID-19 cases at the level of municipalities or BHAs. On the one hand, Castilla y León, Madrid, Navarra and País Vasco report COVID-19 cases at the level of local BHAs; on the other hand, Asturias, Cantabria and Valencia report daily cases at the level of municipalities; the Cataluña local government reports COVID-19 daily cases at the level of both BHAs and municipalities. Table 2 shows the different COVID-19 data-sets that are integrated into this record, together with the associated geographical layer in which the data are reported. In this way, each entry reporting COVID-19 cases includes the reporting date, the geographical layer, the identifier of the specific polygon within the layer, and the number of cases, reported as both daily and cumulative incidence. Additionally, each entry also includes useful metrics, such as the rolling average of the daily incidence over one and two weeks, the population in that area and the number of cases per 100,000 inhabitants. More detailed information regarding the specific data fields reported by each source is provided in Online Table 3. The released data can be retrieved using the following link:

All the information is stored in a NoSQL database that can be directly queried through a REST-API, downloaded using the provided scripts, and accessed through web-based interactive data dashboards.
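The derived metrics attached to each entry can be sketched as follows. The sketch assumes trailing windows of 7 and 14 days that include the current day and leaves the average undefined until a full window is available; the published data-set may handle the start of the series differently, and the function names are hypothetical.

```python
from typing import Optional


def rolling_mean(values: list[float], window: int) -> list[Optional[float]]:
    """Trailing rolling mean over `window` days (current day included).

    Returns None for days without a complete window (assumed convention)."""
    means: list[Optional[float]] = []
    for t in range(len(values)):
        if t + 1 < window:
            means.append(None)
        else:
            means.append(sum(values[t + 1 - window: t + 1]) / window)
    return means


def per_100k(cases: float, population: float) -> float:
    """Cases per 100,000 inhabitants."""
    return cases * 100_000 / population
```

For the daily series 1, 2, ..., 14, the 7-day average on the last day is 11 and the 14-day average is 7.5; 50 cases in an area of 200,000 inhabitants correspond to 25 cases per 100,000.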

Figure 2. Toy example explaining the approach for projecting data between layers using spatial-based overlays. Panel (a) shows an example of projecting cases from layer A to layer B using the spatial-based overlay between both layers. Panel (b) shows an example of projecting trips between the same layers.

For simplicity, the Canarias islands are represented in the bottom left box. Panel (c) represents the MITMA mobility layer, and the coloured polygons correspond to individual mobility zones that match districts in densely populated areas and municipalities or groups of municipalities in less populated areas. Panel (d) represents the layers for which some autonomous communities report COVID-19 cases with a higher spatial resolution than the province level. From top to bottom and left to right, the layers are: Madrid's BHAs, Cataluña's BHAs, Valencia's municipalities, Cantabria's municipalities, Castilla y León's BHAs, Navarra's BHAs, País Vasco's BHAs and Asturias' municipalities. In all the plots, colours are used only for visualisation purposes.

Figure 5. Comparison of data projection approaches between geographical layers. The figure compares the census population reported by municipalities with the values estimated from MITMA mobility data after projecting them from the MITMA mobility layer into the municipalities layer. Panels (a) and (b) show the results of the projection using Spatial-based and Population-based overlays, respectively.

18/24
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

19/24
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

20/24
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

21/24
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

22/24
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 2, 2021. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 2, 2021. ; https://doi.org/10.1101/2021.06.23.21259395 doi: medRxiv preprint . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 2, 2021. ; https://doi.org/10.1101/2021.06.23.21259395 doi: medRxiv preprint