Symptom-Based Predictive Model of COVID-19 Disease in Children

Background: Testing for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection is not always accessible or easy to perform in children. We aimed to develop a machine learning model to assess the need for a SARS-CoV-2 test in children (<16 years old) based on their clinical symptoms. Methods: Epidemiological and clinical data were obtained from the REDCap® registry. Overall, 4434 SARS-CoV-2 tests were performed in symptomatic children between 1 November 2020 and 31 March 2021, of which 784 (17.68%) were positive. We pre-processed the data to make them suitable for a machine learning (ML) algorithm, balancing the positive-negative rate and preparing subsets of data by age. We trained several models and chose those with the best performance for each subset. Results: The ML models achieved an area under the receiver operating characteristic curve (AUROC) of 0.65 for predicting a COVID-19 diagnosis in children. The absence of high-grade fever was the major predictor of COVID-19 in younger children, whereas loss of taste or smell was the most determinant symptom in older children. Conclusions: Although the accuracy of the models was lower than expected, they can be used to provide a diagnosis when epidemiological data on the risk of exposure to COVID-19 are unknown.


Introduction
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic continues to be a priority health problem worldwide. More than eighteen months after the

Case Definition
A confirmed COVID-19 case was defined as any individual testing positive for SARS-CoV-2 by real-time RT-PCR or by rapid diagnostic test (RDT) on a respiratory specimen.

Recruitment Process
To avoid selection bias in case recruitment, paediatricians recorded all suspected cases seen in their daily practice; however, during peaks of workload, they collected data only from the first 5 suspected cases per day. Follow-up was performed by the patient's paediatrician, either during a primary care visit or by telephone interview with the parents or legal guardians, using the planned questionnaire. All data were recorded in a web-based platform, the Research Electronic Data Capture (REDCap®) database.

Ethical Considerations
Ethical approval was obtained from the referral IDIAP J. Gol Research Foundation for Primary Care in Catalonia, Spain, and the coordinating centre of the study, Vall d'Hebron Research Institute, Barcelona, Spain (PR(AG)475/2020) on 25 September 2020.

Data Description
The epidemiological and clinical data of the registered cases are described in Table 1. The main objective of this study was to determine which symptoms are decisive in the paediatric population for deciding whether a PCR test should be performed when a child presents symptoms suggestive of COVID-19. Therefore, only symptom-related variables were analysed in this study. Before pre-processing the data to train the predictive models, we performed a statistical analysis to determine the association of each clinical feature in the dataset with the final diagnosis, so as to avoid using uninformative variables. As the distribution of the data was unknown and the variables were independent, we performed a chi-square (χ²) test, considering any p-value < 0.05 statistically significant. For the analysis to be relevant, we only considered patients who underwent a COVID-19 test and had a reported positive or negative result, leading to a cohort of 4419 individuals for the statistical analysis. Viral co-infections and bacterial infections were diagnosed by specific testing under clinical suspicion. Bacterial infection was defined as a positive culture (blood culture or culture of the specimen) or a positive bacterial PCR. The comorbidities included were congenital heart disease, hypertension, asthma, chronic pulmonary disease, renal, liver or neurologic disease, diabetes mellitus, tuberculosis, primary and secondary immunodeficiencies (including human immunodeficiency virus (HIV) infection), oncohaematological disease, Kawasaki syndrome, auto-inflammatory diseases, obesity, prematurity and palivizumab administration due to prematurity. In addition, social data were obtained through the researchers' effort in completing the planned questionnaire in the different paediatric practices.
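As a minimal sketch of this per-symptom screening step, the Pearson χ² test of independence between one binary symptom and the test result can be written with the standard library only. The 2×2 counts in the usage line are hypothetical, not taken from Table 1, and this version applies no Yates continuity correction, which the original analysis may or may not have used. With one degree of freedom, the chi-square survival function reduces to erfc(√(χ²/2)).

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 contingency table.

    table: [[a, b], [c, d]], rows = symptom present/absent,
    columns = SARS-CoV-2 positive/negative (hypothetical layout).
    Returns (statistic, p_value); no continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: 30/100 positives with the symptom vs 20/100 without it.
stat, p = chi_square_2x2([[30, 70], [20, 80]])
keep_symptom = p < 0.05  # significance threshold used in the study
```

For larger tables or a continuity-corrected variant, a library routine would normally be preferred over this hand-rolled version.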

Pre-Processing
To obtain homogeneous labelling of the data, thorough pre-processing was conducted. Initially, the diagnosis was coded with three possible values: COVID-19 positive, COVID-19 negative or suspected viral infection. For this last option, we checked whether a PCR or RDT for SARS-CoV-2 had been performed and, based on the result, classified the child as positive or negative for COVID-19. If a child received both tests, the PCR result took priority over the RDT result.
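The labelling rule above can be sketched as a small function. The diagnosis codes ("positive", "negative", "suspected") and the True/False/None encoding of test results are hypothetical stand-ins for the REDCap® fields, whose actual names are not given here.

```python
def resolve_label(diagnosis, pcr=None, rdt=None):
    """Map a raw diagnosis to 1 (COVID-19 positive), 0 (negative) or None.

    diagnosis: "positive", "negative" or "suspected" (hypothetical codes).
    pcr, rdt: True (positive), False (negative) or None (test not performed).
    """
    if diagnosis == "positive":
        return 1
    if diagnosis == "negative":
        return 0
    # Suspected viral infection: use the SARS-CoV-2 test results,
    # giving priority to PCR over RDT when both were performed.
    for result in (pcr, rdt):
        if result is not None:
            return int(result)
    return None  # no usable result; assumed to be excluded from the analysis
```

For example, `resolve_label("suspected", pcr=False, rdt=True)` returns 0 because the negative PCR overrides the positive RDT.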
Most of the clinical variables were coded as binary, with "1" or "0" indicating whether the symptom was present or absent, respectively. Fever was coded by grade (slight, moderate or high): "0" for no fever, "1" for 37.5 to <38 °C, "2" for 38 to 39 °C and "3" for >39 °C. The total duration of fever was coded as "0" for no fever, "1" for fever lasting 1 or 2 days, "2" for fever lasting 3 to 7 days and "3" for fever lasting more than 7 days. Lastly, auscultation was encoded as "0" for no pathologic findings, "1" for wheezing, "2" for crackles and "3" for both.
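A minimal sketch of these ordinal encodings, assuming the raw inputs are the maximum recorded temperature in °C, the number of days with fever, and two wheezing/crackles flags (all hypothetical input formats). Because the "2" band is 38 to 39 °C inclusive, exactly 39 °C is not counted as high fever.

```python
def encode_fever_grade(max_temp_c):
    """0 = no fever, 1 = 37.5 to <38 °C, 2 = 38 to 39 °C, 3 = >39 °C."""
    if max_temp_c is None or max_temp_c < 37.5:
        return 0
    if max_temp_c < 38:
        return 1
    if max_temp_c <= 39:
        return 2
    return 3

def encode_fever_days(days):
    """0 = no fever, 1 = 1-2 days, 2 = 3-7 days, 3 = more than 7 days."""
    if not days:
        return 0
    if days <= 2:
        return 1
    if days <= 7:
        return 2
    return 3

def encode_auscultation(wheezing, crackles):
    """0 = normal, 1 = wheezing only, 2 = crackles only, 3 = both."""
    return int(wheezing) + 2 * int(crackles)
```

The auscultation code works as a two-bit flag, which is why "both" falls out naturally as 1 + 2 = 3.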
Some variables had missing data, mostly because of the age of the patients. Some symptoms, such as headache or loss of smell and taste, are self-reported by the child, and in children aged 0 to 5 years we were unable to obtain certain clinical information because of the limited communication skills intrinsic to that age.
Symptoms with more than 25% missing data were excluded from the models because they did not provide clear information about SARS-CoV-2 infection in children, as explained in the Supplementary Information.
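The 25% missing-data rule could be applied per variable as below, assuming each child is a dict of symptom values in which None (or an absent key) marks a missing answer; the function name is hypothetical. A variable with exactly 25% missing values is kept, since only those in excess of 25% were removed.

```python
def columns_to_keep(records, max_missing=0.25):
    """Return sorted column names whose share of missing values is <= max_missing.

    records: non-empty list of dicts, one per child; None or an absent key = missing.
    """
    columns = sorted({key for record in records for key in record})
    kept = []
    for column in columns:
        missing = sum(1 for record in records if record.get(column) is None)
        if missing / len(records) <= max_missing:
            kept.append(column)
    return kept
```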
In addition, as a different pattern of symptoms was expected according to the age of the child, the general training set was divided into two subsets to train age-specific predictive models for SARS-CoV-2 infection. Thus, three models were developed: one for the general paediatric population (0 to <16 years) and two for children aged 0 to 5 years and 6 to <16 years, respectively. Using age-related data subsets, we were able to rule out certain variables with a high number of missing values while still keeping enough data to train the models in a balanced way (50% positives and 50% negatives) to induce generalisation capabilities. Hence, 1540 patients were considered for the general model, 448 for the model for children aged 0 to 5 and 1026 for the model for children aged 6 to 15. Table 2 and Tables S1 and S2 (Supplementary Information) show the final symptoms chosen to train the general paediatric model, the model for children aged 0 to 5 and the model for children aged 6 to 15 years, respectively, together with the number of patients who presented each symptom and how many of them tested positive for COVID-19. For example, the data subset used to train the model for children aged 0 to 5 years (Supplementary Information) does not include loss of taste/smell, since this information is difficult to obtain from younger children.
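The text does not specify how the 50/50 balance was achieved; one common option, assumed here, is random undersampling of the majority (negative) class within each age subset. The sketch also builds the three age cohorts, assuming ages are recorded in whole years; all function and key names are hypothetical.

```python
import random

def age_subsets(records, age_key="age"):
    """Split records into the three cohorts used for the three models."""
    return {
        "all_0_15": [r for r in records if r[age_key] < 16],
        "age_0_5": [r for r in records if r[age_key] <= 5],
        "age_6_15": [r for r in records if 6 <= r[age_key] < 16],
    }

def balanced_subset(records, label_key="label", seed=0):
    """Randomly undersample the larger class so both classes have equal size."""
    rng = random.Random(seed)
    positives = [r for r in records if r[label_key] == 1]
    negatives = [r for r in records if r[label_key] == 0]
    n = min(len(positives), len(negatives))
    sample = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(sample)
    return sample
```

In use, each cohort would be balanced separately, e.g. `{name: balanced_subset(rows) for name, rows in age_subsets(data).items()}`. Undersampling discards negatives; oversampling positives would be an alternative, with a higher risk of overfitting to repeated cases.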

Methodology Implementation
Because we implemented several machine learning (ML) and deep learning (DL) architectures to determine which performed best for each data subset, we developed a systematic pipeline to obtain each predictive model (see Supplementary Information).
We followed a pipeline (Figure 1) consisting of data processing, model selection, fine-tuning and evaluation steps. Given a particular model architecture, the relevant data and the hyperparameter configurations to test (Tables S3 and S4), the pipeline outputs the best hyperparameter configuration together with evaluation metrics. Further details of the pipeline can be found in the Supplementary Information.
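A stripped-down sketch of such a pipeline, using only the standard library: a stratified 70/30 train/test split (the proportion given under Model Development), k-fold cross-validation on the training part and an exhaustive search over a hyperparameter grid. `fit_and_score` is a hypothetical callback that would train whichever architecture is being evaluated (RF, kSVM, XGB or a neural network) and return a validation score; the strided folds, k = 5 and the hyperparameter names in the usage example are assumptions, and the separate fine-tuning stage is not reproduced.

```python
import itertools
import random

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split row indices into (train, test), preserving the class ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

def k_folds(indices, k=5):
    """Yield (fit, validation) index lists for k-fold CV (simple strided folds)."""
    for f in range(k):
        val = indices[f::k]
        val_set = set(val)
        yield [i for i in indices if i not in val_set], val

def grid_search(fit_and_score, grid, train_idx, k=5):
    """Return (best_config, best_mean_cv_score) over every combination in grid.

    fit_and_score(config, fit_idx, val_idx) trains on fit_idx and returns a
    validation score (e.g. AUROC) computed on val_idx.
    """
    names = sorted(grid)
    best_config, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        scores = [fit_and_score(config, fit, val) for fit, val in k_folds(train_idx, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_config, best_score = config, mean_score
    return best_config, best_score
```

After the search, the winning configuration would be refit on all training indices and scored once on the held-out 30%, so that the test set never influences model selection.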

Model Development
We divided each data subset into training and testing sets in a 70/30 proportion. The models were cross-validated (CV) to evaluate their performance: we trained the candidate model architectures with several hyperparameter configurations and tested their performance against the cross-validation sets. Fine-tuning was then used to optimise the chosen set of hyperparameters. The average validation scores for each of the chosen configurations, together with an overview of the tuned hyperparameters and the number of configurations tested, are given in Tables S4 and S5 in the Supplementary Information. Performance was quantified in terms of the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, precision and F1 score. These terms are defined in the Supplementary Information.
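For reference, the reported metrics can be computed from binary labels (1 = COVID-19 positive), thresholded predictions and predicted scores as sketched below. These are the standard definitions; the exact formulas in the Supplementary Information are not reproduced here, and the fallback to 0.0 on division by zero is an arbitrary choice of this sketch.

```python
def _safe_div(num, den):
    return num / den if den else 0.0

def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, precision and F1 from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = _safe_div(tp, tp + fn)  # also called recall
    specificity = _safe_div(tn, tn + fp)
    precision = _safe_div(tp, tp + fp)
    f1 = _safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}

def auroc(y_true, scores):
    """AUROC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties count as half a win, matching the trapezoidal ROC area.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Unlike the other four metrics, AUROC uses the continuous scores rather than a fixed decision threshold, which is why it is the primary discrimination measure.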

Feature Importance Extraction
We computed SHAP (SHapley Additive exPlanations) [15] values for all test instances to understand the global importance of each variable for the final classification made by the implemented model. SHAP is a model-agnostic explanation method in which, for a particular instance of a dataset and given the model's prediction for that instance, each feature is assigned a weight according to how much it shifted the model's output with respect to the expected output.
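To make the SHAP definition concrete, the brute-force sketch below computes exact Shapley values for a single instance by enumerating every coalition of features. It assumes the "interventional" value function, in which features outside a coalition are filled in from background rows and the model output is averaged. Its cost is exponential in the number of features, so it is only illustrative; the implementation in [15] uses faster algorithms, including an exact polynomial-time method for tree ensembles. The model here is any Python callable, and all names are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, background):
    """Exact (brute-force) Shapley values of model(x) relative to a background set."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take x's values; the others come from each
        # background row, and the model outputs are averaged.
        total = 0.0
        for row in background:
            z = [x[i] if i in coalition else row[i] for i in range(n)]
            total += model(z)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi
```

By construction, the values sum to the difference between the model's output for x and its average output over the background data, which is the "expected output" referred to above.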
Although we could have interpreted only the model trained with data from all age groups and then analysed the SHAP values of observations by age stratum, instead of creating a classifier for each age interval, we believe the latter yields a better interpretation. In the first case, the SHAP values of observations in one age stratum may be influenced by observations from other age groups, biasing the results depending on what the classifier actually learned. In contrast, separate models ensure that the feature importance within each age stratum is not influenced by leading factors in other age groups, so we can see how the predictors influence the output across age ranges. We used the implementation from [15].

Data Description
A total of 4456 children were recruited in our study; 44.5% (1984/4456) were female and 42% (1872/4456) were younger than 6 years of age, with no significant differences between COVID-19 and non-COVID-19 cases (Table 1). Diagnostic tests for SARS-CoV-2 (PCR and/or RDT) were administered in 4434 (99.5%) of the recruited cases (Table 1). Of the 840 cases with a PCR test and the 3916 with an RDT, 321 (38.2%) and 463 (11.8%) tested positive, respectively. Both tests (PCR and RDT) were performed at the same time in 354 children: 14 (4%) tested positive on both; 108 (30.5%) yielded discordant results, 89 (25.1%) with a positive PCR and negative RDT and 19 (5.4%) with a negative PCR and positive RDT; and the remaining 232 (65.5%) were negative on both tests. Among children testing negative for SARS-CoV-2 by either RDT or PCR, we found one case of influenza A and 19 cases of adenovirus by RDT, and 10 cases by multiplex PCR: rhinovirus (n = 3), adenovirus (n = 3), another coronavirus (n = 1), enterovirus (n = 1), Epstein-Barr virus (n = 1) and bocavirus (n = 1). No cases of respiratory syncytial virus or influenza B were detected. Viral co-infections with SARS-CoV-2 comprised two cases of rhinovirus and one of enterovirus. Use of the school bus, athletic activities, and suspected or confirmed COVID-19 cases at home or school were associated with a COVID-19 diagnosis (Table 1). No significant differences in comorbidities were observed between COVID-19 and non-COVID-19 cases.
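The positivity and concordance percentages above follow directly from the reported counts; the short check below recomputes them (variable names are illustrative).

```python
# Counts reported for the 354 children tested with both PCR and RDT.
both_tests = {
    "PCR+ / RDT+": 14,
    "PCR+ / RDT-": 89,
    "PCR- / RDT+": 19,
    "PCR- / RDT-": 232,
}
total = sum(both_tests.values())
discordant = both_tests["PCR+ / RDT-"] + both_tests["PCR- / RDT+"]
percentages = {k: round(100 * v / total, 1) for k, v in both_tests.items()}
discordant_pct = round(100 * discordant / total, 1)

# Positivity rates among all PCR and all RDT results.
pcr_positivity = round(100 * 321 / 840, 1)
rdt_positivity = round(100 * 463 / 3916, 1)
```

The four cells add up to 354 and the two discordant cells to 108 (30.5%), consistent with the text; note that 89 of the 108 discordant pairs are PCR-positive/RDT-negative, in line with the lower sensitivity of RDTs.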
The results of the χ² test are shown in Figure S1. The degree of fever was relevant in determining whether or not the patient had COVID-19: a higher fever was associated with a negative SARS-CoV-2 result, while a lower fever may be related to COVID-19. In addition, this descriptive analysis showed that loss of taste and smell was associated with a non-COVID-19 diagnosis, in contrast to what is observed in adults. However, the number of patients reporting loss of smell and taste is low compared with the whole sample, which is an important source of bias when interpreting this analysis. Some symptoms, such as confusion, showed strong correlations with the COVID-19 diagnosis that were driven by a high number of missing values and/or a low number of affected patients; these symptoms therefore had to be discarded from any present and future analysis.

Model Development
In this section, we report the scores of the best-performing classifiers for each data subset. Table 3 contains the average CV scores of the best configuration found for each architecture tested, while Table 4 contains the test scores of the final fine-tuned architectures with 95% confidence intervals. Table 3 shows tied results between random forest (RF) and kernel support vector machine (kSVM); we chose RF as the best architecture because its exact SHAP values could be computed efficiently rather than approximated.
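The text does not state how the 95% confidence intervals in Table 4 were obtained. One common approach, assumed here purely for illustration, is a percentile bootstrap over the test set: the sketch resamples test observations with replacement, skips resamples containing a single class (where AUROC is undefined) and reads off the 2.5th and 97.5th percentiles. Function names are hypothetical.

```python
import random

def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(y_true, scores, metric=auroc, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) confidence interval for metric(y_true, scores)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_true = [y_true[i] for i in idx]
        if len(set(resampled_true)) < 2:
            continue  # AUROC is undefined unless both classes are present
        stats.append(metric(resampled_true, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[round(n_boot * alpha / 2)]
    upper = stats[round(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper
```

With small test sets such as the 0 to 5-year subset, these intervals are wide, which is one reason the differences between architectures should be read cautiously.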
We assessed discrimination using the AUROC. All the architectures performed similarly within each subset, with boosted trees (XGB) being the best for the subset including all ages (AUROC = 0.65), and RF the best for the subsets aged 0 to 5 and 6 to <16 years, with an AUROC of 0.63 and 0.67, respectively (Table 4). Classifiers performed worst in the subset for ages 0 to 5 years, while the scores for all ages and for 6 to <16 years were very similar. This could mean either that there were not enough observations in the 0-5-year subset for the classifiers to learn patterns properly, or that there were no relevant symptom patterns to be learned. The latter hypothesis is supported by the fact that performance in the all-ages subset was worse overall than in the 6 to <16 years subset, which is contained within it: adding the 0-5-year observations to the 6 to <16 years subset only confuses the classifiers. This is in agreement with the commonly low specificity of clinical characteristics observed in younger children.
Table 3. Average CV scores for the models trained with the data subset including all ages (architecture 1), with the data subset for ages 0 to 5 years (architecture 2) and with the data subset for ages 6 to 14 years (architecture 3).

Feature Importance Extraction
The multicoloured figures in this section are beeswarm plots, in which features are ordered from top to bottom by decreasing overall importance. For each feature, the SHAP value of each test observation is shown as a point, coloured red if the observation has the symptom defined by the feature and blue if it does not. Points further to the left indicate an output more strongly associated with the absence of SARS-CoV-2 infection; points further to the right, an output more strongly associated with SARS-CoV-2 infection.

General Model
In the model trained on the whole dataset (patients aged 0 to 15) (Figure 2), the presence of headache and fatigue increased the predicted likelihood of infection, while the presence of odynophagia, vomiting or diarrhoea decreased it. Fever lasting 1-2 days, fever higher than 39 °C, wheezing and nasal congestion were also associated with a lower probability of SARS-CoV-2 infection. Conversely, mild fever (38 to 39 °C) or loss of smell or taste increased the likelihood of COVID-19.
Figure 3A,B show the relative importance of each variable. For example, on average, reporting a headache has approximately twice the effect on the model's decision as vomiting. Figure 3A does not indicate whether features affect the predictions positively or negatively, but it can support decision-making by providing a hierarchy of importance. Figure 3B shows the maximum impact that a clinical characteristic had on the prediction for any test observation. This differs from the average impact, and it underscores how a diagnostic process based on a population context needs to be individualised for each patient. For example, although the average impact of loss of smell was +0.09, its maximum impact was +1.47, five times higher than the most impactful feature.
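Given a matrix of SHAP values (one row per test observation, one column per feature), the two rankings in Figure 3 can be approximated as below. This sketch assumes that "impact" means the absolute SHAP value, as in the bar plots of the shap library; the signed values quoted above suggest the figure may use a slightly different convention. The toy matrix in the usage example is invented, not taken from the study, and the function names are hypothetical.

```python
def importance_summary(shap_matrix, feature_names):
    """Return {feature: (mean |SHAP|, max |SHAP|)} from a rows x features matrix."""
    n_rows = len(shap_matrix)
    summary = {}
    for j, name in enumerate(feature_names):
        column = [abs(row[j]) for row in shap_matrix]
        summary[name] = (sum(column) / n_rows, max(column))
    return summary

def rank_features(summary, by=0):
    """Rank features by mean impact (by=0) or maximum impact (by=1)."""
    return sorted(summary, key=lambda name: summary[name][by], reverse=True)

# Toy values: loss of smell matters a lot for one child but little on average.
shap_matrix = [[0.2, -0.1, 0.0], [0.4, 0.1, 0.6], [-0.3, -0.1, 0.0]]
summary = importance_summary(shap_matrix, ["headache", "vomiting", "loss_of_smell"])
by_mean = rank_features(summary, by=0)  # headache, loss_of_smell, vomiting
by_max = rank_features(summary, by=1)   # loss_of_smell, headache, vomiting
```

The toy example reproduces the qualitative point made above: a feature that is rarely decisive can still dominate the prediction for an individual patient.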
Regardless of the magnitude of the maximum impact, almost all of the top 10 features by average impact (Figures 2 and 3A) were also among the top 10 by maximum impact (Figure 3B), except nasal congestion and fever lasting 1-2 days.

Model for Children by Age Range
In the model trained on patients aged 0 to 5, fever lasting 3 to 7 days, fever higher than 39 °C, odynophagia, visible skin rashes, shortness of breath, wheezing and fatigue all lowered the predicted likelihood of SARS-CoV-2 infection (Figure 4A). In contrast, low fever (37.5 to 38 °C), cough and fever lasting 1 to 2 days were associated with a higher probability of SARS-CoV-2 infection.
In the model trained on patients aged 6 to 15 (Figure 4B), painful swelling, vomiting, diarrhoea, wheezing and gastrointestinal symptoms were associated with a lower probability of infection. In contrast, loss of taste and smell, cough, headache and mild fever (38 to 39 °C) increased, on average, the predicted likelihood of infection.

In general, the absence of a symptom did not negatively influence the predicted probability that a child was infected with SARS-CoV-2. Instead, the SHAP explanations showed that the models relied on the presence of a symptom to drive a positive or negative prediction.
Odynophagia was a relevant factor for a non-COVID-19 diagnosis in all models. Low and mild fever (37.5-39 °C) was the most relevant sign for a positive diagnosis in children under 6 years of age. A high fever (>39 °C) was likely related to a non-COVID-19 diagnosis in the majority of cases, as were vomiting and diarrhoea (at least for children aged 6 years or older). Headache and fatigue were relevant symptoms for a COVID-19 diagnosis in the overall model (although headache cannot be assessed in children aged <6). Overall, mild fever (38-39 °C) was indicative of a COVID-19 diagnosis, and high fever (>39 °C) of a non-COVID-19 case. Vomiting and diarrhoea were associated with a non-COVID-19 diagnosis in the models for all ages and for children aged >5 years, but were not relevant for younger children.
We noted the value of the feature importance analysis. In the descriptive data analysis, only a small fraction of children with COVID-19 had a loss of smell and taste; in fact, the χ² test suggested that these alterations were not signs of a COVID-19 diagnosis. However, the black-box classifiers associated these alterations with a COVID-19 diagnosis. Rather than a contradiction, this suggests that the ML models capture complex interactions in the data that are not apparent in a simple descriptive analysis.

Main Results
The use of ML and DL techniques (artificial intelligence) to model the complex interactions between different symptoms in children with suspected SARS-CoV-2 infection demonstrates the challenges of combining clinical characteristics to predict a COVID-19 diagnosis in children (AUROC = 0.65), especially in those younger than 6 years of age (AUROC = 0.63). Low-grade fever was the major sign predicting COVID-19 in younger children, whereas loss of taste or smell was the most determinant symptom in older children.
The models' accuracy was not very high, which is consistent with the low specificity of COVID-19 symptoms in children. Nevertheless, the models were capable of identifying some patterns that are easy to see from a descriptive approach. Notably, the results for older children were more accurate than those for younger ones, which was to be expected, since older children can better describe their symptoms, making the data more reliable. Another factor is that, in children aged <6 years, COVID-19 symptoms are not specific compared with those of the other common respiratory viruses that children contract in their first years of life.
Therefore, these models could serve as a clinical tool to support the decision to administer a diagnostic test such as PCR for SARS-CoV-2 to confirm a diagnosis in a given epidemiological context, in combination with the paediatrician's criteria.
Before processing the data with predictive models, a conventional statistical analysis showed that use of the school bus, playing sports, and suspected and confirmed COVID-19 cases at home or school were all associated with a COVID-19 diagnosis (Table 1). Thus, epidemiological characteristics linked to the risk of exposure to SARS-CoV-2 are key when deciding whether to perform a diagnostic test to rule out COVID-19. However, the model can be of use when there is no accurate information about close contact with COVID-19, or when this is unknown because of potential multiple exposures. Moreover, in a future scenario with lower incidence and more relaxed control protocols for SARS-CoV-2, the model would be especially relevant for symptomatic cases that are not included in contact-tracing studies. Therefore, we decided to use ML techniques to explore whether a group of symptoms could predict COVID-19 in children of different age groups.

Comparison with Prior Work
In Israel, a national symptom survey was carried out among the adult population to build a prediction model to prioritise individuals for SARS-CoV-2 testing [16]. This study produced a tool that could be used worldwide, but mainly in areas with limited testing resources, thereby increasing the rate at which positive individuals could be identified. Moreover, individuals at high risk of a positive test result could be isolated prior to testing [16]. Eighteen self-reported symptoms collected through a mobile application have been used in a population older than 16 years for the early detection of SARS-CoV-2 infection, to contain the spread of COVID-19 and allocate medical resources efficiently [17]. Prognostic models to predict the risk of clinical deterioration in adults hospitalised with acute COVID-19 have been shown to provide a great clinical advantage [18] because their predictors can be easily collected as part of routine daily care. In fact, reliable predictive models can improve clinical management and, consequently, the allocation of human and economic resources [19]. Other predictive models have been applied with excellent results to laboratory parameter values of patients who died of COVID-19 to determine the risk of mortality [20]. Additionally, systematic evaluation of different prognostic models in hospitalised adults with COVID-19 has identified strong predictors of deterioration and mortality in this population [21], as well as of long COVID symptoms [22]. However, all of these studies were performed in adults. In children and adolescents, fever and cough were the most common symptoms in a systematic review of more than 1300 studies, with the same clinical characteristics observed in most other respiratory viral infections [10].
Research on prognostic factors in paediatric populations with SARS-CoV-2 infection has primarily focused on hospitalised children [23,24], although some community-based studies assessing symptom patterns associated with a positive SARS-CoV-2 swab have recently been published [25]. These authors found that the symptoms most strongly associated with a positive swab were loss or change of smell or taste, nausea/vomiting, headache and fever [25], whereas other studies identified additional symptoms, such as persistent cough, chills, appetite loss and muscle aches [26], which were also predictive of infection with the Alpha variant of SARS-CoV-2. The main strength of our study is that cases were recruited from the community by primary healthcare paediatricians, minimising potential selection bias. As a consequence of this approach, our model is comparable to the above-mentioned community-based studies for the older paediatric age group, but not for younger children (<6 years old), because information about loss of smell or taste is intrinsically unavailable at that age. In fact, only two of the whole set of symptoms, low fever (37.5 to 38 °C) and cough, were associated with a higher probability of SARS-CoV-2 infection in younger children in our study. Finally, this model could help administrators of schools or day-care centres reassess their symptom lists to include only those symptoms most strongly associated with a positive SARS-CoV-2 swab.

Limitations
Our study has some limitations. Firstly, missing values limited the training performance of the models, and a larger sample size could have improved the AUROC. For data as complex as these, a large number of observations is probably needed to avoid overfitting and achieve higher predictive performance. Secondly, most of the tests administered in the study were RDTs, which are, in general, less sensitive than RT-PCR, and this could introduce bias into the analysis. However, the RDTs used in this study meet the WHO criteria of ≥80% sensitivity and ≥97% specificity, and in our cohort only symptomatic children were tested, early in their symptomatic period, which increases diagnostic value, as found in [27]. This means that, if 20% of infected patients had a false-negative RDT result, the number of RDT-positive patients would increase by approximately 115 (463/0.8 - 463 ≈ 116). This loss is acceptable in our study, as recovering these cases would not substantially improve model training. In addition, our model was trained on symptoms collected in paediatric cases between November 2020 and March 2021, when B.1.177 (November-February) and Alpha (B.1.1.7) (February-March) were the predominant SARS-CoV-2 variants in Catalonia (Spain), and this analysis could give different results for newer SARS-CoV-2 variants, such as Delta (B.1.617.2). In the UK, a recent study showed that the odds of several symptoms, including headache and fever, were higher with Delta than with Alpha infection. However, symptoms were short-lived and similar for both variants, and, fortunately, very few children were admitted to hospital with either variant [28]. Thirdly, the features we have presented as 'important' are only important in the context of a single classifier.
Thus, if a classifier has learnt wrong representations from the data, or was simply not able to identify complex patterns, the features presented as important by SHAP will not be representative of reality.
Moreover, even if the model fits the data well, feature importance should be interpreted as correlational, not causal. With this in mind, the better a model performs, the more reliably its feature importance values can be interpreted. In our case, the model for ages 0 to 5 years had the worst performance, so its feature importance values are less representative than those of the model for ages ≥6. In addition, SHAP is a 'permutation-based' explanation method. Therefore, computing feature importance requires some amount of randomisation, which entails some degree of instability in the results: if SHAP values were recomputed several times for a particular observation, they might vary slightly. Nevertheless, we have used SHAP to represent patterns in the whole data population, not in individual instances. Furthermore, interactions between factors have not been explored locally (i.e., we determined the weight of each symptom on a positive or negative SARS-CoV-2 test result, but not how different combinations of symptoms may affect the result).
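The sampling instability described above can be illustrated with a minimal sketch of Monte Carlo permutation sampling of Shapley values, one of the estimators used by explainers in the SHAP family. It is not necessarily the explainer used in this study: the helper `sampled_shapley`, the toy model `toy_model` and the all-zeros baseline are hypothetical, and replacing "absent" features with a single baseline vector is a simplification of how SHAP uses background data.

```python
import random
from typing import Callable, List, Sequence


def sampled_shapley(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
    n_permutations: int,
    seed: int,
) -> List[float]:
    """Monte Carlo estimate of Shapley values by sampling feature orderings.

    Hypothetical helper: features not yet 'added' keep their baseline value.
    """
    rng = random.Random(seed)
    n = len(x)
    phi = [0.0] * n
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)          # random ordering of the features
        z = list(baseline)          # start with every feature at its baseline
        prev = f(z)
        for j in order:
            z[j] = x[j]             # add feature j to the coalition
            cur = f(z)
            phi[j] += cur - prev    # marginal contribution of j in this ordering
            prev = cur
    return [p / n_permutations for p in phi]


def toy_model(z: Sequence[float]) -> float:
    # Hypothetical model with an interaction between features 0 and 1.
    return z[0] * z[1] + z[2]


x, baseline = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
for seed in (1, 2, 3):
    # With few permutations, the split between features 0 and 1 depends on the seed.
    print(seed, sampled_shapley(toy_model, x, baseline, n_permutations=5, seed=seed))
```

The exact Shapley values for this toy case are 0.5, 0.5 and 1.0. Every sampled estimate preserves the total f(x) − f(baseline) = 2 and assigns exactly 1.0 to feature 2, which does not interact with the others; only the split between the two interacting features fluctuates from seed to seed, and it converges to 0.5 each as `n_permutations` grows.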
In addition, although we have characterised each population's model on average, the main contributors to the predicted probability can differ between individual observations, as shown in Figure 3B, which displays the maximum impact of each feature on any single observation. Features ranked by their maximum impact on the model output do not follow the same order as features ranked by their mean impact.
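The difference between ranking features by mean impact and by maximum impact can be shown with a small matrix of SHAP values. The values and feature names below are made up for illustration and are not results from this study; `rank_features` is a hypothetical helper.

```python
import numpy as np


def rank_features(shap_values: np.ndarray, names: list) -> tuple:
    """Rank features by mean |SHAP| and by max |SHAP| across observations.

    shap_values has shape (n_observations, n_features).
    """
    abs_vals = np.abs(shap_values)
    mean_impact = abs_vals.mean(axis=0)   # average impact over all observations
    max_impact = abs_vals.max(axis=0)     # largest impact on any single observation
    by_mean = [names[i] for i in np.argsort(-mean_impact)]
    by_max = [names[i] for i in np.argsort(-max_impact)]
    return by_mean, by_max


# Hypothetical SHAP values: 4 observations x 3 features.
shap_values = np.array([
    [0.10, 0.05, 0.00],
    [0.12, 0.04, 0.01],
    [0.09, 0.06, 0.30],   # feature_c dominates only this observation
    [0.11, 0.05, 0.00],
])
names = ["feature_a", "feature_b", "feature_c"]
by_mean, by_max = rank_features(shap_values, names)
print(by_mean)  # ['feature_a', 'feature_c', 'feature_b']
print(by_max)   # ['feature_c', 'feature_a', 'feature_b']
```

A feature with a large effect on only a few observations can top the maximum-impact ranking while sitting lower in the mean-impact ranking.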
Finally, this study was carried out in a context in which SARS-CoV-2 was circulating freely throughout the community, which increases the chances of positive results, and in the absence of other seasonal viruses such as respiratory syncytial virus or influenza, which may share some symptoms with COVID-19. Therefore, once the seasonal viruses circulate again, comparative studies could be carried out to explore the utility of this methodology in providing a reliable symptom-based diagnosis.
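The RDT false-negative argument in the second limitation can be made explicit with a short calculation. This is a minimal sketch, assuming the RDT has perfect specificity and a uniform sensitivity Se across tested children: if P children tested positive, the estimated number of missed infections is P·(1 − Se)/Se. The helper name `expected_missed_positives` and the counts in the example are hypothetical. This section does not state how many RDT results were positive or exactly how the figure of 115 was derived, and under a simpler reading (adding 20% of the observed positives) the estimate would be 0.2·P instead.

```python
def expected_missed_positives(observed_positives: int, sensitivity: float) -> float:
    """Estimate infections missed by a test with the given sensitivity.

    Hypothetical helper. Assumes no false positives and that the sensitivity
    applies equally to every tested child.
    """
    if not 0.0 < sensitivity <= 1.0:
        raise ValueError("sensitivity must be in (0, 1]")
    # Observed positives are a fraction `sensitivity` of all true infections.
    estimated_infections = observed_positives / sensitivity
    return estimated_infections - observed_positives


# Hypothetical example: 400 RDT-positive children, sensitivity at the WHO minimum of 80%.
print(round(expected_missed_positives(400, 0.80)))  # 100
```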

Conclusions
We were able to characterise the clinical presentation of COVID-19 in children (<16 years old) with an AUROC of 0.65, and to determine the differences between children <6 years old (AUROC of 0.63) and children aged 6 to 15 years (AUROC of 0.67).
The present study offers a useful tool to help paediatricians decide whether to administer a SARS-CoV-2 test. In addition, the model could be made available to the general public, for example through a web or mobile application, to help parents or other users decide whether a child should attend a consultation, and hence help prevent healthcare services from becoming overwhelmed.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/v14010063/s1, Figure S1: Incidence of positive COVID-19 tests (RDT or PCR) per significant symptom (p-value < 0.05 in the χ² test). The symptoms are divided into two panels, (A) and (B), to improve visualisation. Figure S2: SVM hyperplane separating samples from two different classes, Figure S3: Visual description of how a Random Forest model provides a prediction, Figure S4: General representation of a Confusion Matrix, Figure S5: Visual description of the ROC curve, Table S1: Specifications of the dataset used to train the predictive model of COVID-19 in patients aged 0 to 5 years old, Table S2: Specifications of the dataset used to train the predictive model of COVID-19 in patients aged 6 to 15 years old, Table S3: Grids of hyperparameter values explored in the model selection and fine-tuning steps of the modelling pipeline. Each hyperparameter configuration is created by randomly picking a hyperparameter value from the grid, Table S4: (A) Number of hyperparameter configurations used in the model selection step of the pipeline. Each of the configurations for all the architectures is tried out for all three data subsets. (B) Number of hyperparameter configurations used in the fine-tuning step of the pipeline. Only one architecture is explored for each data subset. For both steps, all hyperparameter configurations are distinct, Table S5: Computational resources used in the pre-processing and modelling phases and their IDs.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee of the Hospital Universitari Vall d'Hebron (Barcelona, Catalonia, Spain) (PR(AG)475/2020 on 25 September 2020 to set up the COPEDICAT network and to create the database for the data collection, and PR(AMI)40/2021 on 29 March 2021 for the specific data analysis).
Informed Consent Statement: Patient consent was waived because the data were collected as part of the participants' standard of care, without any additional invasive or non-invasive diagnostic procedures performed for the study. The study complies with the applicable current regulations on data protection; in particular, Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of individuals with regard to the processing of personal data and on the free movement of such data (RGPD), Organic Law 3/2018 of 5 December, on data protection and guarantee of digital rights (LOPDGDD), as well as any other data protection regulations that may be applicable.
Data Availability Statement: Data will be freely available upon request, and the replication code and archived datasets can be found at https://github.com/chus-chus/cov19-modeling (accessed on 16 November 2021).