Machine Learning-Based Clinical Adjusted Selection of Predicting Risk Factors for Shunt Infection in Children

Background: Shunt infection is a common complication of shunt insertion in children which can lead to poor neurodevelopmental outcomes and impose a considerable economic burden on the health care system. Identifying predictive factors of shunt infection could therefore help to improve the management of this condition. Methods: In this study, related risk factors of 68 patients with a history of shunt infection and 80 matched controls without any history of shunt infection, who were all operated on in a single referral hospital, were assessed. Three machine learning (ML)-based measures (sparsity, correlation, and redundancy) along with the specialists' score were applied to select the most important predictive risk factors for shunt infection. The ML-based score was determined by summation of the sparsity, correlation, and redundancy measures, and the final total score was obtained by normalizing the sum (ML-based score + specialist score). Results: According to the total score, prematurity, first ventriculoperitoneal shunting (VPS) age, intraventricular hemorrhage (IVH), myelomeningocele (MMC), and low birth weight had higher weights as shunt infection risk factors. Icterus, trauma, co-infection, and tumor had the lowest weights, and history of meningitis and number of shunt revisions were defined as intermediate risk factors. Conclusion: The "ML-based clinical adjusted" method may be used as a complementary tool to help neurosurgeons in better patient selection and more accurate follow-up of children with a higher risk of shunt infection.

journals.sbmu.ac.ir/Neuroscience

of operation, surgical technique, etc, 4 but the definite risk factors and the magnitude of their role in predisposing to shunt infection are still controversial. So, accurate determination of shunt infection risk factors can improve our current practice in preventing this catastrophic complication.
In attempts to use machine learning (ML) techniques in medical research, methods such as logistic regression and artificial neural networks (ANNs) have been applied to develop models for the prediction task. Habibi and colleagues 5 performed univariate analysis to determine shunt infection risk factors. Five variables, including low birth weight, age at the first shunting procedure, shunt revision, history of prematurity, and myelomeningocele (MMC), were significantly associated with a higher risk of shunt infection. They concluded that an ANN can predict pediatric shunt infection in children with hydrocephalus with an accuracy of 83.1%. Sabeti et al 6 applied state-of-the-art techniques to select the most informative factors for prediction of shunt infection in hydrocephalic children. The probability (accuracy) of shunt infection was determined with different intelligent and statistical classifiers. Their results indicated that history of prematurity and intraventricular hemorrhage, age at the first shunt procedure, number of shunt revisions, brain tumor-induced hydrocephalus, birth weight, and co-infection were the best descriptive features, with an accuracy of 68%-81%.
Tunthanathip and colleagues 7 applied ML techniques for predicting risk of infection after neurosurgical operations. Using the backward stepwise method, they determined the significant predictors of surgical site infection in the multivariable Cox regression analysis as postoperative CSF leakage/subgaleal collection and postoperative fever. They concluded the Naive Bayes algorithm was an accurate ML method for predicting surgical site infection after neurosurgical operations.
Luz and colleagues 8 reviewed the methodological aspects of ML techniques in the field of general infection management. Among the 52 included studies, 35 different ML techniques were used. Logistic regression was applied in 18 studies followed by random forest, support vector machine, and ANN in 18, 12, and 7 studies, respectively. They concluded that building trust in these new technologies would require further improvement and more explanation of the ability and interpretability of the applied models. Muscas and colleagues 9 analyzed different ML models to predict shunt-dependent hydrocephalus after aneurysmal subarachnoid hemorrhage. They suggested that a single best distributed random forest model could be very helpful in identifying patients at low risk of shunt dependency, with an accuracy of 90%. Therefore, ML prognostic models could allow accurate predictions with a large number of variables and a more subject-oriented prognosis.
When ML models are used to explore relationships between risk factors and prediction labels, proper risk factor selection is the most important issue. The most common risk factor selection algorithms usually require hyper-parameter tuning and do not actively consider the prior knowledge of domain experts. Therefore, integrating the weighted scores of domain experts into the ML selection process may decrease the probability of eliminating clinically crucial risk factors. In this study, we attempt to identify pediatric shunt infection risk factors more accurately by using different ML selection methods and integrating them with pediatric neurosurgeons' weighted scores.

Data Description
Hydrocephalus is usually defined as an increased volume of CSF in the CNS as a result of obstruction in CSF pathways or diminished CSF absorption (Figure 1). A definite diagnosis of shunt infection is made by identifying bacterial pathogens in the CSF or shunt hardware culture. In patients with a negative CSF culture and clinical evidence of CNS infection, shunt infection could be considered in the presence of abnormal CSF analysis parameters (positive smear, low glucose level, high white blood cell count, or high lactate level), exposure of the shunt device, or presence of an infected pseudocyst in the abdomen.
In this study, among more than 800 ventriculoperitoneal shunt procedures performed by the senior author (Habibi et al 5 ) in the Children's Medical Center hospital of Tehran (Iran) on hydrocephalus patients under the age of 12, 148 patients with hydrocephalus were selected according to a set of meticulous inclusion/exclusion criteria. Sixty-eight patients with shunt infection were consecutively enrolled, and 80 patients without shunt infection (with the same protocol and inclusion/exclusion criteria) who had undergone a shunting procedure in the same week as each case were considered as controls. Patients were included only if they had undergone ventriculoperitoneal shunting (VPS) in an elective setting with a standard protocol and had completed a follow-up period of at least six months. The method and time of surgery, prophylactic antibiotic, operating theater settings, and the number of staff inside the theater were the same in all cases. Those with ventriculoatrial shunting, operation in an emergent setting, first procedure in other centers, deviation from the protocol, incomplete or inaccessible medical data, and incomplete or missing follow-up were excluded from the study. For each patient, demographic and medical information including sex, parents' consanguinity, gestational age at birth, type of delivery, birth weight, prematurity, head circumference at birth, neonatal icterus, history of MMC, history of meningitis, history of intraventricular hemorrhage (IVH), head trauma, brain tumor, age at surgery time, duration of surgery, type of inserted shunt, other-site active infection within 30 days prior to shunt insertion, CSF leak after shunting, and number of previous shunt revisions were recorded (Table 1).

Proposed Risk Factor Selection
Risk factor selection is a preprocessing technique that identifies the key risk factors of a given problem. 11,12 When used in shunt infection prediction, it can not only reduce dimensionality but also help us understand the causes of shunt infection. To the best of our knowledge, many specialists rely on a trial-and-error process or biased personal experience in selecting risk factors. Moreover, most ML-based algorithms tend to ignore the prior knowledge of specialists about the more relevant risk factors, which may result in removing some crucial risk factors. Hence, we analyzed our dataset with a risk factor selection method that incorporates domain expert knowledge. The proposed approach includes three measures (sparsity, correlation, and redundancy), which were finally adjusted with the weighted score of the specialists.

Sparsity Measure
Sparsity 13 determines how far a set of numbers is spread out from their average value. For a continuous variable, if its variance is close to zero (that is, the variable fluctuates within a small range), the variable can be removed, because its correlation with the target attribute cannot be precisely evaluated. For a discrete variable, if the fraction of a certain value of the variable exceeds 85% of the total number of samples, the discrete variable can be removed. The sparsity of each risk factor can be defined as

$$\mathrm{Sparsity}(x) = \frac{1}{N}\sum_{j=1}^{N}\left(x_j - \bar{x}\right)^2$$

where $x_j$ is the value of the risk factor for the $j$-th of $N$ samples and $\bar{x}$ shows the mean value.

Correlation Measure
The correlation measure 13 evaluates the correlation between each risk factor and the target attribute. In the selection procedure, if the correlation between a risk factor and the target attribute is lower than the predefined threshold (here, the median value), the risk factor is removed.
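The removal rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the near-zero variance cutoff `var_eps` are hypothetical (the text does not give a numeric cutoff for "close to zero"), and the correlation values are assumed to have been computed beforehand by whatever correlation measure is used.

```python
from collections import Counter
from statistics import median, pvariance


def is_sparse_continuous(values, var_eps=1e-3):
    """Flag a continuous variable whose variance is close to zero.

    var_eps is an assumed cutoff; the source only says "close to zero".
    """
    return pvariance(values) < var_eps


def is_sparse_discrete(values, max_fraction=0.85):
    """Flag a discrete variable dominated by one value (more than 85% of samples)."""
    top_count = Counter(values).most_common(1)[0][1]
    return top_count / len(values) > max_fraction


def keep_by_correlation(corr_with_target):
    """Keep factors whose correlation with the target is not below the median.

    corr_with_target maps a factor name to a precomputed correlation value.
    """
    threshold = median(corr_with_target.values())
    return [name for name, c in corr_with_target.items() if c >= threshold]
```

With toy inputs, a binary factor that takes the same value in 9 of 10 samples is flagged as sparse, while one with an 8-to-2 split is not.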

Redundancy Measure
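The correlation measure 13 tries to retain risk factors that are strongly correlated with the target attribute, but correlation among different risk factors is not analyzed. Here, the redundancy measure can be used to evaluate redundancy among risk factors. Redundancy between two risk factors $x$ and $y$ can be estimated based on Pearson's correlation coefficient (PCC), as follows:

$$\mathrm{PCC}(x, y) = \frac{\sum_{j=1}^{N}\left(x_j - \bar{x}\right)\left(y_j - \bar{y}\right)}{\sqrt{\sum_{j=1}^{N}\left(x_j - \bar{x}\right)^2}\,\sqrt{\sum_{j=1}^{N}\left(y_j - \bar{y}\right)^2}}$$

where $x_j$ and $y_j$ are the values of the two risk factors for the $j$-th of $N$ samples, and $\bar{x}$ and $\bar{y}$ are their mean values. If the redundancy value is greater than the predefined threshold, one of the two risk factors is removed; otherwise, both risk factors are retained. In this study, the predefined threshold is the mean value of the redundancy measure.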
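As a rough sketch of the redundancy check, the following Python computes Pearson's correlation coefficient for each pair of risk factors and drops one member of any pair that exceeds the threshold. Several choices here are assumptions that the text does not specify: the absolute value of the PCC is used, factors are scanned in their given order and the later member of a redundant pair is the one dropped, and the threshold is taken as the mean of all pairwise values. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def mean_pairwise_redundancy(columns):
    """Mean |PCC| over all pairs of factors (assumed form of the threshold)."""
    pairs = list(combinations(columns, 2))
    return sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)


def drop_redundant(columns, threshold):
    """Greedy pass: keep a factor only if it is not redundant with a kept one."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

For example, with columns `a = [1, 2, 3, 4]`, `b = [2, 4, 6, 8]`, and `c = [1, -1, 1, -1]`, the pair (a, b) is perfectly correlated, so `b` is dropped while `a` and `c` are kept.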

Specialist Score
Considering the prior knowledge of specialists about the more relevant risk factors prevents the removal of crucial risk factors. In this study, we incorporated the importance of each risk factor according to the domain knowledge of the specialists. We quantify this aspect by the Score measure, which includes the importance score (weight) of the risk factor given by each specialist. The importance score of the risk factor is described as

$$\mathrm{Score} = \frac{\sum_{i=1}^{n} w_i \, sp_i}{\sum_{i=1}^{n} w_i}$$

where $n$ is the number of specialists, $sp_i$ is the importance score given by the $i$-th specialist, and $w_i$ shows the rating weight of each specialist (since all specialists in this study are senior in pediatric neurosurgery, we assigned the same rating weight to all of them, so the score reduces to the average of the given scores). sp=1 indicates that the expert thinks the risk factor is crucial, sp=0.5 indicates that the expert is uncertain about the importance of the risk factor, and sp=0 indicates that the expert considers the risk factor not important. If the weighted specialist score is greater than the threshold (here, the mean value), the risk factor is retained; otherwise, it is removed. The sp weights of the three senior specialists are shown in Table 2.
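The specialist score can be illustrated with a short Python sketch. It assumes the score is a weighted average of the individual ratings, which with equal weights is the plain average of the specialists' scores; the exact normalization is an assumption. The factor names and ratings in the usage note are made-up placeholders, not the values in Table 2.

```python
def specialist_score(ratings, weights=None):
    """Weighted average of sp ratings (each 0, 0.5, or 1) for one risk factor.

    Equal weights are assumed when none are given, as in the study.
    """
    if weights is None:
        weights = [1.0] * len(ratings)
    return sum(w * r for w, r in zip(weights, ratings)) / sum(weights)


def retain_by_specialists(ratings_by_factor):
    """Keep factors whose specialist score exceeds the mean score."""
    scores = {f: specialist_score(r) for f, r in ratings_by_factor.items()}
    threshold = sum(scores.values()) / len(scores)
    return [f for f, s in scores.items() if s > threshold]
```

For three hypothetical factors rated `[1, 1, 1]`, `[1, 0.5, 0.5]`, and `[0, 0, 0.5]`, the scores are about 1.00, 0.67, and 0.17, the mean is about 0.61, and only the first two factors are retained.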

Classification
Classification can be used to create models that describe the mapping from risk factors to prediction labels and that generalize to previously unseen inputs. In this part, some widely used classifiers such as BayesNet, 14 multi-layer perceptron, 15 random forest 14 and Bagging 16 are chosen as candidate classifiers.

Results and Discussion
In the first stage, state-of-the-art classifiers are employed for predicting shunt infection. All 11 features are used in this stage, and our results in Table 3 are obtained using 10-fold cross-validation. In 10-fold cross-validation, our dataset (148 samples) is randomly partitioned into 10 subsamples of approximately equal size. Of the 10 partitions, a single partition is retained as the validation data for testing the classifier, and the remaining 9 partitions are used as training data. The cross-validation process is then repeated 10 times, with each of the 10 partitions used exactly once as the validation data, and the final result is the average over the 10 folds. In this study, all mentioned classifiers were applied in Weka software. 17 Our results showed that the prediction accuracy of shunt infection ranged from 70% to 76% (Table 3).
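The cross-validation procedure described above can be sketched in plain Python. The study itself used Weka, so this is only an illustration of the partitioning logic, not the authors' implementation; `fit_and_score` is a hypothetical callback that trains a classifier on the training indices and returns its accuracy on the test indices.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly split sample indices into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding through the shuffled list gives fold sizes that differ by at most 1.
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, fit_and_score, k=10, seed=0):
    """Average the score over k folds, each fold used once as test data."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(fit_and_score(train_idx, test_idx))
    return sum(scores) / k
```

Since 148 is not divisible by 10, this split yields eight folds of 15 samples and two folds of 14.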
In the second stage, four measures were calculated: sparsity, correlation, redundancy, and weighted specialist scores. The last measure was defined as the average of the scores that three pediatric neurosurgeons assigned to each risk factor according to its perceived role in shunt infection. The value of each measure for each risk factor is shown in Table 4.
In the third stage, the predefined threshold was applied to select the most important risk factors for each measure. If a risk factor met the predefined threshold, it was retained; otherwise, it was removed from the process. Selected risk factors for each measure are shown in Figures 2, 3, 4, and 5, separately. Low birth weight, prematurity, IVH, MMC, and number of revisions were selected by the sparsity measure. Using the correlation measure, the selected risk factors were low birth weight, prematurity, MMC, age at first shunt surgery, and number of revisions. Low birth weight, trauma, history of meningitis, age at first shunt surgery, and number of revisions were chosen by the redundancy measure. Finally, specialist scores highlighted low birth weight, prematurity, history of meningitis, age at first shunt surgery, and number of revisions as the most important risk factors for shunt infection.
In the fourth stage, for each risk factor, the difference between the shunt infection group and the control group was evaluated using Student's t test. P<0.05 was considered statistically significant. The normalized ML-based score (Sparsity + Correlation + Redundancy) was calculated and compared with the specialist score (Table 5). Additionally, the normalized total score (ML-based score + specialist score) was determined (Table 5).
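A minimal sketch of how the two normalized scores might be computed is given below. The text does not state which normalization was used; min-max scaling to the range 0 to 1 is assumed here because the reported total scores span 0.00 to 1.00. The function names are hypothetical.

```python
def min_max(values):
    """Rescale values to [0, 1]; min-max scaling is an assumed normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combined_scores(sparsity, correlation, redundancy, specialist):
    """Return (normalized ML-based scores, normalized total scores).

    Each argument is a list with one value per risk factor, in the same order.
    """
    ml_raw = [s + c + r for s, c, r in zip(sparsity, correlation, redundancy)]
    ml_score = min_max(ml_raw)
    total = min_max([m + e for m, e in zip(ml_score, specialist)])
    return ml_score, total
```

Under this assumption, the factor with the smallest combined value always receives a total score of 0 and the factor with the largest receives 1.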
To better differentiate between risk factors, we categorized them into three subgroups based on the final total score. Features with a total score of 0.66 to 1.00 were considered high-ranked risk factors, and those with a total score of 0.00 to 0.33 were placed in the low-ranked group. An intermediate-ranked group was defined for features with a total score of 0.33 to 0.66. According to the total score, prematurity, first VPS age, IVH, MMC, and low birth weight had higher weights as shunt infection risk factors. Icterus, trauma, co-infection, and tumor had the lowest weights, and history of meningitis and number of shunt revisions were defined as intermediate risk factors. Table 6 shows these ranked risk factors. The clinical correlation of each feature with its weighted score is explained below.
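The three-way grouping can be written as a small Python function. The handling of scores that fall exactly on 0.33 or 0.66 is an assumption, since the stated ranges share their endpoints; the function name is hypothetical.

```python
def rank_group(total_score, low_cut=0.33, high_cut=0.66):
    """Map a normalized total score to a rank group.

    Boundary handling is assumed: 0.66 counts as high, 0.33 counts as low.
    """
    if total_score >= high_cut:
        return "high"
    if total_score > low_cut:
        return "intermediate"
    return "low"
```

Applied to total scores reported in the text, 0.90 (first VPS age) and 0.69 (low birth weight) fall in the high group, 0.61 (number of revisions) and 0.40 (history of meningitis) in the intermediate group, and 0.31 (co-infection) and 0.08 (trauma) in the low group.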

High-Ranked Factors
Prematurity: Traditionally, prematurity is regarded as an independent risk factor for shunt infection because of the immature immune system and undeveloped skin barrier in preterm neonates. 18-20 The specialist score of this feature was the highest (1.00), which reflects its clinical importance, and, interestingly, it also had a high ML-based score. Additionally, the t test showed a statistically significant difference between the infection and control groups in terms of prematurity (P<0.05). So, according to our method, prematurity was the most important factor associated with shunt infection, and pediatric neurosurgeons are recommended to postpone shunt insertion in preterm patients as long as possible or to consider alternative interventions such as brain endoscopy.
First VPS age: A higher risk of shunt infection is reported in young neonates, especially those less than six months of age, 18,21,22 because of their underdeveloped immune system. Here, the first VPS age was not statistically significant in our series, which may be related to the selection bias introduced by our hospital's practice of postponing shunt insertion in very young babies whenever possible. Nevertheless, the specialist score and ML-based score were highly aligned, which resulted in a high total score (0.90) and marks this parameter as important. This result is generally in accordance with the literature and the common opinion among pediatric neurosurgeons.
Intraventricular hemorrhage: IVH is considered a risk factor for shunt infection because of coexisting factors such as prematurity in very young neonates and more revision operations in older children. 20 Although this factor was not statistically significant in this series (P>0.05) and the specialist score gave it a medium rank (0.5), this feature had the highest ML-based score among the risk factors because of its high sparsity.

Myelomeningocele (MMC): Neonates with MMC often require shunt insertion because of coexisting hydrocephalus. There is usually a risk of CSF leakage from the MMC, and these patients usually undergo shunt placement in the first months of life; both predispose them to shunt infection. Having MMC was statistically significant in our study and had a high ML-based score (0.91) and total score (0.75), which is aligned with many other classic studies. 5,23
Low birth weight: Low birth weight infants have a higher risk of shunt infection, which could be secondary to prematurity at least in some cases, 24-26 but as an independent factor, it was statistically significant in our series (P<0.05). This factor had a fairly high total score in our study (0.69); as a recommendation, it is acceptable to delay shunt insertion until the neonate has gained more weight, if possible.

Intermediate-Ranked Risk Factors
Number of shunt revisions: More than 50% of patients with a VP shunt have shunt malfunction and require at least one revision surgery, 27 which potentially exposes the shunt hardware and CNS to pathogenic microorganisms. 26,28,29 This factor was statistically significant in our series as a risk factor for shunt infection. This feature received a higher specialist score (0.67) than ML-based score (0.51), which adjusted the total score to 0.61 and placed this factor in the intermediate risk group. Meticulous adherence to protective protocols during revision surgeries could result in a lower risk of infection.
History of meningitis: A history of treated CNS infection could be considered a risk factor for shunt infection, but there is no strong evidence for it in the literature. 23 History of meningitis was not statistically significant in our series, which is in line with many recent studies. 29-31 It had an intermediate total score (0.40) in our study, reflecting a moderate role for this feature in increasing the risk of shunt infection, so closer follow-up is recommended in patients with a history of CNS infection.

Low-Ranked Risk Factors
Co-infection: Blood-borne infection, such as gastrointestinal infections (peritonitis) or dental abscess, may increase the risk of late shunt infections; however, it is not considered a prevalent risk factor in general practice. 32-34 In our series, this factor was not statistically significant (P>0.05) and had a low ML-based score, which resulted in a low total score (0.31).
Brain tumor: Any other brain surgery, including brain tumor resection, could potentially increase the risk of CNS infection by exposing CSF to external microorganisms. In addition, the risk of shunt malfunction and subsequent shunt revision may be higher in these patients, but classically, tumor-related hydrocephalus is not considered a strong risk factor for shunt infection in the literature. 23,35,36 Here, this feature was not statistically significant and had low ML-based and total scores, which is in line with most other clinical data.
Trauma: Hydrocephalus usually occurs in severe brain injuries and is usually treated with a VP shunt if it is symptomatic. This feature was not statistically significant in our series and had a very low total score (0.08), which is consistent with previous clinical experience. 23,37,38
Icterus: Icterus was not an important factor for shunt infection and had the lowest total score (0.00), which marked it as the lowest weighted feature in our series.
In this study, we tried to incorporate domain expert knowledge into ML techniques. Integrating the weighted scores of human experts into the ML selection process may decrease the probability of eliminating clinically crucial risk factors. In our previous study, 6 we focused on well-known techniques for prediction of shunt infection in patients with hydrocephalus without considering specialist knowledge. Ignoring the prior knowledge of specialists resulted in the removal of some crucial risk factors, such as MMC and history of meningitis, while some clinically less important factors, such as co-infection and brain tumor, were selected.

Conclusion
We used a combination of different ML approaches to determine the rank of each risk factor in shunt infection. Then, in an attempt to align our results with current clinical knowledge, we combined the ML-based scores with the specialists' scores to obtain the final total score. Our results are in line with previous classic medical studies. So, this "ML-based clinical adjusted" method may be used as a complementary tool to help neurosurgeons in better patient selection and more accurate follow-up of children with a higher risk of shunt infection.

Ethical Statement
Not applicable.