Modeling and data mining of global data on patients with COVID-19
Iranian Journal of Emergency Medicine, Vol. 7 No. 1 (2020), 28 March 2020, Page e40
https://doi.org/10.22037/ijem.v7i1.31114
Abstract
Introduction: Data mining techniques, including decision tree algorithms, can be used to model and identify those at risk of developing COVID-19. The main goal of this study is to estimate the risk of death from COVID-19 using the classification and regression tree (CART) algorithm, based on the observed influential factors.
Methods: This is an analytical study. Data on all patients with COVID-19 registered on the Kaggle site through Johns Hopkins University were extracted, giving a total of 26,031 records from various countries. Data analysis was performed using JMP statistical software, version 13. For modeling, a decision tree algorithm with the CART model was used.
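To illustrate the kind of split search a CART model performs on a quantitative variable such as age, the following is a minimal sketch, not the JMP procedure used in the study. It assumes a binary outcome and the Gini impurity criterion; the function names `gini` and `best_split` and the eight toy records are hypothetical and are not taken from the study data.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of binary labels (assumed coding: 0 = survived, 1 = died)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, weighted Gini) of the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_thr, best_score = float("nan"), float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Impurity of the two child nodes, weighted by their sizes.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_thr, best_score = (lo + hi) / 2.0, score
    return best_thr, best_score


# Toy, made-up records for illustration only (not study data).
ages = [25, 34, 41, 52, 63, 70, 78, 85]
died = [0, 0, 0, 0, 1, 0, 1, 1]

threshold, impurity = best_split(ages, died)
print(f"split at age <= {threshold}, weighted Gini = {impurity:.4f}")
```

A full CART model repeats this search over every candidate variable at each node and recurses on the resulting child nodes until a stopping rule is met.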
Results: The classification and regression tree showed that the most important factors affecting patient outcome were, in order, the quantitative variables of age, the interval between hospitalization and result, the interval between onset of symptoms and test result, and the interval between hospitalization and test result, followed by the qualitative variable of gender. According to the word analysis, the most common symptoms among patients were, in order, fever, cough, sore throat, fatigue, weakness, headache, chills, and runny nose.
Conclusion: Measured by the area under the receiver operating characteristic (ROC) curve, the accuracy of the fitted model was 94.1% on the test data and 91.1% on the training data.
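For readers unfamiliar with the metric, the area under the ROC curve can be read as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen patient who survived. The sketch below computes it by direct pairwise comparison. It is an illustration under stated assumptions: the function name `roc_auc` and the toy scores and outcomes are hypothetical and do not reproduce the study's figures.

```python
from typing import List


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up predicted risks and outcomes (1 = died), for illustration only.
predicted_risk = [0.05, 0.10, 0.20, 0.35, 0.60, 0.70, 0.80, 0.90]
died = [0, 0, 0, 0, 1, 0, 1, 1]

auc = roc_auc(predicted_risk, died)
print(f"AUC = {auc:.4f}")
```

Here one survivor (risk 0.70) is ranked above one death (risk 0.60), so 14 of the 15 pairs are ordered correctly.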
Keywords: Decision trees; algorithms; COVID-19; severe acute respiratory syndrome coronavirus 2; data mining; ROC curve