Tharsis Souza, Ph.D. and Gustavo Sainatto, M.D.
Abstract
The sudden and rapid growth of COVID-19 cases is overwhelming health systems globally. Fast, accurate, and early detection of SARS-CoV-2 is vital to controlling the spread of the virus. However, traditional SARS-CoV-2 detection based on RT-PCR assays can be costly, time-consuming, and not widely available, which makes testing every case impractical. In this work, we propose a machine learning-based approach for the rapid detection of COVID-19 cases using commonly available laboratory test data. We analyze a sample of 5644 patients from the Hospital Israelita Albert Einstein in São Paulo, Brazil, of whom 558 tested positive for SARS-CoV-2. The proposed model achieves an overall AUC of 92% on a held-out test set comprising one third of the original sample. We observe that patients admitted with COVID-19 symptoms who tested negative for Rhinovirus/Enterovirus, Influenza B, and Influenza A H1N1 2009 (Inf.A.H1N1.2009), and who presented low leukocyte and platelet levels, were more likely to test positive for SARS-CoV-2. We also present a parameterized model for different scenarios, defined either by a target sensitivity level or by the total number of potential positive cases the hospital has the capacity to prioritize. At a capacity rate of 25% (i.e., when 25% of patients are deemed SARS-CoV-2 positive), the proposed model achieves a sensitivity above 84% and a specificity above 96%, making it a useful rapid prioritization tool.
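The abstract does not state how the decision threshold is chosen for each scenario. The sketch below is one plausible way to do it and is not the authors' implementation. It assumes the model outputs a per-patient risk score (higher means more likely SARS-CoV-2 positive) and label 1 means a positive test. It shows how a cutoff can be derived either from a capacity rate (flag the top fraction of patients) or from a target sensitivity, and how sensitivity, specificity, and AUC are computed on held-out data. All function names are hypothetical, and the toy data is invented for illustration.

```python
import math
from typing import Sequence, Tuple


def confusion_rates(scores: Sequence[float], labels: Sequence[int],
                    threshold: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity), predicting positive when score >= threshold."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def threshold_for_capacity(scores: Sequence[float], capacity: float) -> float:
    """Hypothetical helper: cutoff that flags about `capacity` (in (0, 1]) of patients.

    Patients are ranked by score and the top fraction is flagged as positive.
    Ties at the cutoff can flag slightly more patients than requested.
    """
    if not 0.0 < capacity <= 1.0:
        raise ValueError("capacity must be in (0, 1]")
    ranked = sorted(scores, reverse=True)
    k = max(1, round(capacity * len(ranked)))  # patients the hospital can prioritize
    return ranked[k - 1]


def threshold_for_sensitivity(scores: Sequence[float], labels: Sequence[int],
                              target: float) -> float:
    """Hypothetical helper: highest cutoff whose sensitivity is at least `target`."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    positive_scores = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positive_scores:
        raise ValueError("need at least one positive case")
    # Minimum number of positives that must be captured; epsilon absorbs float error.
    k = math.ceil(target * len(positive_scores) - 1e-9)
    k = min(max(k, 1), len(positive_scores))
    return positive_scores[k - 1]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented toy scores and labels (1 = SARS-CoV-2 positive), for illustration only.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 0, 0, 0, 0]
    cutoff = threshold_for_capacity(scores, 0.25)
    print(cutoff, confusion_rates(scores, labels, cutoff))  # 0.8 (0.6666666666666666, 1.0)
```

The capacity scenario fixes how many patients are flagged and lets sensitivity follow, while the sensitivity scenario fixes the fraction of positives that must be caught and lets the flagged volume follow. In practice such cutoffs would be calibrated on training or validation data and then evaluated on the held-out test set, so the reported rates are not tuned to the test data.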
Original Dataset: https://www.kaggle.com/einsteindata4u/covid19