Summary.txt

Background
On 29 December 2019, a cluster of severe respiratory diseases emerged in Wuhan, China of unknown cause. 
A novel coronavirus was soon isolated from these patients as the causative agent of viral pneumonia (1). 
This clinical entity, designated coronavirus disease 19 (COVID-19) by the World Health Organization in 
February, 2020, has since spread at an alarming rate across the globe (2). The clinical spectrum of COVID-19 
has been reported as broad, ranging from asymptomatic infection, to mild upper respiratory tract illness, 
to severe viral pneumonia with respiratory failure, and death. Risk factors have preliminarily
been outlined for severe disease and in-hospital death among patients who have confirmed disease (2). 
Prediction of a patient’s future in-hospital mortality from the time of initial presentation 
could afford clinicians early valuable prognostic information as well as inform future status of the 
intensive care unit. Machine learning may be leveraged to make predictions from the myriad data features 
collected at the initial patient provider interface with a high degree of accuracy.

Design
Detailed information about COVID-19 patients was first read in from an authoratitive retrospective cohort study (2), which included demographics and physical examination and laboratory findings at hospital admission. A database of 10,000 patients was then generated (simulated) based on the distributions described in the paper, relative risks of mortality (binary features), and ranges of values (continuous features) for patients who were reported as either deceased or recovered. 

Next, a machine learning model (multilayer perceptron) was trained on the database to predict the probability of a deceased outcome (ie. the likeliehood of mortality) based on the given features. 7 weka classifiers were trained and evaluated on the dataset using 5-fold cross-validation. Models were compared using their accuracy on the held-out portion of the data. The best model was selected, serialized to disk and made available for subsequent use.

User input is collected from a simple GUI which inquires of 17 patient features whereon the model trained, such as white blood cell count, age, smoking history, etc. The inputted values are passed to the saved model and a results screen displays the predicted mortality risk for the new patient, along with graphs of trends in the generated database.

Path to Repo:
https://github.com/UPenn-CIT599/final-project-team_37_covid_19

Team members and their contributions

Caleb Busch: Performed file input/output, data analysis and risk calculations, simulation of patient database, graphical display 
of risk factor to outcome trends, and literature review

Tobi Olatunji: Built Machine Learning Pipeline using Weka. Trained several classifiers on patient data, selected best model based on accuracy metric, wrote trained model to disk, converted user input to instance, loaded model at test time, predict and return risk score to GUI

Zhiming Liu: built the entire GUI for the application, designed the form with user input validation, collected and submitted user input, converted these values into an array for easy ingestion by the machine learning classifier.


References
1. Zhu N, Zhang D, Wang W, et al. A novel coronavirus from patients with pneumonia in China, 2019. N Engl J Med. 
2020;382:727-733.
2. F Zhou, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China:
a retrospective cohort study. Lancet (2020 Mar 11), 10.1016/S0140-6736(20)30566-3