COVID-19-Confirmed, Death and Recovered Case Predictions for US (As a part of Assignments in Data And Knowledge Management Course at University of Waterloo)

snigdhakakkar/COVID-19-Confirmed-Death-and-Recovered-Case-Predictions-for-US

COVID-19-Confirmed-Death-and-Recovered-Case-Predictions-for-US

Steps implemented:

  1. Technologies: Python, Keras, scikit-learn, pandas, NumPy

  2. Data preprocessing steps:

    • Checking for missing values in each column.
    • Checking for duplicate records and dropping any that are found.
    • Removing features that are highly dependent on each other. The COVID dataset already has [State ID], so [State, Long, Lat] are redundant and are dropped.
    • Casting the [Resident Population 2020 Census] and [Population Density 2020 Census] data to the float data type.
    • Adding the relative difference of specific quantitative attributes with respect to the state.
    • Checking for outliers, i.e. data points that differ significantly from other observations. Plotting a histogram of each variable shows its distribution and reveals values that fall outside it.
    • Performing Z-score standardization. The Z-score shows whether a value is greater or smaller than the mean and how many standard deviations away from the mean it is. If the absolute Z-score of a data point is more than 3, the data point is quite different from the other data points and may be an outlier.
    • Removing outliers: rows whose [Incident_Rate] Z-score is greater than 2.5 or less than -2.5 are removed, as are rows whose [Case_Fatality_Ratio] Z-score is greater than 3 or less than -3.
    • Applying PCA to the COVID features and creating a hybrid dataset that combines the original features with the first five principal components.
  3. Performed hyperparameter selection for the models.

  4. Split the modeling into two parts:

Part 1:
Applied machine learning algorithms (Decision Tree, Naive Bayes, Random Forest, XGBoost and Gradient Boosting) and compared their performance.
Part 2:
Applied deep learning techniques (Deep Neural Network model, custom LSTM).
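
Several of the preprocessing steps in item 2 can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the data is synthetic, the column spellings are copied from the list above and may differ in the real CSV, the extra case-count columns and the helper names (`zscore`, `preprocess`, `add_pca_components`) are hypothetical, and standardizing the features before PCA is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Column names taken from the README text; the real file may spell them differently.
POP = "Resident Population 2020 Census"
DENS = "Population Density 2020 Census"


def zscore(series: pd.Series) -> pd.Series:
    """(x - mean) / std for each value, using the population standard deviation."""
    return (series - series.mean()) / series.std(ddof=0)


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: duplicates, redundant columns, casts, outlier rows."""
    df = df.drop_duplicates()
    # [State ID] already identifies the state, so these columns are redundant.
    df = df.drop(columns=["State", "Lat", "Long"], errors="ignore")
    for col in (POP, DENS):
        df[col] = df[col].astype(float)
    # Both Z-scores are computed on the same frame before any row is removed
    # (an assumption; the README does not say in which order this happens).
    keep = (zscore(df["Incident_Rate"]).abs() <= 2.5) & (
        zscore(df["Case_Fatality_Ratio"]).abs() <= 3
    )
    return df.loc[keep].reset_index(drop=True)


def add_pca_components(df: pd.DataFrame, feature_cols, n_components: int = 5) -> pd.DataFrame:
    """Hypothetical helper: append the first n principal components as PC1..PCn."""
    scaled = StandardScaler().fit_transform(df[feature_cols])
    comps = PCA(n_components=n_components).fit_transform(scaled)
    names = [f"PC{i + 1}" for i in range(n_components)]
    return pd.concat([df, pd.DataFrame(comps, columns=names, index=df.index)], axis=1)


# Synthetic stand-in for the COVID dataset.
rng = np.random.default_rng(0)
n = 200
raw = pd.DataFrame({
    "State ID": rng.integers(1, 51, n),
    "State": ["X"] * n,
    "Lat": rng.normal(38, 5, n),
    "Long": rng.normal(-95, 10, n),
    POP: rng.integers(500_000, 40_000_000, n).astype(str),
    DENS: rng.uniform(1, 1200, n).round(1).astype(str),
    "Confirmed": rng.normal(50_000, 10_000, n),
    "Deaths": rng.normal(1_000, 200, n),
    "Incident_Rate": rng.normal(5_000, 500, n),
    "Case_Fatality_Ratio": rng.normal(2.0, 0.3, n),
})
raw.loc[0, "Incident_Rate"] = 1e6                          # one obvious outlier
raw = pd.concat([raw, raw.iloc[[1]]], ignore_index=True)   # one duplicate record

print(raw.isna().sum())  # missing values per column
clean = preprocess(raw)
features = [POP, DENS, "Confirmed", "Deaths", "Incident_Rate", "Case_Fatality_Ratio"]
hybrid = add_pca_components(clean, features, n_components=5)
print(raw.shape, clean.shape, hybrid.shape)
```

The hybrid frame keeps every original feature and adds the five component columns, matching the description of the last preprocessing bullet.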

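The hyperparameter selection in item 3 and the model comparison in Part 1 could look something like the sketch below. The README does not say which search method, metric, or problem framing was used, so grid search with cross-validation, mean absolute error, and a regression setup on synthetic data are all assumptions made for illustration. Only scikit-learn estimators are shown; XGBoost and Naive Bayes are left out to keep the example to a single library.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the preprocessed COVID features.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each entry pairs an estimator with a small, illustrative parameter grid.
candidates = {
    "decision_tree": (
        DecisionTreeRegressor(random_state=0),
        {"max_depth": [3, 6, None]},
    ),
    "random_forest": (
        RandomForestRegressor(random_state=0),
        {"n_estimators": [50, 100]},
    ),
    "gradient_boosting": (
        GradientBoostingRegressor(random_state=0),
        {"learning_rate": [0.05, 0.1]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # 3-fold cross-validated grid search on the training split only.
    search = GridSearchCV(model, grid, cv=3, scoring="neg_mean_absolute_error")
    search.fit(X_train, y_train)
    # The best estimator is refit on the full training split by default.
    test_mae = mean_absolute_error(y_test, search.predict(X_test))
    results[name] = {"best_params": search.best_params_, "test_mae": test_mae}

# Report models from lowest to highest held-out error.
for name, res in sorted(results.items(), key=lambda kv: kv[1]["test_mae"]):
    print(f"{name}: MAE={res['test_mae']:.2f}, params={res['best_params']}")
```

Scoring every model on the same held-out split keeps the comparison fair, since the cross-validation folds are only used to pick hyperparameters.
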