This project aimed to predict the progression of Parkinson's disease in patients from protein and peptide abundance data measured in cerebrospinal fluid samples. The data come from the AMP PD Progression Prediction Kaggle competition. The project proposed a CNN-LSTM architecture: convolutional layers extract features from the protein data, and recurrent layers model how those features change over time. This lets the model take a patient's historical data into account when predicting future disease severity scores. The CNN-LSTM was compared against a baseline MLP that was enhanced with preprocessing techniques such as PCA and KNN imputation. Evaluation used metrics including mean squared error (MSE), mean absolute error (MAE), and symmetric mean absolute percentage error (SMAPE). The CNN-LSTM model showed promising results, suggesting that deep learning on time-series protein data has potential for forecasting Parkinson's disease progression. Such forecasts could ultimately assist in developing more personalized treatments. The project involved data exploration, preprocessing, model implementation in Keras, and evaluation.
https://www.kaggle.com/competitions/amp-parkinsons-disease-progression-prediction/data
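
For reference, the three evaluation metrics named above can be sketched in plain Python. This is a minimal illustration, not the project's evaluation code: the function names, the sample scores, and the SMAPE convention (percentage scale, with a term counted as zero when both the actual and predicted values are zero) are all assumptions.

```python
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    """Reject empty or mismatched inputs before computing a metric."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: average of squared differences."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average of absolute differences."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def smape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, in percent (0 to 200).

    Assumed convention: when actual and predicted are both zero,
    the term contributes zero instead of dividing by zero.
    """
    _check(y_true, y_pred)
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2
        if denom > 0:
            total += abs(t - p) / denom
    return 100.0 * total / len(y_true)


# Hypothetical severity scores, made up purely for illustration.
actual = [10.0, 0.0, 20.0, 5.0]
predicted = [12.0, 0.0, 15.0, 5.0]

print(f"MSE:   {mse(actual, predicted):.2f}")
print(f"MAE:   {mae(actual, predicted):.2f}")
print(f"SMAPE: {smape(actual, predicted):.2f}%")
```

SMAPE is scale-relative, so an error of a given size counts for more on a low score than on a high one, whereas MSE and MAE treat absolute errors the same regardless of the score's magnitude.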