Machine Learning Final Project

Early cancer diagnosis and treatment can drastically change the outcome of the disease. CancerSEEK used DNA and protein analysis of over 1,800 blood samples from healthy individuals and cancer patients, and built a logistic regression model that predicts whether a sample is cancerous from a score derived from the DNA analysis and the levels of 8 proteins.
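A logistic regression of this kind passes a weighted sum of the inputs (the DNA score and the protein levels) through the sigmoid function to get a probability of cancer. The following is a minimal, library-free sketch that assumes batch gradient descent on the log loss. It is not CancerSEEK's or this project's implementation: the names `fit_logistic` and `predict_proba` and the two-feature toy data are hypothetical. A model matching the description above would take 9 inputs (1 DNA score plus 8 protein levels).

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """P(cancer | x) for weights w = [bias, w1, ..., wd] (hypothetical helper)."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 3000) -> List[float]:
    """Fit [bias, w1, ..., wd] by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # d(log loss)/d(logit)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


# Hypothetical toy data, NOT from the CancerSEEK dataset:
# each row is [scaled DNA score, scaled protein level].
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.4], [0.2, 0.3],
     [0.9, 0.8], [0.8, 0.9], [0.7, 0.6], [0.9, 0.7]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = cancer, 0 = healthy

w = fit_logistic(X, y)
labels = [int(predict_proba(w, x) >= 0.5) for x in X]
```

Here a sample is labelled cancerous when its predicted probability reaches 0.5; a screening test may instead pick a higher threshold to reduce false positives.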
Our goal is to improve on this prediction algorithm by building our own models. We use an optimized subset of the dataset's variables (including additional protein levels and individual characteristics) and compare several classification methods: logistic regression, neural networks, and random forests. 'Final Presentation.pdf' and 'Final Submission.pdf' explain our methods and results in more detail.
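To compare classification methods fairly, each one should be scored on the same held-out samples. The sketch below is a library-free illustration under that assumption, not the project's code: `k_fold_splits` and `binary_metrics` are hypothetical helper names, and the toy scores and labels are made up. It reports sensitivity and specificity alongside accuracy, since for a cancer test missed cancers and false alarms matter separately.

```python
import random
from typing import Dict, List, Sequence, Tuple


def k_fold_splits(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle sample indices once and return (train, test) index lists for k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
            for i in range(k)]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy, treating label 1 as 'cancer'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


# Hypothetical toy data; a fixed 0.5 threshold stands in for a trained model.
scores = [0.1, 0.2, 0.3, 0.8, 0.9, 0.7, 0.4, 0.6, 0.2, 0.95]
labels = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]

all_true: List[int] = []
all_pred: List[int] = []
for train, test in k_fold_splits(len(scores), k=5):
    # A real classifier (logistic regression, neural network, random forest)
    # would be fitted on `train` here; the fixed threshold needs no fitting.
    all_true += [labels[i] for i in test]
    all_pred += [int(scores[i] >= 0.5) for i in test]

metrics = binary_metrics(all_true, all_pred)  # pooled out-of-fold metrics
```

Pooling out-of-fold predictions avoids undefined per-fold sensitivity when a small fold contains no cancer samples. scikit-learn provides equivalents such as `sklearn.model_selection.StratifiedKFold` and `sklearn.metrics.confusion_matrix`; stratified folds also keep the cancer/healthy ratio similar across folds, which this simple shuffle does not guarantee.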
To run our code:
First install the relevant packages:
    conda install -c conda-forge python-graphviz
    pip install -r requirements.txt
Then run any individual file with:
    python ./filename
replacing filename with the name of the script you want to run.