Machine learning supervised model that predicts the occurrence of breast cancer deployed on a web app for everyone
Breast cancer is cancer that develops from breast tissue. Signs of breast cancer may include a lump in the breast, a change in breast shape, dimpling of the skin, fluid coming from the nipple, a newly inverted nipple, or a red or scaly patch of skin. In those with distant spread of the disease, there may be bone pain, swollen lymph nodes, shortness of breath, or yellow skin.
- Python (Machine Learning)
- HTML, CSS, JavaScript (Frontend)
- Flask (Backend)
to start the server run
- Classifier
- Dataset Exploration
- Results
A classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset.
- A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes).
Creating a bootstrapped dataset (having same size as the original) where samples are randomly selected from original dataset. Creating a decision tree with the bootstrapped dataset with the use of one random feature in each step. Process is repeated resulting in a wide variety of trees (effectively better than a decision tree).
We used Breast Cancer Wisconsin (Diagnostic) Data Set that has 32 feature and 569 record of malignant and begin breast cancer record with the ratio of 2:3 respectively.
Ten real-valued features are computed for each cell nucleus. The mean, standard error and "worst" or largest of these features were computed for each image, resulting in total of 30 features.
- Radius (mean of distances from center to points on the perimeter)
- Texture (standard deviation of gray-scale values)
- Perimeter
- Area
- Smoothness (local variation in radius lengths)
- Compactness (perimeter^2 / area - 1.0)
- Concavity (severity of concave portions of the contour)
- Concave points (number of concave portions of the contour)
- Symmetry
- Fractal dimension (coastline approximation - 1)
In our project we used only 10 of them which was:
- texture mean
- area mean
- smoothness mean
- concavity mean
- area se
- concavity se
- fractal dimension se
- smoothness worst
- concavity worst
- symmetry worst
Then a correlation function is used to find the relation between each feature and the diagnosis. ( if +ve therefore directly proportional, -ve therefore inversely proportional)
The effective features that affect the diagnosis are then considered. having corr=>60% for mean, corr=>60% for SE, corr=>70% for worst.
The Model Showed Precision of 96.8% & Accuracy of 97.1%
Our Sever consists of 3 routes,
- Main Page Route
- Test Page Route
- Prediction API Route
After the model is saved into a pikle file model.pkl
, this file is loaded in the app.py
file to
be ready for prediction. User enter the 10 features and this data is passed to the model to predict
and the give the answer back to be shown in a modal.
To Deploy your app using heroku, you will need Procfile
, requirments.txt
and a web server (eg. gunicorn)
-
Create your account in heroku
-
Set up your files locally
-
Sign in to your heroku account
$ heroku login
-
Create your heroku app
$ heroku create nima-sbme24
5- Add heroku remote to your app
$ git remote -v
heroku https://git.heroku.com/example-app.git (fetch)
heroku https://git.heroku.com/example-app.git (push)
6- Deploy your app
$ git push heroku master