This project aims to predict diabetes from a set of health parameters. It analyzes the data and builds classification models to identify diabetes risk.
The dataset includes the following columns:
- Pregnancies: Number of times pregnant
- Glucose: Plasma glucose concentration 2 hours into an oral glucose tolerance test
- BloodPressure: Diastolic blood pressure (mm Hg)
- SkinThickness: Triceps skinfold thickness (mm)
- Insulin: 2-Hour serum insulin (mu U/ml)
- BMI: Body mass index (weight in kg/(height in m)^2)
- DiabetesPedigreeFunction: Diabetes pedigree function
- Age: Age (years)
- Outcome: Class variable (0 or 1, where 1 indicates diabetes)
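As a rough sketch of how this schema might be checked when loading the data, the snippet below reads a CSV with pandas, verifies the columns listed above, and separates the features from the `Outcome` target. The helper names (`load_diabetes_frame`, `split_features_target`) and the two inline rows are hypothetical and made up for illustration; the notebook may load the data differently.

```python
import io

import pandas as pd

# Column names as listed above; "Outcome" is the target.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def load_diabetes_frame(source):
    """Read a CSV and fail early if an expected column is absent (hypothetical helper)."""
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df[EXPECTED_COLUMNS]


def split_features_target(df):
    """Return the eight feature columns and the Outcome column separately."""
    X = df.drop(columns="Outcome")
    y = df["Outcome"]
    return X, y


# Made-up rows for illustration only, not real patient data.
# In practice, pass the path of the actual CSV file instead.
SAMPLE_CSV = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,120,70,25,90,28.5,0.35,33,0
5,165,82,30,150,34.1,0.62,47,1
"""

df = load_diabetes_frame(io.StringIO(SAMPLE_CSV))
X, y = split_features_target(df)
print(X.shape, y.tolist())
```

Checking the columns up front makes a renamed or missing field fail with a clear message instead of surfacing later during model training.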
To run this project, install the necessary packages:
pip3 install pandas numpy seaborn matplotlib scikit-learn xgboost
pandas and numpy handle data loading and manipulation, seaborn and matplotlib produce the plots, and scikit-learn and xgboost provide the classification models.
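To give a feel for what building a classification model with these packages can look like, here is a minimal scikit-learn sketch. It is an assumption about the general workflow, not the notebook's actual pipeline: the data is a synthetic stand-in generated with `make_classification` (eight features, binary target, mirroring the shape of the dataset), and logistic regression is chosen only as a simple baseline.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: 8 features and a binary target.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=0
)

# Hold out 20% of the rows, keeping the class balance the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scale the features, then fit a logistic regression baseline.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {accuracy:.3f}")
```

Because xgboost exposes a scikit-learn compatible estimator, `xgboost.XGBClassifier()` could be swapped in for `LogisticRegression` in the same pipeline to compare a gradient-boosted model against the baseline.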
This project provides a practical approach to predicting diabetes using machine learning techniques. The analysis and models help clarify how the health parameters relate to diabetes risk. For more details and the complete analysis, refer to the Diabetes_pred.ipynb notebook.