The dataset used in this project was downloaded from Kaggle: https://www.kaggle.com/datasets/kanchana1990/bacteria-dataset
- Bacteria_Classification.ipynb: Jupyter Notebook containing the complete analysis and classification process.
- bacteria_list_200.csv: Dataset file containing information on the bacteria.
Ensure you have the following dependencies installed:
- Python 3.x
- pandas
- matplotlib
- seaborn
- scikit-learn
- xgboost
You can install these dependencies using pip:
pip install pandas matplotlib seaborn scikit-learn xgboost
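To confirm the installation before opening the notebook, a small check like the following may help. It is only a sketch: the `missing_packages` helper is a hypothetical name, not part of the repository. Note that scikit-learn is imported under the module name `sklearn`.

```python
from importlib.util import find_spec

# Maps each pip package name listed above to the module name used on import.
# scikit-learn is the only one whose import name differs ("sklearn").
REQUIRED = {
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "xgboost": "xgboost",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages that cannot be found in this environment."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```

Any package reported as missing can be installed with the pip command above.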
- Clone the repository:
mkdir bacteria_classification
cd bacteria_classification
git clone https://github.com/Venkatesh-99/Harmful-Bacteria-Classification.git
- Open and run the Jupyter Notebook:
Launch Jupyter Notebook and open 'Bacteria_Classification.ipynb'. Run each cell in the notebook, in order, to execute the analysis steps.
- Follow the notebook instructions:
- Load and preprocess the dataset (bacteria_list_200.csv).
- Explore data insights and visualizations, such as pie charts and heatmaps.
- Implement machine learning models including Random Forest, AdaBoost, Gradient Boosting, Logistic Regression, SVM, and XGBoost.
- Evaluate model performance using metrics like accuracy, confusion matrices, and ROC curves.
- Customize and adapt the notebook for further analysis or experimentation.
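The training and evaluation steps above can be sketched roughly as follows. This is not the notebook's actual code: `evaluate_models` is a hypothetical helper, only two of the listed models are shown, and synthetic data from `make_classification` stands in for the preprocessed bacteria features so the example runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split


def evaluate_models(X, y, random_state=42):
    """Train two of the listed classifiers and collect hold-out metrics."""
    # Stratify so both classes appear in the training and test splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    models = {
        "Random Forest": RandomForestClassifier(
            n_estimators=100, random_state=random_state
        ),
        "Logistic Regression": LogisticRegression(max_iter=1000),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        # Probability of the positive class, needed for the ROC AUC score.
        scores = model.predict_proba(X_test)[:, 1]
        results[name] = {
            "accuracy": accuracy_score(y_test, predictions),
            "confusion_matrix": confusion_matrix(y_test, predictions),
            "roc_auc": roc_auc_score(y_test, scores),
        }
    return results


# Synthetic binary-classification data (an assumption for illustration only;
# the notebook works with features derived from bacteria_list_200.csv).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
results = evaluate_models(X, y)
for name, metrics in results.items():
    print(f"{name}: accuracy={metrics['accuracy']:.2f}, ROC AUC={metrics['roc_auc']:.2f}")
```

With the real dataset, `X` and `y` would come from the preprocessed CSV (for example, after encoding categorical columns), and the remaining models could be added to the `models` dictionary in the same way.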
- Adjust parameters, hyperparameters, and model configurations in the notebook based on specific dataset characteristics and analysis goals.
- Refer to markdown cells and comments within the notebook for detailed explanations of each step and analysis result.
- Extend the analysis with additional visualizations, model optimizations, or new machine learning algorithms as needed.
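One possible way to approach the hyperparameter adjustments mentioned above is a cross-validated grid search. The sketch below uses scikit-learn's `GridSearchCV` with a Random Forest. The parameter grid and the synthetic data are assumptions chosen for illustration, not values taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the preprocessed bacteria features and labels.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Illustrative grid; widen or narrow it to suit the dataset and time budget.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,                # 5-fold cross-validation for each combination
    scoring="accuracy",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.2f}")
```

The same pattern should carry over to the other models in the notebook by swapping the estimator and its parameter grid.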