This dataset was downloaded from Kaggle and can be found by following this link:
Bacteria_Classification.ipynb: Jupyter Notebook containing the complete analysis and classification process.
bacteria_list_200.csv: Dataset file containing the information on the bacteria.
Ensure you have the following dependencies installed:
- Python 3.x
- pandas
- matplotlib
- seaborn
- scikit-learn
- xgboost
You can install these dependencies using pip:
pip install pandas matplotlib seaborn scikit-learn xgboost
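To confirm the installation worked before opening the notebook, a small check like the one below may help. It is only a sketch: the `REQUIRED` mapping and the `missing_packages` helper are illustrative names, not part of this repository. Note that scikit-learn is imported as `sklearn`.

```python
from importlib.util import find_spec

# Maps each pip package name from the list above to its import name.
# (Illustrative helper, not part of the repository.)
REQUIRED = {
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "xgboost": "xgboost",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```

Any name it reports can be installed with pip as shown above.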
Clone the repository:
mkdir bacteria_classification
cd bacteria_classification
git clone
Open and run the Jupyter Notebook:
Launch Jupyter Notebook and open 'Bacteria_Classification.ipynb'. Run each cell in the notebook to execute the analysis steps.
Follow the notebook instructions:
- Load and preprocess the dataset (bacteria_list_200.csv).
- Explore data insights and visualizations, such as pie charts and heatmaps.
- Implement machine learning models including Random Forest, AdaBoost, Gradient Boosting, Logistic Regression, SVM, and XGBoost.
- Evaluate model performance using metrics like accuracy, confusion matrices, and ROC curves.
- Customize and adapt the notebook for further analysis or experimentation.
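The train-and-evaluate steps above can be sketched roughly as follows. This is not the notebook's code: the column names of bacteria_list_200.csv are not described here, so the sketch uses synthetic data from scikit-learn's `make_classification` as a stand-in, and it fits only two of the listed models. The split ratio and model settings are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the features and labels. In the notebook, the data
# would instead be loaded with pandas, e.g. pd.read_csv("bacteria_list_200.csv").
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hold out 25% of the rows for evaluation (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Two of the models named above; the others follow the same fit/predict pattern.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Probability of the positive class, used for the ROC AUC score.
    y_score = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
    }
    print(name, "accuracy:", round(results[name]["accuracy"], 3))
    print(results[name]["confusion_matrix"])
```

Collecting the metrics in one dictionary makes it easy to compare models side by side once more of them are added.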
Adjust parameters, hyperparameters, and model configurations in the notebook based on specific dataset characteristics and analysis goals.
Refer to markdown cells and comments within the notebook for detailed explanations of each step and analysis result.
Extend the analysis with additional visualizations, model optimizations, or new machine learning algorithms as needed.
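One common way to adjust hyperparameters, as suggested above, is a cross-validated grid search. The sketch below uses scikit-learn's `GridSearchCV` with a Gradient Boosting classifier. The parameter grid, the number of folds, and the synthetic data are all assumptions for illustration and do not reflect the settings used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; replace with the features and labels from the notebook.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Small, assumed grid: every combination is scored with 3-fold cross-validation.
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", round(search.best_score_, 3))
```

With only 200 rows, the gap between grid points may be smaller than the fold-to-fold variation, so the selected parameters are best treated as a starting point.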