This project classifies dry beans using five machine learning techniques.
Maintaining the purity of crop varieties is important in the global agricultural sector, as it helps to increase the germination rate, raise crop yield at harvest, and improve overall seed quality. Purity is typically maintained by manually sorting seeds into classes, which is a difficult and labour-intensive task. The aim of this report is to use machine learning techniques to accurately classify commercial dry beans into seven categories: Cranberry Beans (Barbunya), Bombay Kidney Beans, Cali Beans, Dermason Kidney Beans, Horoz Beans, Seker Beans and Sira Beans. The dataset used for this report consists of 16 features per seed, previously extracted as shape and dimension measurements from images of 13,611 sample seeds acquired via a computer vision system. The experiment involves feature visualization and extraction using Principal Component Analysis (PCA), class balancing using the Synthetic Minority Oversampling Technique (SMOTE), and the application of five machine learning techniques to build classifiers for the dataset: Support Vector Machine (SVM), Logistic Regression, K-Nearest Neighbours (KNN), Random Forest and XGBoost. Each method is investigated individually and the performance of the resulting models is compared. XGBoost had the best performance, with an accuracy of 95.7%.
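The workflow described above (PCA, class balancing, then a comparison of five classifiers) can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the project's actual code: the data is a synthetic stand-in generated with `make_classification` (16 features, 7 imbalanced classes), `smote_like_oversample` is a hypothetical simplified helper written here to show the SMOTE interpolation idea (in practice `imblearn.over_sampling.SMOTE` would normally be used), and scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost (`xgboost.XGBClassifier`) so that only NumPy and scikit-learn are needed. The ordering of the steps, the 95% variance threshold and all hyperparameters are also assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def smote_like_oversample(X, y, k=5, seed=0):
    """Hypothetical simplified SMOTE: grow every class to the majority size
    by interpolating between a sample and one of its k nearest same-class
    neighbours."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = target - count
        if n_new == 0:
            continue
        X_cls = X[y == cls]
        # Ask for k + 1 neighbours because each point is its own nearest neighbour.
        nn = NearestNeighbors(n_neighbors=min(k + 1, len(X_cls))).fit(X_cls)
        neighbours = nn.kneighbors(X_cls, return_distance=False)[:, 1:]
        base = rng.integers(0, len(X_cls), size=n_new)
        picked = neighbours[base, rng.integers(0, neighbours.shape[1], size=n_new)]
        gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
        X_parts.append(X_cls[base] + gap * (X_cls[picked] - X_cls[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Synthetic stand-in for the bean features (assumption: not the real dataset).
X, y = make_classification(
    n_samples=1400, n_features=16, n_informative=10, n_redundant=4,
    n_classes=7, n_clusters_per_class=1,
    weights=[0.26, 0.19, 0.15, 0.14, 0.12, 0.10, 0.04], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Standardise, then keep enough principal components for 95% of the variance.
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=0.95).fit(scaler.transform(X_train))
X_train_p = pca.transform(scaler.transform(X_train))
X_test_p = pca.transform(scaler.transform(X_test))

# Balance the training split only; the test split keeps its original class mix.
X_bal, y_bal = smote_like_oversample(X_train_p, y_train)

models = {
    "SVM": SVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Stand-in for XGBoost, used here to avoid an extra dependency.
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_bal, y_bal)
    results[name] = accuracy_score(y_test, model.predict(X_test_p))

# Print the models from most to least accurate.
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Because the data here is synthetic, the printed accuracies say nothing about the bean dataset and will not reproduce the 95.7% figure. The oversampling is applied after the train/test split so that no synthetic samples leak into the evaluation set, which is one reasonable design choice rather than a confirmed detail of this project.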