The goal of this project is to make a prediction model which will predict genre of the book.
The dataset which is used in this project, is CMU Book Summary Dataset. Here is the link of the dataset : http://www.cs.cmu.edu/~dbamman/booksummaries.html. I have uploaded the
same in Dataset
folder too, you can access that!
- Pandas
- Numpy
- Sklearn
- Matplotlib
- nltk
- Importing all the required libraries. Check
requirements.txt
. - Upload the dataset and the Jupyter Notebook file.
- Loading dataset and cleaning summary data.
- Encoding Genre column to convert genre into numbers .
- Split dataframe into train and test data.
- Encoding all textual data.
- Apply different classification models and calculate their accuracy.
- Choose best model for our prediction which has highest accuracy.
- Make the final prediction using best model.
- Compare actual genre of the book with predicted genre.
We have deployed 5 machine learning algorithms and every algorithm is deployed successfully without any hesitation. We have checked the efficiency of the models based on the accuracy of each of the models. Now let's take a look at model with their accuracy.
Name of the Model | Accuracy |
---|---|
Support Vector Classifier | 0.775 |
Decision Tree Classifier | 0.43 |
Random Forest Classifier | 0.6433 |
Gradient Boosting Classifier | 0.6433 |
LightGBM Classifier | 0.655 |
Comparing all those scores scored by the machine learning algorithms, it is clear that Random Forest Classifier Model gives highest accuracy and will provide best prediction.
Code Contributed by Akash Jain (@Akash20x)