Jacob Peterson, Lawrie Brunswick, Priyam Gupta, Sue Boyd
Tool
- Introduction
- Questions of Interest
- Repository Structure
- Data Sources
- Local Setup and Environment
- Examples
With millions of books available, finding the perfect one can be daunting. Book recommendation tools aim to provide a one-stop shop for your next read. Our book recommendation tool, "The Bookish Butterfly", takes a multi-modal approach to give users personalized recommendations. Unlike recommender systems that rely only on ratings or genres, our model integrates multiple search modalities to provide better recommendations. We offer several search options depending on what the user is looking for, and do the heavy lifting to surface books that will make a great next read. There's no advertising influence here!
- What book should I read next?
- What other books can I read from the same author?
- Which book would be a good read related to my current book?
- What books have similar plots to a book I liked?
- What are the popular or trending books in a particular genre?
Book Crossing Dataset Includes:
- BX-Book-Ratings.csv
- 1149779 values
- Fields: User ID, ISBN, Book Rating
- A copy of this data is in data_raw/BX-Book_Ratings.csv
- BX-Books.csv
- 271379 unique values
- Fields: ISBN, Book-Title, Book-Author, Year-Of-Publication, Publisher, Image-URL-S, Image-URL-M, Image-URL-L
- Due to file size, this file was not included in the repo, but can be obtained from the link above.
- BookSummaries.txt
- 16,559 values
- Fields: Wikipedia article ID, Freebase ID, Book Title, Author, Publication Date, Book Genres, Plot Summary
- The data from BookSummaries.txt was extracted into the file data_raw/complete_data.csv
- Google Books API
- ISBN (13 digit)
- Book Title
- This API was used to augment the CMU data with ISBNs, which helps match it against the Book Ratings dataset
- ISBNs obtained via the Google Books API are also included in data_raw/complete_data.csv
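Matching 13-digit ISBNs from the Google Books API against the ratings data may require putting both into the same format. The following is a minimal sketch, assuming the ratings file stores 10-character ISBNs (an assumption, not something stated above). The function name `isbn10_to_isbn13` is hypothetical and is not part of this repository.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to its ISBN-13 form using the standard 978 prefix.

    Hypothetical helper for illustration; not taken from this repository.
    """
    core = isbn10.replace("-", "").strip()
    if len(core) != 10:
        raise ValueError(f"expected 10 characters, got {len(core)}: {isbn10!r}")

    # Drop the ISBN-10 check character and prepend the 978 prefix.
    body = "978" + core[:9]
    if not body.isdigit():
        raise ValueError(f"non-digit characters in ISBN: {isbn10!r}")

    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ... over 12 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(body))
    check = (10 - total % 10) % 10
    return body + str(check)


print(isbn10_to_isbn13("0306406152"))  # 9780306406157
```

Converting one side before joining keeps the comparison a plain string match, though the actual preprocessing in this project may differ.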
A description of data cleaning, joining, and preprocessing can be found Here and Here. A description of the final datasets used in production and testing can be found Here.
This project is a proof of concept, executed on a small dataset (~13K books total after data cleaning), with some data sparsity even within those books. As such, some searches may return no or limited results. We'd love to see the work extended to a larger dataset! When a user tries to search based on a book or an author that is not in our dataset, we let them know and encourage them to search another way.
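The fallback for a missing book could look roughly like the sketch below. It is an illustration only: the names `find_book` and `search_message`, the `title`/`author` keys, and the message wording are all hypothetical and do not reflect the app's actual code.

```python
from typing import Optional


def find_book(title: str, catalog: list[dict]) -> Optional[dict]:
    """Return the first catalog entry whose title matches, ignoring case and padding."""
    key = title.strip().casefold()
    for book in catalog:
        if book["title"].strip().casefold() == key:
            return book
    return None


def search_message(title: str, catalog: list[dict]) -> str:
    """Build a user-facing message for a title search (hypothetical wording)."""
    book = find_book(title, catalog)
    if book is None:
        # Book is not in the small proof-of-concept dataset.
        return f"'{title}' is not in our dataset. Please try another search option."
    return f"Found '{book['title']}' by {book['author']}."


# Tiny stand-in catalog; the real data lives in the project's CSV files.
catalog = [{"title": "Dune", "author": "Frank Herbert"}]
print(search_message("dune", catalog))
print(search_message("Some Unknown Book", catalog))
```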
This repository can be cloned onto your local computer by running the following command in a terminal:
git clone https://github.com/jacobp24/bookworm_rec.git
If Git is not already installed, follow the Git Guide to install it, and then clone the repository.
For this repository we have set up an environment that can be run locally and that installs the Python dependencies with appropriate version requirements. Conda needs to be installed before running the next commands. Refer to Conda Installation for further instructions.
Make sure your current directory is set to the 'bookworm_rec' folder. If it is not, please run this command:
cd bookworm_rec
Now run the next command to create the bookworm_env Conda environment:
conda env create -f env.yml
Make sure to activate the newly created environment:
conda activate bookworm_env
Once done with the environment (after using the tool), deactivate it by running:
conda deactivate
Our application runs with the Streamlit Python library. Before jumping onto the webpage, you will need to do the following steps:
In order to generate the recommendation embeddings we utilized the VoyageAI package.
Please create a local API KEY by following these steps:
- Make sure your current directory is set to the 'bookworm' folder within 'bookworm_rec'. If it is not, please run this from within the bookworm_rec directory:
cd bookworm
- Click Here to create your own API KEY.
- Copy your new API key and run this command:
export API_KEY="replace-with-your-api-key"
This command is whitespace-sensitive: there cannot be spaces before or after the equals sign. Make sure your new API KEY is in double quotes!
- To check that the API KEY was created successfully:
echo $API_KEY
- Okay, now we are ready to run the application!
streamlit run app.py
Go check out our application in your local browser!!!
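If the app fails to start, a missing or empty API_KEY variable is one likely cause. The sketch below shows one way a Python script could check for the key before using it. The function name `load_api_key` and the error message are hypothetical; the app's own handling of the key may differ.

```python
import os


def load_api_key(var_name: str = "API_KEY") -> str:
    """Read the API key from the environment, failing early if it is missing.

    Hypothetical helper for illustration; not taken from this repository.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Run: export {var_name}=\"replace-with-your-api-key\""
        )
    return key
```

Note that `export` only affects the current terminal session, so the key has to be set again in each new terminal before running `streamlit run app.py`.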
Here is a video demonstration of our app!
OR
A walkthrough of the application can be found in the examples folder.