The bookworm project.

Jacob Peterson, Lawrie Brunswick, Priyam Gupta, Sue Boyd

Project Type

Tool

Introduction

With millions of books available, finding the right one can be daunting. Book recommendation tools aim to be a one-stop shop for your next read. Our book recommendation tool, "The Bookish Butterfly", takes a multi-modal approach to personalized recommendations. Unlike recommender systems that rely only on ratings or genres, our model integrates multiple search modalities to provide better recommendations. Depending on what you are looking for, you can choose among several search options, and we do the heavy lifting to surface books that should make a great next read. There's no advertising influence here!

Questions of Interest

MVP:

What book should I read next?
What other books can I read from the same author?
Which book would be a good read related to my current book?
What books have similar plots to a book I liked?
What are the popular or trending books in a particular genre?
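
The plot-similarity question can be illustrated as a nearest-neighbour lookup over embedding vectors. The following is a minimal sketch under stated assumptions, not the project's actual implementation: the helper names (`cosine_similarity`, `most_similar`) and the toy three-dimensional vectors are hypothetical, and real plot embeddings would have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(title, embeddings, top_k=3):
    """Rank every other book by similarity to the plot embedding of `title`."""
    query = embeddings[title]
    scored = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != title
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data standing in for real plot embeddings.
toy_embeddings = {
    "Book A": [1.0, 0.0, 0.0],
    "Book B": [0.9, 0.1, 0.0],
    "Book C": [0.0, 1.0, 0.0],
}

if __name__ == "__main__":
    print(most_similar("Book A", toy_embeddings, top_k=2))
```

Cosine similarity ignores vector length, so two summaries about the same themes score highly even if one is much longer than the other.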

Repository Structure

Find it here

Data Sources

Book Ratings

Book Crossing Dataset Includes:

BX-Book-Ratings.csv
- 1,149,779 values
- Fields: User ID, ISBN, Book Rating
- A copy of this data is in data_raw/BX-Book_Ratings.csv
BX-Books.csv
- 271,379 unique values
- Fields: ISBN, Book-Title, Book-Author, Year-Of-Publication, Publisher, Image-URL-S, Image-URL-M, Image-URL-L
- Due to file size, this file was not included in the repo, but can be obtained from the link above.
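
As a rough illustration of how the ratings file might be summarised per book, here is a minimal sketch using only the standard library. It assumes the file is semicolon-delimited with quoted headers `User-ID`, `ISBN`, and `Book-Rating`, which is how the Book Crossing dump is commonly distributed; check your copy and adjust the delimiter or column names if they differ. The function name `rating_stats` is hypothetical.

```python
import csv
import io
from collections import defaultdict


def rating_stats(lines, delimiter=";"):
    """Return {isbn: (number_of_ratings, mean_rating)} from a ratings file.

    `lines` is any iterable of text lines, e.g. an open file handle.
    The column names below are assumptions about the file header.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    totals = defaultdict(lambda: [0, 0])  # isbn -> [count, sum of ratings]
    for row in reader:
        entry = totals[row["ISBN"]]
        entry[0] += 1
        entry[1] += int(row["Book-Rating"])
    return {isbn: (count, total / count) for isbn, (count, total) in totals.items()}


# Tiny made-up sample in the assumed layout (not real data).
sample = io.StringIO(
    '"User-ID";"ISBN";"Book-Rating"\n'
    '"1";"0000000001";"8"\n'
    '"2";"0000000001";"6"\n'
    '"3";"0000000002";"10"\n'
)
stats = rating_stats(sample)
```

For the real file, you would pass an open file handle instead of the in-memory sample, choosing the encoding that matches your copy of the data.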

Plot Summaries

Kaggle CMU Book Summary

BookSummaries.txt
- 16,559 values
- Fields: Wikipedia article ID, Freebase ID, Book Title, Author, Publication Date, Book Genres, Plot Summary
- The data from BookSummaries.txt was extracted into the file data_raw/complete_data.csv

ISBN Matching

Google Books API

Google Books API
- ISBN (13 digit)
- Book Title
- This API was used to augment the CMU data with ISBNs so that it could be matched against the Book Ratings dataset
- ISBNs obtained via the Google Books API are also included in data_raw/complete_data.csv
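
The lookup described above can be sketched as follows. This is an illustration under stated assumptions rather than the project's own script: the helper names (`build_query_url`, `extract_isbn13`) are hypothetical, and the sample response is hand-written to mirror the shape of a Google Books volumes response (`items`, `volumeInfo`, `industryIdentifiers`). No network request is made here.

```python
from urllib.parse import urlencode

GOOGLE_BOOKS_URL = "https://www.googleapis.com/books/v1/volumes"


def build_query_url(title, author):
    """Build a volumes search URL restricted to a title and an author."""
    query = f'intitle:"{title}" inauthor:"{author}"'
    params = urlencode({"q": query, "maxResults": 1})
    return f"{GOOGLE_BOOKS_URL}?{params}"


def extract_isbn13(response):
    """Return the first ISBN_13 identifier in a parsed JSON response, or None."""
    for item in response.get("items", []):
        identifiers = item.get("volumeInfo", {}).get("industryIdentifiers", [])
        for identifier in identifiers:
            if identifier.get("type") == "ISBN_13":
                return identifier.get("identifier")
    return None


# Hand-written stand-in for a parsed API response (made-up identifiers).
sample_response = {
    "items": [
        {
            "volumeInfo": {
                "title": "Example Title",
                "industryIdentifiers": [
                    {"type": "ISBN_10", "identifier": "0000000002"},
                    {"type": "ISBN_13", "identifier": "9780000000002"},
                ],
            }
        }
    ]
}

if __name__ == "__main__":
    print(build_query_url("Example Title", "Example Author"))
    print(extract_isbn13(sample_response))
```

Returning `None` when no 13-digit ISBN is present keeps unmatched books easy to filter out before joining with the ratings data.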

Cleaning and Processing

A description of data cleaning, joining, and preprocessing can be found Here and Here. A description of the final datasets used in production and testing can be found Here.

Data Limitations

This project is a proof of concept, executed on a small dataset (~13K books total after data cleaning), with some data sparsity even within those books. As such, some searches may return few or no results. We'd love to see the work extended to a larger dataset! When a user searches for a book or an author that is not in our dataset, we let them know and encourage them to search another way.
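
The fallback behaviour for missing titles could look something like the sketch below. It is only an illustration: the function name `find_book`, the substring matching, and the wording of the message are all assumptions, not the application's actual code.

```python
def find_book(query, catalog):
    """Return (matches, message).

    `matches` lists catalog titles containing the query (case-insensitive).
    `message` is None on success, or a hint to try another search mode.
    """
    key = query.strip().lower()
    matches = [title for title in catalog if key and key in title.lower()]
    if not matches:
        message = (
            f"'{query}' is not in our dataset. "
            "Try searching by author, genre, or plot instead."
        )
        return [], message
    return matches, None


if __name__ == "__main__":
    # Hypothetical two-book catalog.
    print(find_book("dune", ["Dune", "Emma"]))
    print(find_book("missing title", ["Dune", "Emma"]))
```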

Local Setup and Environment

Local Setup

This repository can be cloned onto your local computer by running the following command in a terminal:

git clone https://github.com/jacobp24/bookworm_rec.git

If Git is not already installed, follow the Git Guide to install it, and then clone the repository.

Environment

For this repository we have set up a Conda environment that can be run locally and that installs the Python dependencies with appropriate version requirements. Conda must be installed before running the next commands; refer to Conda Installation for further instructions.

Make sure your current directory is set to the 'bookworm_rec' folder. If it is not, please run this command:

cd bookworm_rec

Now run the next command to create the bookworm_env Conda environment:

conda env create -f env.yml

Make sure to activate the newly created environment:

conda activate bookworm_env
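
With the environment active, a short Python check can confirm that the key packages resolve. This is a minimal sketch: it assumes the import names `streamlit` and `voyageai` for the two libraries this README mentions, and the helper `missing_packages` is hypothetical. The full dependency list lives in env.yml and is not reproduced here.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["streamlit", "voyageai"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("Environment looks ready.")
```

If anything is reported missing, re-check that bookworm_env is the active environment before reinstalling.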

Once done with the environment (after using the tool), deactivate it by running:

conda deactivate

Application

Our application runs on the Streamlit Python library. Before opening the web page, you will need to complete the following steps.

To generate the recommendation embeddings, we used the VoyageAI package, which requires an API key.

Please create a local API key by following these steps:

Make sure your current directory is set to the 'bookworm' folder within 'bookworm_rec'. If it is not, please run this from within the bookworm_rec directory:

cd bookworm

Click Here to create your own API KEY.
Copy your new API key and run this command:

export API_KEY="replace-with-your-api-key"

This command is whitespace-sensitive: there cannot be spaces before or after the equals sign. Make sure your new API key is in double quotes!

To check that the API key was set successfully:

echo $API_KEY
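
The same check can be made from Python. This README does not show how app.py reads the key, so the following is only a sketch under the assumption that the key is read from the `API_KEY` environment variable exported above; the helper name `get_api_key` is hypothetical.

```python
import os


def get_api_key(env=os.environ):
    """Return the API key from the environment, or raise a clear error."""
    key = env.get("API_KEY", "")
    if not key:
        raise RuntimeError(
            'API_KEY is not set. Run: export API_KEY="replace-with-your-api-key"'
        )
    return key


if __name__ == "__main__":
    try:
        get_api_key()
        print("API key found.")
    except RuntimeError as error:
        print(error)
```

Note that an exported variable only lasts for the current terminal session, so the export command needs to be repeated in each new terminal.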

Okay now we are ready to run the application!

streamlit run app.py

Go check out our application in your local browser!!!

Examples

Here is a video demonstration of our app!

OR

A walkthrough of the application can be found in the examples folder.
