Xccelerate Data Science Bootcamp Collaborative Project: 4 flavours of recommendation systems using the Book-Crossing Dataset, which is also included in this repo.
See the project's details here
- Clone this repo:
$ git clone https://github.com/ohjho/recommendation_system.git
$ cd recommendation_system
- Install the requirements. We highly recommend doing this inside a virtualenv to avoid dependency hell.
#---------------- optional ------------------
$ mkvirtualenv --python=`which python3` NameOfYourEnv
$ workon NameOfYourEnv
#--------------------------------------------
(NameOfYourEnv) $ pip install -r requirements.txt
Then run pip check and resolve any package dependency issues it reports. It should say No broken requirements found.
- Start Jupyter notebook
$ jupyter notebook
The script data_cleaning.py will import the datasets and clean the data.
To get 3 separate dataframes, do this:
from data_cleaning import get_clean_data
df_books, df_users, df_ratings = get_clean_data()
If the CSV files are not under data/, use the path argument.
To get one merged dataframe, do this:
from data_cleaning import get_merged_data_frame
df_merged = get_merged_data_frame(user_argv=user_threshold, isbn_argv=book_threshold)
where user_threshold is the minimum number of books a user must have rated; users with fewer ratings are filtered out. book_threshold is the counterpart for books: books with fewer than this number of ratings are filtered out. If the CSV files are not under data/, use the path argument.
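To make the two thresholds concrete, here is a minimal pandas sketch of this kind of filtering on a toy ratings table. It is not the code in data_cleaning.py: the helper name filter_by_rating_counts, the column names user_id, isbn and rating, and the order of the two filters (users first, then books) are all assumptions made for illustration.

```python
import pandas as pd


def filter_by_rating_counts(ratings, user_threshold, book_threshold):
    """Hypothetical helper: drop sparse users, then sparse books.

    Assumes columns named "user_id" and "isbn"; the real dataset's
    column names may differ.
    """
    # Number of ratings each row's user has given.
    user_counts = ratings.groupby("user_id")["isbn"].transform("count")
    kept = ratings[user_counts >= user_threshold]

    # Number of ratings each remaining row's book has received.
    book_counts = kept.groupby("isbn")["user_id"].transform("count")
    return kept[book_counts >= book_threshold].reset_index(drop=True)


# Toy data: user 3 rated only one book, and book "C" has only one rating.
ratings = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "isbn": ["A", "B", "C", "A", "B", "A"],
    "rating": [8, 7, 9, 6, 5, 10],
})

filtered = filter_by_rating_counts(ratings, user_threshold=2, book_threshold=2)
print(filtered)
```

With both thresholds set to 2, user 3 and book "C" drop out, leaving the four ratings that users 1 and 2 gave to books "A" and "B". Note that filtering books after users can leave a user below the user threshold again; whether the project handles that case is not stated here.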