nnrecommend

Installation

To install, run the following from the root project directory (ideally activate a virtualenv first).

# replace cu111 with the specific cuda version your machine supports
pip install torch==1.8.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -q torch-scatter -f https://pytorch-geometric.com/whl/torch-1.8.0+cu111.html
pip install -q torch-sparse -f https://pytorch-geometric.com/whl/torch-1.8.0+cu111.html
pip install -e ./
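
To verify that the install picked up a CUDA-enabled torch build, you can run this optional check (not part of the project itself):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"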

Once installed, make sure you have the python environment scripts directory in your path.
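
If that directory is missing from your path, you can prepend it manually; for example on Linux, assuming a virtualenv created at .venv in the project root (a hypothetical location):

export PATH="$PWD/.venv/bin:$PATH"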

Datasets

  • movielens MovieLens dataset from Kaggle
  • podcasts iTunes podcasts dataset from Kaggle
  • spotify Spotify skip prediction challenge dataset (preprocessed data from the original dataset)
  • spotify-mini Spotify skip prediction challenge mini dataset (can load the downloaded files directly)

Models

  • fm-linear factorization machine with linear embedding
  • fm-gcn factorization machine with graph embedding
  • fm-gcn-att factorization machine with graph embedding and attention

Hyperparameters

  • max_interactions how many interactions to load from the dataset (-1 for all)
  • negatives_train how many negative samples to add to the train dataset (-1 for all)
  • negatives_test how many negative samples to add to the test dataset (-1 for all)
  • batch_size batch size of the training data loader
  • epochs number of epochs to run
  • embed_dim dimension of the hidden state of the embedding
  • embed_dropout dropout value for the embedding
  • learning_rate learning rate of the optimizer
  • lr_scheduler_factor lr factor for the plateau lr scheduler (default 1, which disables the scheduler)
  • lr_scheduler_patience number of fixed epochs for the plateau lr scheduler
  • lr_scheduler_threshold threshold for the plateau lr scheduler
  • graph_attention_heads number of heads in the GCN attention model
  • pairwise_loss whether to build the training set with pairs of positive-negative interactions (default True)
  • train_loader_workers number of workers for the train loader
  • test_loader_workers number of workers for the test loader
  • interaction_context comma-separated context rows to add (the default all adds every available context)
  • recommend enable recommend mode (see Recommend Items below)

Supported context values are previous and skip; their availability depends on the dataset. Additionally you can set interaction_context:random to test with a random context; this is used to confirm that the factorization machine is correctly implemented and does not improve when random context is added.
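
For example, to train with both contexts you could combine the train command with the --hparam flag documented below (a sketch; whether both contexts are available depends on the dataset):

nnrecommend train --dataset spotify data/spotify.csv --hparam interaction_context:previous,skip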

Subcommands

Details for each subcommand are provided below.

  • train train a model on a dataset
  • fit fit a dataset using a surprise algorithm
  • tune tune model hyperparameters using ray tune
  • explore-dataset show information about a dataset
  • recommend load a trained model to get recommendations

Command Line Interface

Once the package is installed and the python bin path is in your system path, run the following to see the available actions and parameters:

nnrecommend --help

Hyperparameters

Hyperparameters can be passed with --hparam name:value (add the argument multiple times to set multiple hyperparameters), or with --hparams-path hparams.json to load the parameters from a JSON dictionary.
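
For example, a sketch setting two of the hyperparameters listed above:

nnrecommend train --dataset movielens data/ml-100k/ --hparam embed_dim:32 --hparam batch_size:1024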

The format of the hparams.json file can be a simple dictionary:

{
    "embed_dim": 32,
    "batch_size": 1024
}

or a dictionary with a trials list if you want to run multiple trainings one after the other:

{
    "common": {
        "embed_dim": 32
    },
    "trials": [
        {
            "batch_size": 1024
        },
        {
            "batch_size": 512
        }
    ]
}
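
Either file can then be passed to a training run; for example (the file name is arbitrary):

nnrecommend train --dataset movielens data/ml-100k/ --hparams-path hparams.json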

Training

This command allows you to train a model.

nnrecommend train --dataset movielens-lab data/ml-dataset-splitted/movielens
nnrecommend train --dataset movielens data/ml-100k/
nnrecommend train --dataset podcasts data/database.sqlite
nnrecommend train --dataset spotify data/spotify.csv

To select the model:

nnrecommend train --dataset spotify data/spotify.csv --model fm-gcn

To create a tensorboard directory:

nnrecommend train --dataset spotify data/spotify.csv --tensorboard tbdir

Then you can run the tensorboard server on that directory:

tensorboard --logdir tbdir

Fitting

This command allows you to fit an algorithm to a dataset and get test values.

nnrecommend fit --dataset spotify data/spotify.csv --algorithm knn --algorithm baseline

This command also supports the --tensorboard parameter and will draw horizontal lines with the test values for every algorithm.
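
For example, a sketch combining both flags:

nnrecommend fit --dataset spotify data/spotify.csv --algorithm knn --algorithm baseline --tensorboard tbdir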

Tuning

This command runs hyperparameter tuning with a given dataset and model. We use ray.tune for this task.

nnrecommend tune --dataset spotify data/spotify.csv --model fm-linear --config tune_config.json

The command accepts the tune config as a JSON file containing a dictionary where the keys are the hyperparameter names and the values describe the ray.tune methods used to sample the possible values. Check the ray.tune documentation for all the available methods.

{
    "learning_rate": ["qloguniform", 1e-4, 1e-1, 5e-4],
    "embed_dropout": ["choice", [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]],
    "batch_size": ["lograndint", 128, 2048],
    "graph_attention_heads": ["randint", 1, 12]
}
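
Each entry names a ray.tune sampling method followed by its arguments; for reference, the config above corresponds roughly to the following Python (a sketch, not code from this project):

from ray import tune

# equivalent ray.tune search space for the JSON config above
config = {
    "learning_rate": tune.qloguniform(1e-4, 1e-1, 5e-4),
    "embed_dropout": tune.choice([0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]),
    "batch_size": tune.lograndint(128, 2048),
    "graph_attention_heads": tune.randint(1, 12),
}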

While tuning runs you can watch the progress by starting a tensorboard server on the ~/ray_results folder.

tensorboard --logdir ~/ray_results

Explore Dataset

This command shows some graphs about the dataset. For every user, item or context pair it shows:

  • histogram of the counts
  • spy graph of the adjacency matrix

nnrecommend explore-dataset data/ml-100k --type movielens

Recommend

This command shows recommendations for a given label.

If you store the trained model, it can show recommendations for existing users.

nnrecommend --hparam interaction_context: train data/movielens-100k --dataset movielens --output movielens.pth
nnrecommend recommend movielens.pth --label 300 --user-items 3

This will print information about user 300 and then find items to recommend to them.

Recommend Items

If you train with the recommend hyperparameter enabled, the dataset will be modified so that the model trains to recommend items to new users by:

  • removing the user column
  • creating the previous item context
  • switching the items and previous item context columns
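
As a rough illustration of the transform (hypothetical rows, not actual dataset output):

# before: (user, item)           e.g. (1, A), (1, B)  user 1 interacted with A, then B
# after:  (previous item, item)  e.g. (A, B)          the model learns item-to-item recommendations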

If you store the trained model by passing the --output parameter, you can use the recommend subcommand to get recommendations for new items.

nnrecommend --hparam recommend:1 --hparam interaction_context: train data/movielens-100k --dataset movielens --output movielens_recommend.pth
nnrecommend recommend movielens_recommend.pth --label "star wars"