To install, run the following from the root project directory (ideally activate a virtualenv first):

```
# replace cu111 with the specific cuda version your machine supports
pip install torch==1.8.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -q torch-scatter -f https://pytorch-geometric.com/whl/torch-1.8.0+cu111.html
pip install -q torch-sparse -f https://pytorch-geometric.com/whl/torch-1.8.0+cu111.html
pip install -e ./
```
Once installed, make sure the Python environment's scripts directory is in your path.
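For example, a minimal sketch assuming the virtualenv was created in a `.venv` folder inside the project root (adjust to wherever your environment actually lives):

```
# assumes a virtualenv at ./.venv; use the "Scripts" folder instead of "bin" on Windows
export PATH="$(pwd)/.venv/bin:$PATH"
```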
These are the available values for the `--dataset` option:

- `movielens`: movielens dataset from kaggle
- `podcasts`: itunes podcasts dataset from kaggle
- `spotify`: spotify skip prediction challenge dataset (this is preprocessed data from the dataset)
- `spotify-mini`: spotify skip prediction challenge mini dataset (can load directly the downloaded files)
These are the available values for the `--model` option:

- `fm-linear`: factorization machine with linear embedding
- `fm-gcn`: factorization machine with graph embedding
- `fm-gcn-att`: factorization machine with graph embedding and attention
These are the available hyperparameters:

- `max_interactions`: how many interactions to load from the dataset (`-1` for all)
- `negatives_train`: how many negative samples to add to the train dataset (`-1` for all)
- `negatives_test`: how many negative samples to add to the test dataset (`-1` for all)
- `batch_size`: batch size of the training data loader
- `epochs`: number of epochs to run
- `embed_dim`: dimension of the hidden state of the embedding
- `embed_dropout`: dropout value for the embedding
- `learning_rate`: learning rate of the optimizer
- `lr_scheduler_factor`: lr factor for the plateau lr scheduler (`1` by default, no scheduler)
- `lr_scheduler_patience`: number of fixed epochs for the plateau lr scheduler
- `lr_scheduler_threshold`: threshold for the plateau lr scheduler
- `graph_attention_heads`: number of heads in the GCN with attention model
- `pairwise_loss`: whether to create the training set with pairs of positive-negative interactions (default `True`)
- `train_loader_workers`: number of workers for the train loader
- `test_loader_workers`: number of workers for the test loader
- `interaction_context`: context rows to add, separated by commas (default `all` adds any context)
- `recommend`: enable recommend mode
Supported context values are `previous` and `skip`, and they depend on each dataset.
Additionally, you can set `interaction_context:random` to test with a random context; this is used to confirm that the factorization machine is correctly implemented and does not improve when a random context is added.
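For instance, a hypothetical invocation that only adds the `previous` and `skip` contexts (the dataset and path here are just placeholders) could look like this:

```
# only add the "previous" and "skip" context rows, comma separated
nnrecommend --hparam interaction_context:previous,skip train --dataset spotify data/spotify.csv
```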
These are the available subcommands (details for each are provided later):

- `train`: train a model on a dataset
- `fit`: fit a dataset using a surprise algorithm
- `tune`: tune model hyperparameters using ray tune
- `explore-dataset`: show information about a dataset
- `recommend`: load a trained model to get recommendations
Once the package is installed and the Python bin path is in your system path, run the following to see the different available actions and parameters:

```
nnrecommend --help
```
Hyperparameters can be passed with `--hparam name:value` (the argument can be repeated to set multiple hyperparameters), or with `--hparams-path hparams.json` to load the parameters from a JSON dictionary.
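As an illustration (the dataset and path are placeholders), setting two hyperparameters directly on the command line could look like this:

```
# repeat --hparam to set several hyperparameters at once
nnrecommend --hparam embed_dim:32 --hparam batch_size:1024 train --dataset movielens data/ml-100k/
```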
The format of the `hparams.json` file can be a simple dictionary:

```
{
    "embed_dim": 32,
    "batch_size": 1024
}
```
Or a dictionary with trials if you want to run multiple trainings one after the other:

```
{
    "common": {
        "embed_dim": 32
    },
    "trials": [
        {
            "batch_size": 1024
        },
        {
            "batch_size": 512
        }
    ]
}
```
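A sketch of how such a file could then be passed to the training command (the dataset and path are placeholders again):

```
# load the hyperparameters (and trials, if present) from the JSON file
nnrecommend --hparams-path hparams.json train --dataset movielens data/ml-100k/
```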
The `train` command allows you to train a model on a dataset.
```
nnrecommend train --dataset movielens-lab data/ml-dataset-splitted/movielens
nnrecommend train --dataset movielens data/ml-100k/
nnrecommend train --dataset podcasts data/database.sqlite
nnrecommend train --dataset spotify data/spotify.csv
```
To select the model:

```
nnrecommend train --dataset spotify data/spotify.csv --model fm-gcn
```

To create a tensorboard directory:

```
nnrecommend train --dataset spotify data/spotify.csv --tensorboard tbdir
```

Then you can run the tensorboard server on that directory:

```
tensorboard --logdir tbdir
```
The `fit` command allows you to fit an algorithm on a dataset and get test values.

```
nnrecommend fit --dataset spotify data/spotify.csv --algorithm knn --algorithm baseline
```

This command also supports the tensorboard parameter and will create horizontal lines with the test values for every algorithm.
The `tune` command runs hyperparameter tuning with a given dataset and model. We use `ray.tune` for this task.

```
nnrecommend tune --dataset spotify data/spotify.csv --model fm-linear --config tune_config.json
```

The command accepts the tune config as a JSON file containing a dictionary, where the keys are the hyperparameter names and the values describe the `ray.tune` methods that generate the possible values. Check the tune documentation for all the possible methods.
```
{
    "learning_rate": ["qloguniform", 1e-4, 1e-1, 5e-4],
    "embed_dropout": ["choice", [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]],
    "batch_size": ["lograndint", 128, 2048],
    "graph_attention_heads": ["randint", 1, 12]
}
```
When running, you can see the progress by starting a tensorboard server on the `~/ray_results` folder.

```
tensorboard --logdir ~/ray_results
```
The `explore-dataset` command shows some graphs about the dataset. For every user, item, or context pair it shows:
- histogram of the counts
- spy graph of the adjacency matrix
```
nnrecommend explore-dataset data/ml-100k --type movielens
```
The `recommend` command shows recommendations for a given label.
If you store the trained model, it can show recommendations for existing users.
```
nnrecommend --hparam interaction_context: train data/movielens-100k --dataset movielens --output movielens.pth
nnrecommend recommend movielens.pth --label 300 --user-items 3
```
This will print information about user `300` and then find items to recommend to them.
If you train with the `recommend` hyperparameter enabled, the dataset will be modified so that the model trains to recommend items to new users by:
- removing the user column
- creating the previous item context
- switching the items and previous item context columns
If you store the trained model by passing the `--output` parameter, you can use the `recommend` subcommand to get recommendations for new items.
```
nnrecommend --hparam recommend:1 --hparam interaction_context: train data/movielens-100k --dataset movielens --output movielens_recommend.pth
nnrecommend recommend movielens_recommend.pth --label "star wars"
```