PySpark implementation of SVD++ for Top-N Recommendation

Prerequisites

You need to install Apache Hadoop and Apache Spark on every node of the cluster.

tar zxvf hadoop-3.y.z.tgz
ln -s /your/hadoop/path/hadoop-3.y.z /your/hadoop/path/hadoop

tar zxvf spark-2.y.z-bin-hadoop2.7.tgz
ln -s /your/spark/path/spark-2.y.z-bin-hadoop2.7 /your/spark/path/spark
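
To confirm that the two symlinks above resolve to complete installations, a small check like the following may help. It is only a sketch: the helper name check_install is hypothetical and not part of this repository, and it assumes the standard distribution layout in which Hadoop ships bin/hadoop and Spark ships bin/spark-submit.

```python
import os


def check_install(hadoop_home, spark_home):
    """Report whether the expected launcher scripts exist.

    Hypothetical helper (not part of this repository). Assumes the
    standard layout: <hadoop_home>/bin/hadoop and
    <spark_home>/bin/spark-submit. Symlinks are followed.
    """
    expected = {
        "hadoop": os.path.join(hadoop_home, "bin", "hadoop"),
        "spark-submit": os.path.join(spark_home, "bin", "spark-submit"),
    }
    # os.path.isfile resolves symlinks, so a dangling link reports False.
    return {name: os.path.isfile(path) for name, path in expected.items()}
```

Calling check_install("/your/hadoop/path/hadoop", "/your/spark/path/spark") on each node returns a dictionary with one boolean per launcher; a False entry suggests that the corresponding symlink points at the wrong directory.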

make python

make test

make example

Factorization Meets the Neighborhood: A Multifaceted Collaborative Filtering Model. Yehuda Koren, KDD’08
Spark: Cluster Computing with Working Sets
Scaling Collaborative Filtering with PySpark
Running Spark on YARN
NicolasHug/Surprise