This project implements a Collaborative Deep Learning (CDL) model for movie recommendations, combining deep learning techniques with collaborative filtering to overcome limitations of traditional recommender systems.
Our implementation explores the use of CDL to improve recommendation accuracy, especially in scenarios with sparse data. The model integrates Stacked Denoising Autoencoders (SDAE) with matrix factorization to learn both content-based and collaborative features.
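As a rough illustration of how an autoencoder and matrix factorization can share one objective, the sketch below computes a CDL-style loss with a single hidden layer standing in for the full SDAE. The function name `cdl_style_loss`, the array shapes, and the default regularization weights are assumptions for this example and are not taken from the notebooks.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cdl_style_loss(R, C, U, V, X_clean, X_noisy, params,
                   lam_u=0.1, lam_v=10.0, lam_n=1.0, lam_w=1e-4):
    """Simplified CDL-style loss with a one-hidden-layer denoising autoencoder.

    R, C    : (n_users, n_items) preferences and confidence weights
    U, V    : (n_users, k) and (n_items, k) latent factors
    X_clean : (n_items, vocab) clean item content (e.g. bag-of-words)
    X_noisy : (n_items, vocab) corrupted copy fed to the encoder
    params  : (W1, b1, W2, b2) encoder and decoder weights
    The lam_* defaults are illustrative values, not tuned settings.
    """
    W1, b1, W2, b2 = params
    H = sigmoid(X_noisy @ W1 + b1)   # encoded item content, (n_items, k)
    X_rec = sigmoid(H @ W2 + b2)     # reconstruction, (n_items, vocab)

    # Collaborative part: confidence-weighted squared rating error.
    rating_term = 0.5 * np.sum(C * (R - U @ V.T) ** 2)
    user_reg = 0.5 * lam_u * np.sum(U ** 2)
    # Coupling: item factors are pulled toward the encoded content.
    coupling = 0.5 * lam_v * np.sum((V - H) ** 2)
    # Content part: reconstruct the clean input from the corrupted one.
    recon = 0.5 * lam_n * np.sum((X_rec - X_clean) ** 2)
    weight_reg = 0.5 * lam_w * sum(np.sum(p ** 2) for p in params)
    return rating_term + user_reg + coupling + recon + weight_reg


# Tiny synthetic example (all sizes are arbitrary).
rng = np.random.default_rng(0)
n_users, n_items, vocab, k = 4, 5, 8, 3
X_clean = rng.integers(0, 2, size=(n_items, vocab)).astype(float)
X_noisy = X_clean * (rng.random(X_clean.shape) > 0.3)  # masking noise
params = (rng.normal(scale=0.1, size=(vocab, k)), np.zeros(k),
          rng.normal(scale=0.1, size=(k, vocab)), np.zeros(vocab))
R = rng.integers(0, 2, size=(n_users, n_items)).astype(float)
C = np.where(R > 0, 1.0, 0.01)
U = rng.normal(scale=0.1, size=(n_users, k))
V = rng.normal(scale=0.1, size=(n_items, k))
print(cdl_style_loss(R, C, U, V, X_clean, X_noisy, params))
```

The coupling term is what ties the two halves together: an item with few ratings still gets a meaningful latent vector because it is drawn toward the encoding of its content.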
We used the MovieLens dataset, which includes:
- Over 1 million movie ratings
- 6,000+ users
- 3,000+ movies
- Metadata for each movie (title, genre, plot)
Key features:
- Implements a hierarchical Bayesian model (CDL)
- Combines deep learning with collaborative filtering
- Addresses cold start and data sparsity problems
- Utilizes both implicit feedback and content information
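To make the implicit-feedback point concrete, here is a minimal sketch of turning explicit ratings into binary preferences with per-entry confidence weights, in the spirit of Hu et al. (2008). The helper name `to_implicit`, the rule that any rating counts as a positive, and the weights `a` and `b` are assumptions for illustration; the notebooks may binarize ratings differently.

```python
import numpy as np


def to_implicit(ratings, a=1.0, b=0.01):
    """Convert a rating matrix (0 = unrated) into preferences and confidences.

    Returns R (1.0 where the user rated the item, else 0.0) and
    C (confidence a for observed entries, b for unobserved ones).
    The values of a and b are illustrative, not tuned.
    """
    ratings = np.asarray(ratings)
    R = (ratings > 0).astype(float)
    C = np.where(R > 0, a, b)
    return R, C


R, C = to_implicit([[5, 0, 3],
                    [0, 4, 0]])
print(R)
print(C)
```

Unobserved entries keep a small non-zero confidence so that they act as weak negatives rather than being ignored entirely.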
The project is organized into Jupyter Notebook (.ipynb) files:
- data_preprocessing.ipynb: Data cleaning and preparation
- cdl_model.ipynb: Implementation of the CDL model
- evaluation.ipynb: Model evaluation and results analysis
- Clone this repository
- Install the required dependencies: pip install -r requirements.txt
- Download the MovieLens dataset and place it in the data/ directory
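Once the dataset is in place, the ratings need to be read into a user-by-movie matrix. The sketch below assumes the `UserID::MovieID::Rating::Timestamp` line format used by the MovieLens 1M `ratings.dat` file; the helper name `parse_ratings` is hypothetical and the actual preprocessing notebook may load the data differently.

```python
import numpy as np


def parse_ratings(lines):
    """Parse 'UserID::MovieID::Rating::Timestamp' lines into a dense matrix.

    Returns the rating matrix (0 = unrated) plus the mappings from raw
    user and movie IDs to row and column indices.
    """
    triples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, movie, rating, _timestamp = line.split("::")
        triples.append((int(user), int(movie), float(rating)))

    # Raw IDs are not contiguous, so map them to 0-based indices.
    user_index = {u: i for i, u in enumerate(sorted({t[0] for t in triples}))}
    movie_index = {m: j for j, m in enumerate(sorted({t[1] for t in triples}))}

    ratings = np.zeros((len(user_index), len(movie_index)))
    for user, movie, rating in triples:
        ratings[user_index[user], movie_index[movie]] = rating
    return ratings, user_index, movie_index


sample = ["1::10::5::978300760", "1::20::3::978302109", "2::10::4::978301968"]
ratings, user_index, movie_index = parse_ratings(sample)
print(ratings)
```

In practice the function would be called on an open file handle for the ratings file under data/. A dense array is used here only for readability; a sparse matrix is a better fit for the full dataset.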
Open and run the Jupyter Notebooks in the following order:
- data_preprocessing.ipynb
- cdl_model.ipynb
- evaluation.ipynb
Our implementation achieved a recall of 0.33 for top 300 recommendations.
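For reference, recall at M is commonly computed per user as the number of held-out liked items that appear in the top M recommendations, divided by the total number of held-out liked items, and then averaged over users. The sketch below follows that definition; the function name `recall_at_m` and the choice to exclude training items from the ranking are assumptions, since the exact evaluation protocol is not described here.

```python
import numpy as np


def recall_at_m(scores, held_out, seen, m=300):
    """Mean recall@M over users with at least one held-out liked item.

    scores   : (n_users, n_items) predicted preference scores
    held_out : (n_users, n_items) 1/True where a test item is liked
    seen     : (n_users, n_items) True for training items to exclude
    """
    held_out = np.asarray(held_out).astype(bool)
    # Push already-seen items to the bottom of each user's ranking.
    masked = np.where(seen, -np.inf, np.asarray(scores, dtype=float))
    top_m = np.argsort(-masked, axis=1)[:, :m]

    hits = np.take_along_axis(held_out, top_m, axis=1).sum(axis=1)
    n_liked = held_out.sum(axis=1)
    valid = n_liked > 0
    if not valid.any():
        return 0.0
    return float(np.mean(hits[valid] / n_liked[valid]))


scores = np.array([[0.9, 0.1, 0.8, 0.3]])
held_out = np.array([[1, 0, 0, 1]])
seen = np.zeros_like(held_out, dtype=bool)
print(recall_at_m(scores, held_out, seen, m=2))
```

Recall is preferred over precision in the implicit-feedback setting because an unobserved entry may mean either dislike or simple unawareness of the item.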
Directions for future work:
- Explore more diverse data sources
- Implement advanced deep learning architectures
- Replace bag-of-words with more sophisticated text representation techniques
References:
- Wang, H., Wang, N., & Yeung, D. Y. (2015). Collaborative deep learning for recommender systems.
- Wang, C., & Blei, D. M. (2011). Collaborative topic modeling for recommending scientific articles.
- Hu, Y., Koren, Y., & Volinsky, C. (2008). Collaborative filtering for implicit feedback datasets.
- Vincent, P., et al. (2010). Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion.