This project was developed for the Data Mining and Text Mining course we took in the first year of our Master's degree.
We were given a large dataset with sales and customer information for hundreds of stores distributed across Europe. Each entry consisted of more than 50 features, and the training set contained approximately 550K entries.
The goal was to predict the number of customers and the sales for all these stores over a timespan of two months. The training set covered the history of the previous two years.
The model consists of a bagging ensemble of regression decision trees. Please note that feed-forward neural networks achieved better results, but we were not allowed to use them, so we kept them only as comparison models.
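To illustrate the general idea of bagging regression trees, here is a minimal from-scratch sketch using only the Python standard library. It is not the code from this repository: the function names (`fit_tree`, `fit_bagging`, and so on), the hyperparameters, and the toy data (day of week and a promotion flag predicting sales) are all assumptions made up for the example, and it predicts a single target rather than both customers and sales.

```python
import random
from statistics import mean


def fit_tree(rows, targets, depth=0, max_depth=3, min_leaf=2):
    """Fit a small regression tree; returns a nested dict or a numeric leaf."""
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return mean(targets)
    best = None  # (sum of squared errors, feature index, threshold)
    for j in range(len(rows[0])):
        # Candidate thresholds: every observed value except the largest.
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, targets) if r[j] <= t]
            right = [y for r, y in zip(rows, targets) if r[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            ml, mr = mean(left), mean(right)
            sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, j, t)
    if best is None:
        return mean(targets)
    _, j, t = best
    li = [i for i, r in enumerate(rows) if r[j] <= t]
    ri = [i for i, r in enumerate(rows) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": fit_tree([rows[i] for i in li], [targets[i] for i in li],
                         depth + 1, max_depth, min_leaf),
        "right": fit_tree([rows[i] for i in ri], [targets[i] for i in ri],
                          depth + 1, max_depth, min_leaf),
    }


def predict_tree(tree, row):
    """Walk down the tree until a leaf (a plain number) is reached."""
    while isinstance(tree, dict):
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree


def fit_bagging(rows, targets, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_tree([rows[i] for i in idx], [targets[i] for i in idx]))
    return forest


def predict_bagging(forest, row):
    """The ensemble prediction is the average of the individual trees."""
    return mean(predict_tree(tree, row) for tree in forest)


# Made-up toy data: (day_of_week, promo_flag) -> sales.
rows = [(d, p) for _ in range(3) for d in range(7) for p in (0, 1)]
targets = [100.0 + 50.0 * p + 10.0 * d for d, p in rows]

forest = fit_bagging(rows, targets)
print(predict_bagging(forest, (3, 1)))  # a promo day
print(predict_bagging(forest, (3, 0)))  # a non-promo day
```

Averaging many trees trained on different bootstrap samples reduces the variance of a single deep tree, which is the usual motivation for bagging. On a dataset of the size described above, a library implementation would be a more practical choice than this pure-Python version.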
See the short presentation of the project and the paper for further details.
The company that provided the dataset, Business Integration Partners (Bip.), did not allow us to share it, but you can find the whole implementation and preprocessing here.