Repository for projects from EPFL's Applied Data Science Machine Learning program

lauravoicu/EPFL-Applied-Data-Science-Machine-Learning

EPFL Applied Data Science: Machine Learning

License: GPL v3

Repository for projects developed in EPFL Applied Data Science: Machine Learning Program.

Project 1: World Happiness Report

Short description: Investigate relationships between the different criteria that influence happiness.

Tools and technologies: Data analysis, data visualization, Python.

Project 2: Open Food Facts

Short description: Data mining and data visualization to help users decode food labels and make better food choices in general.

Tools and technologies: Data analysis, data visualization, Python, SQLite.
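
As a rough illustration of how SQLite can support this kind of label analysis, here is a minimal sketch using Python's built-in `sqlite3` module. It is not code from the project: the `products` table, its column names, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical sample rows (name, category, sugars per 100 g).
# These values are made up for illustration only.
rows = [
    ("Granola bar", "snacks", 28.0),
    ("Rice cakes", "snacks", 2.0),
    ("Orange juice", "beverages", 9.0),
    ("Cola", "beverages", 11.0),
]


def average_sugar_by_category(products):
    """Load products into an in-memory SQLite table and average sugar per category."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE products (name TEXT, category TEXT, sugars_100g REAL)"
        )
        # Parameterized inserts avoid building SQL strings by hand.
        conn.executemany("INSERT INTO products VALUES (?, ?, ?)", products)
        cursor = conn.execute(
            "SELECT category, AVG(sugars_100g) FROM products "
            "GROUP BY category ORDER BY category"
        )
        return cursor.fetchall()
    finally:
        conn.close()


print(average_sugar_by_category(rows))
```

The query result is a plain list of tuples, so it can be handed to a plotting or data analysis library for the visualization step.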

Project 3: House Prices Prediction

Short description: Predict house prices using the Ames housing data set assembled and published by Dean De Cock.

Tools and technologies: Feature encoding, feature engineering, regularization, grid search model tuning.
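
To make "regularization" and "grid search model tuning" concrete, the sketch below tunes the penalty strength of a ridge regression by cross-validation. It is a simplified illustration, not the project's code: it assumes a single feature, uses only the standard library, and the function names (`fit_ridge_1d`, `cv_mse`, `grid_search`) and the toy data are hypothetical.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge fit for one feature; the intercept is not penalized."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)  # alpha > 0 shrinks the slope toward zero
    return slope, y_mean - slope * x_mean


def cv_mse(xs, ys, alpha, k):
    """Mean squared error pooled over k interleaved cross-validation folds."""
    errors = []
    for fold in range(k):
        pairs = list(zip(xs, ys))
        train = [p for i, p in enumerate(pairs) if i % k != fold]
        test = [p for i, p in enumerate(pairs) if i % k == fold]
        slope, intercept = fit_ridge_1d(
            [x for x, _ in train], [y for _, y in train], alpha
        )
        errors.extend((slope * x + intercept - y) ** 2 for x, y in test)
    return sum(errors) / len(errors)


def grid_search(xs, ys, alphas, k=3):
    """Score every candidate alpha and return the best one with all scores."""
    scores = {alpha: cv_mse(xs, ys, alpha, k) for alpha in alphas}
    best_alpha = min(scores, key=scores.get)
    return best_alpha, scores


# Hypothetical toy data: living area (in 100 m^2) against price (in 100k).
areas = [0.8, 1.0, 1.2, 1.5, 1.8, 2.1]
prices = [1.9, 2.4, 2.6, 3.3, 3.7, 4.4]
best_alpha, scores = grid_search(areas, prices, alphas=[0.0, 0.1, 1.0, 10.0])
print(best_alpha, scores)
```

With many encoded and engineered features, the same loop applies: each candidate hyperparameter value is scored on held-out folds and the best-scoring one is kept.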

Project 4: Image Classifier

Short description: Classify images using various machine learning techniques, such as Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Decision Trees, Random Forests, and Convolutional Neural Networks (CNN).

Tools and technologies: SVM, KNN, Decision Trees, Random Forests, CNN.
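
Of the techniques listed, KNN is the simplest to show end to end. The following is a minimal sketch, not the project's implementation: it treats each image as a flat tuple of pixel intensities, and the two-pixel "images", the labels, and the `knn_predict` name are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-pixel "images", flattened to tuples of intensities in [0, 1].
images = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.1), (0.9, 0.8), (1.0, 0.9), (0.8, 0.9)]
labels = ["dark", "dark", "dark", "bright", "bright", "bright"]

print(knn_predict(images, labels, (0.15, 0.15)))
print(knn_predict(images, labels, (0.85, 0.90)))
```

Real images have thousands of pixels, so distances are usually computed on extracted features rather than raw intensities; the voting step stays the same.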

Capstone Project: Cyberbullying Prediction in Social Media

Short description: Explore different feature engineering and vectorization approaches, such as TF-IDF and Word2Vec, as well as pretrained embeddings such as GloVe and FastText. I built models on datasets with the different feature sets prepared in earlier sections of the project, using the following algorithms: Logistic Regression (baseline), Naïve Bayes, XGBoost (Extreme Gradient Boosting), and Convolutional Neural Networks (CNNs). The project also explores techniques for handling imbalanced data, such as SMOTE.

Tools and technologies: NLP, SMOTE, Word Embeddings, TF-IDF, CNN.
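
For readers unfamiliar with SMOTE, the core idea is to create synthetic minority-class samples by interpolating between a minority point and one of its nearest minority neighbors. The sketch below illustrates that idea with the standard library only. It is a simplified, hypothetical version (the `smote_sketch` name and the toy points are assumptions), not the project's code and not a substitute for a maintained library implementation.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbors.

    Assumes `minority` holds at least two feature vectors of equal length.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # The k nearest minority points to the base point, excluding itself.
        others = [p for i, p in enumerate(minority) if i != base_idx]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbour = rng.choice(neighbours)
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class feature vectors (e.g. rows of a vectorized text set).
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(smote_sketch(minority_points, n_new=3))
```

Oversampling of this kind should be applied to the training split only, so that synthetic points never leak into the evaluation data.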

Contributing

I use this for my own projects, and I know it might not be the perfect approach for every project out there. If you have any ideas, just [open an issue][issues] and tell me what you think.

If you'd like to contribute, please fork the repository and make changes as you'd like. Pull requests are warmly welcome.

License

Distributed under the GPL v3 License. See LICENSE for more information.