Repository for projects developed in the EPFL Applied Data Science: Machine Learning program.
Short description: Investigate relationships between the different criteria that influence happiness.
Tools and technologies: Data analysis, data visualization, Python.
Short description: Data mining and data visualization to help users decode food labels and make better food choices in general.
Tools and technologies: Data analysis, data visualization, Python, SQLite.
Short description: House price prediction on the data set assembled and published by Dean De Cock.
Tools and technologies: Feature encoding, feature engineering, regularization, grid search model tuning.
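To illustrate how regularization and grid search fit together, here is a minimal, dependency-free sketch. It is not the project's code: the toy data, the one-feature ridge model, and the helper names (`fit_ridge_1d`, `mse`, `grid_search_alpha`) are all assumptions made for the example.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge fit for y ~ w * x (no intercept).

    w = sum(x * y) / (sum(x * x) + alpha); a larger alpha shrinks w toward 0.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)


def mse(xs, ys, w):
    """Mean squared error of the prediction w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search_alpha(train, valid, alphas):
    """Try every alpha and keep the one with the lowest validation MSE."""
    best = None
    for alpha in alphas:
        w = fit_ridge_1d(train[0], train[1], alpha)
        score = mse(valid[0], valid[1], w)
        if best is None or score < best[2]:
            best = (alpha, w, score)
    return best


# Toy data (hypothetical): roughly y = 2 * x with a little noise.
train = ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
valid = ([5.0, 6.0], [10.0, 12.1])
alphas = [0.0, 0.1, 1.0, 10.0]

best_alpha, best_w, best_score = grid_search_alpha(train, valid, alphas)
print(best_alpha, round(best_w, 3), round(best_score, 4))
```

With real data, the same idea is usually run with cross-validation rather than a single hold-out split.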
Short description: Image classification using various machine learning techniques, such as Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Decision Trees, Random Forests, and Convolutional Neural Networks (CNN).
Tools and technologies: SVM, KNN, Decision Trees, Random Forests, CNN.
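As a rough illustration of the simplest of these classifiers, the sketch below implements K-Nearest Neighbors on flattened pixel vectors using only the standard library. The tiny 2x2 "images", the labels, and the function name `knn_predict` are hypothetical and are not taken from the project.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query by majority vote among its k nearest training points."""
    # Sort training samples by Euclidean distance to the query.
    dists = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2x2 grayscale images, flattened to 4 pixel intensities in [0, 1].
train_points = [
    (0.0, 0.0, 0.0, 0.0),
    (0.1, 0.0, 0.1, 0.0),
    (1.0, 1.0, 1.0, 1.0),
    (0.9, 1.0, 0.9, 1.0),
]
train_labels = ["dark", "dark", "bright", "bright"]

prediction = knn_predict(train_points, train_labels, (0.2, 0.1, 0.0, 0.0))
print(prediction)
```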
Short description: Explore different feature engineering approaches and vectorization models such as TF-IDF and Word2Vec, as well as pretrained embeddings such as GloVe and FastText. In this project I built models on the datasets with the different feature sets prepared in the earlier sections, using the following algorithms: Logistic Regression (baseline), Naïve Bayes, XGBoost (Extreme Gradient Boosting), and Convolutional Neural Networks (CNNs). The project also explores techniques for handling imbalanced data, such as SMOTE.
Tools and technologies: NLP, SMOTE, Word Embeddings, TF-IDF, CNN.
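For readers unfamiliar with SMOTE, the core idea is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbors. The sketch below shows only that interpolation step, under simplifying assumptions: the function name `smote_like_oversample`, the toy points, and the parameters are hypothetical, and this is not the project's implementation.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    Assumes `minority` holds at least two feature tuples of equal length.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # The k nearest minority neighbors of the base point (excluding itself).
        others = [p for i, p in enumerate(minority) if i != base_idx]
        neighbors = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbor = rng.choice(neighbors)
        # Pick a random point on the segment between base and neighbor.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical minority-class samples with two features each.
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.5), (3.0, 3.0)]
synthetic = smote_like_oversample(minority, n_new=4, k=2, seed=0)
print(synthetic)
```

Oversampling of this kind should be applied to the training split only, so that synthetic points do not leak into validation or test data.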
I use this for my own projects, and I know it might not be the perfect approach for every project out there. If you have any ideas, just [open an issue][issues] and tell me what you think.
If you'd like to contribute, please fork the repository and make changes as you'd like. Pull requests are warmly welcome.
Distributed under the GPL License. See LICENSE for more information.