- Machine Learning
- Batch Learning vs Online Learning
- Instance-Based Learning vs Model-Based Learning
- Supervised, Unsupervised, Semi-supervised, and Reinforcement Learning
- Machine Learning Algorithms
- Algorithm
- Model Training
- Model Selection
- Machine Learning Implementation
- train K-means on 1 million users
- locality-sensitive hashing
- provide recommendations in real-time
- Data Analysis and Metrics
- define user-item scores when user "like" data is hard to get
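As an illustration of the locality-sensitive hashing item above, here is a minimal sketch of random-hyperplane hashing (one common LSH family for cosine similarity). It assumes each user is already represented as a dense feature vector; the function names, the user IDs, and the toy vectors are all hypothetical. Users whose vectors point in similar directions tend to land in the same bucket, so a real-time lookup only scores the candidates in one bucket instead of all users.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplane normals from a standard Gaussian."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


def build_index(user_vectors, hyperplanes):
    """Group user IDs by signature so similar users share a bucket."""
    buckets = defaultdict(list)
    for user_id, vec in user_vectors.items():
        buckets[signature(vec, hyperplanes)].append(user_id)
    return buckets


def candidates(query_vector, buckets, hyperplanes):
    """Return the users in the query's bucket (approximate neighbours)."""
    return buckets.get(signature(query_vector, hyperplanes), [])


# Toy data (made up): u2 is a scaled copy of u1, so both always hash together.
users = {
    "u1": [1.0, 0.9, 0.0],
    "u2": [2.0, 1.8, 0.0],
    "u3": [-1.0, 0.1, 0.5],
}
planes = make_hyperplanes(num_planes=8, dim=3)
index = build_index(users, planes)
print(candidates(users["u1"], index, planes))
```

More hyperplanes give smaller, purer buckets at the cost of missing some true neighbours; production systems usually build several independent hash tables to recover recall.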
Take-home exercises are usually done in a Jupyter Notebook or R Markdown. Instructions range from very open-ended to very detailed (for example, an expected accuracy score).
- Problem
- understand the problem
- key challenges and necessary domain knowledge
- problem formalization
- Data
- Data collection
- Exploratory Data Analysis - understand the data
- occasionally: deal with big data
- either drop part of the data or use more advanced (parallel) platforms
- summarize descriptive analysis
- Data Processing
- Data Wrangling, Data Cleaning
- Missing Value handling and impact analysis
- Feature Matrix
- Understanding features
- deal with categorical features (encoding)
- Feature Engineering
- e.g. tokenizing, stemming, word2vec, TF-IDF
- Feature Selection
- Modelling
- Pre-processing before model training
- Optimize metrics
- hyperparameter tuning
- model evaluation and model selection
- Classification vs Regression
- model specific features
- e.g. feature importance from tree-based models
- L1, L2 regularization
- Evaluation
- Model Evaluation and Selection (Offline)
- A/B Testing (Online)
- Business Value/Summary
- Business Case Analysis
- The most important part
- Model Deployment
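For the online A/B testing step listed under Evaluation, one common way to compare two variants on a conversion metric is a two-proportion z-test. The sketch below is a minimal stdlib version under the usual large-sample assumptions; the function name and the conversion counts are hypothetical, not taken from any real experiment.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between A and B.

    conv_a / conv_b are conversion counts, n_a / n_b are sample sizes.
    Returns (z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: control converts 100 of 1000, treatment 150 of 1000.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value only says the difference is unlikely under the null; the business case still depends on whether the size of the lift justifies the change.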