Blog posts:
- "Optimization for Deep Learning Highlights in 2017" by Sebastian Ruder (a researcher, not a USF student)
  - this blog post covers SGD, Adam, and weight decay 🔴 (read it!)
- "Deep Learning #4: Why You Need to Start Using Embedding Layers"
  - our penultimate lesson
- we can compress high-dimensional spaces down to just a few dimensions using PCA (Principal Component Analysis)
- PCA is a linear technique
- Rachel's Computational Linear Algebra course covers PCA
- PCA is similar to SVD (singular value decomposition)
- here we find 3 linear combinations of the 50 embedding dimensions that capture as much of the variation as possible while being different from each other
from sklearn.decomposition import PCA
# reduce the 50-dimensional movie embedding matrix to 3 principal components
pca = PCA(n_components=3)
movie_pca = pca.fit(movie_emb.T).components_  # shape (3, n_movies)
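To interpret these components, one option is to sort the movies along the first component and look at the two extremes. A minimal sketch, reusing movie_pca from above and assuming a hypothetical movie_names list aligned with the rows of movie_emb:

import numpy as np

fac0 = movie_pca[0]                           # first principal component: one score per movie
order = np.argsort(fac0)
print([movie_names[i] for i in order[:10]])   # movies at one end of this axis
print([movie_names[i] for i in order[-10:]])  # movies at the other end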
- you can give folks at work a random forest that uses the learned embeddings, without them using neural networks
- you train a neural net to learn the embeddings; everyone else in the organization can chuck those embeddings into a GBM, a random forest, or KNN (see the sketch below)
- this gives the power of neural nets to everyone in the organization without everyone having to use fastai themselves
- the embeddings can be stored in a SQL table
- GBMs and random forests train a lot quicker than neural nets do
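A minimal sketch of that workflow, assuming a trained PyTorch model with a store embedding layer called emb_store, and a pandas DataFrame df with a 'store' id column and a 'sales' target (all of these names are hypothetical):

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# pull the learned embedding matrix out of the trained neural net
store_emb = model.emb_store.weight.data.cpu().numpy()      # shape (n_stores, emb_dim)
emb_cols = [f'store_emb_{i}' for i in range(store_emb.shape[1])]
emb_df = pd.DataFrame(store_emb, columns=emb_cols)
emb_df['store'] = range(len(emb_df))                        # this table could also live in SQL

# replace the raw store id with the learned embedding features
df = df.merge(emb_df, on='store', how='left')

rf = RandomForestRegressor(n_estimators=40, n_jobs=-1)
rf.fit(df.drop(columns=['sales']), df['sales'])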
- visualizing embeddings can be interesting (see the plotting sketch below)
- first, check that you see the things you expect to see
- then, look for things that were not expected (e.g. surprising clusterings)
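For example, a minimal sketch of plotting movies on the first two principal components, reusing movie_pca from the PCA code above (movie_names and the choice of which movies to label are illustrative):

import matplotlib.pyplot as plt

fac0, fac1 = movie_pca[0], movie_pca[1]
idxs = list(range(0, len(fac0), 50))          # label only a subset so the plot stays readable
plt.scatter(fac0[idxs], fac1[idxs])
for i in idxs:
    plt.text(fac0[i], fac1[i], movie_names[i])
plt.show()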
- Q: are skip-grams a type of embedding?
- A: skip-grams are an NLP technique (a way of training word embeddings)
- say we have an unlabeled dataset, such as Google Books
- the best way, in my opinion, to turn an unlabeled (unsupervised) problem into a labeled problem is to invent some labels
- what they did in Word2vec is: here's a sentence with 11 words in it: _ _ _ _ _ _ _ _ _ _ _
- let's delete the middle word and replace it with a random word
- example: replace "cat" with "justice"
- sentence: the cute little CAT sat on the fuzzy mat ---> assign label = 1
- sentence: the cute little JUSTICE sat on the fuzzy mat ---> assign label = 0
- now we have something we can build a machine learning model on! (a sketch of this fake-labeling step follows below)
- this is quick, shallow learning, and you end up with embeddings that have linear characteristics
- for something more predictive, use a neural net
- we need to move past Word2vec and GloVe, these linear-based methods; their embeddings are far less predictive than embeddings learned with deep models
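A minimal sketch of the fake-labeling step described above, assuming sentences is a list of tokenized sentences (e.g. 11-word windows) and vocab is a list of words to sample replacements from (both names are hypothetical):

import random

def make_fake_labels(sentences, vocab):
    examples = []
    for sent in sentences:
        examples.append((sent, 1))                              # real sentence -> label 1
        corrupted = list(sent)
        corrupted[len(corrupted) // 2] = random.choice(vocab)   # swap the middle word for a random one
        examples.append((corrupted, 0))                         # corrupted sentence -> label 0
    return examples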
- nowadays, "unsupervised learning" is really labeled learning with an invented (fake) task
- we need a fake task where the kinds of relationships it is going to learn are the kinds we care about
- in computer vision, we could take images, apply an unusual data augmentation (e.g. recolor them too much), and ask the neural net to predict which images were augmented and which were not
- use the best fake task you can
- a bad (uncreative) "fake task" is an autoencoder: reconstruct the input using a neural net with some activations deleted; it is the most uncreative task, but it works surprisingly well (a sketch follows below)
- we may cover this unsupervised learning in Part 2, if there is interest
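As an illustration of that autoencoder idea, a minimal PyTorch sketch (layer sizes are arbitrary; dropout on the input plays the role of "deleting some activations"):

import torch.nn as nn

class DenoisingAutoEncoder(nn.Module):
    def __init__(self, n_in, n_hidden=50):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Dropout(0.5),               # "delete some activations"
            nn.Linear(n_in, n_hidden),
            nn.ReLU(),
        )
        self.decoder = nn.Linear(n_hidden, n_in)   # try to reconstruct the original input

    def forward(self, x):
        return self.decoder(self.encoder(x))

# trained with e.g. nn.MSELoss() between forward(x) and the uncorrupted x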
41:00
back to the Rossmann notebook:
- https://github.com/fastai/fastai/blob/master/courses/dl1/lesson3-rossman.ipynb
- a lot of the details of this notebook are covered in the ML course
- "shallow learning" means the model doesn't have a hidden layer
- https://github.com/fastai/fastai/blob/master/courses/dl1/lesson6-sgd.ipynb
- Machine Learning course - building stuff up from the foundations
- Deep Learning course - best practices, top down
- Lessons 9, 10, and 11 of the ML course: creating a neural net layer from scratch (a minimal sketch below)
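For a sense of what "from scratch" means there, a minimal sketch of a linear layer written directly with PyTorch tensors instead of nn.Linear (shapes and initialization are illustrative):

import torch

class LinearFromScratch:
    def __init__(self, n_in, n_out):
        # weight and bias are plain tensors that track gradients
        self.w = (torch.randn(n_in, n_out) * 0.1).requires_grad_()
        self.b = torch.zeros(n_out, requires_grad=True)

    def __call__(self, x):
        return x @ self.w + self.b

lin = LinearFromScratch(50, 10)
out = lin(torch.randn(64, 50))   # a batch of 64 examples, 50 features each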