live 11-Dec-2017
Notebooks:
- WILD ML RNN Tutorial - http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
- Chris Olah on LSTM http://colah.github.io/posts/2015-08-Understanding-LSTMs/
- More from Olah and others - https://distill.pub/
- BatchNorm paper
- Laptop recommendation: Surface Book 2, 15 inch
- classification and regression with deep learning
- identifying best practices
- here are 3 lines of code for image classification
- first 4 lessons covered NLP, structured data, collaborative filtering
- last 3 lessons covered those topics in more detail, with more detailed code
- generative modeling
- creating a sentence, image captioning, neural translation
- creating an image, style transfer
- moving from best practices to speculative practices
- how to read a paper and implement from scratch
- does not assume a particular math background, but be prepared to dig through notation and convert to code
- not so different
- they are like a fully connected network
bs=64
means the data is split into 64 chunks of data.
NOT batches of size 64!
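- a minimal sketch of that splitting (the idea only, not fastai's exact code), assuming the corpus is a 1-D NumPy array of token ids:

    import numpy as np

    bs = 64
    tokens = np.arange(640_000)              # hypothetical 1-D stream of token ids standing in for the corpus
    n = len(tokens) // bs                    # length of each of the 64 chunks
    chunks = tokens[:n * bs].reshape(bs, n)  # row i is one contiguous chunk of the stream
    # a mini-batch is then a slice of a few consecutive columns of `chunks`,
    # so each of the 64 rows carries on from where its previous batch left off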
- JH can't talk about that; doesn't know a good way
- JH will do further study on that
- well-known dataset in academia: https://www.cs.toronto.edu/~kriz/cifar.html
- small datasets are much more interesting than ImageNet
- often, we're looking at 32x32 pixels (example: lung cancer image)
- often, it's more challenging, and more interesting
- we can run algorithms much more quickly, and it's still challenging
- you can get the data by:
wget http://pjreddie.com/media/files/cifar.tgz
- (provided in the form we need)
- the stats are the mean and SD per channel; try to replicate them on your own
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
stats = (np.array([ 0.4914 , 0.48216, 0.44653]), np.array([ 0.24703, 0.24349, 0.26159]))
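- a minimal sketch of how to replicate those stats, assuming the tgz above has been extracted so the training PNGs sit under data/cifar/train/ (the path is an assumption):

    import numpy as np
    from PIL import Image
    from pathlib import Path

    # assumed location of the extracted training images, one 32x32 RGB PNG per image
    files = list(Path('data/cifar/train').glob('*.png'))
    imgs = np.stack([np.array(Image.open(f), dtype=np.float32) / 255.0 for f in files])  # (N, 32, 32, 3)
    means = imgs.mean(axis=(0, 1, 2))  # one mean per colour channel
    stds = imgs.std(axis=(0, 1, 2))    # one standard deviation per colour channel
    print(means, stds)                 # should come out close to the stats tuple above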
- Kerem's notebook on how different optimizers work: https://github.com/KeremTurgutlu/deeplearning/blob/master/Exploring%20Optimizers.ipynb
- to improve model, we'll next replace our fully connected model (with 1 hidden layer) with a CNN
nn.Conv2d(layers[i], layers[i + 1], kernel_size=3, stride=2)
- layers[i]: number of features coming in
- layers[i + 1]: number of features coming out
- stride=2 makes this a "stride 2 convolution"; it has a similar effect to maxpooling and reduces the size of the layers
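- a minimal sketch of how a stack of these stride-2 convs shrinks the grid (the feature counts here are illustrative, not the lesson's exact values):

    import torch
    import torch.nn as nn

    layers = [3, 20, 40, 80]   # RGB input, then three conv layers
    convs = [nn.Conv2d(layers[i], layers[i + 1], kernel_size=3, stride=2)
             for i in range(len(layers) - 1)]
    x = torch.randn(64, 3, 32, 32)   # a batch of 64 CIFAR-sized images
    for conv in convs:
        x = conv(x)
        print(x.shape)   # 32x32 -> 15x15 -> 7x7 -> 3x3: each stride-2 conv roughly halves the grid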
self.pool = nn.AdaptiveMaxPool2d(1)
- standard now for state-of-the-art algorithms
- I'm not going to tell you how big an area to pool over; I will tell you how big a resolution to create
- starting with 28x28: a 14x14 adaptive maxpool is the same as a 2x2 maxpool, which also gives a 14x14 output
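- a quick check of that equivalence (the tensor here is just a random stand-in for a 28x28 activation map):

    import torch
    import torch.nn as nn

    x = torch.randn(1, 1, 28, 28)            # hypothetical 28x28 activation map
    adaptive = nn.AdaptiveMaxPool2d(14)(x)   # "give me a 14x14 output"
    fixed = nn.MaxPool2d(2)(x)               # "pool over 2x2 areas"
    print(torch.equal(adaptive, fixed))      # True: on a 28x28 input they coincide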
- a couple of years old now
- makes it easier to train deeper networks
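- assuming these two points refer to batch normalization (the BatchNorm paper listed above), a minimal sketch of a conv block with a BatchNorm layer added:

    import torch.nn as nn

    def conv_bn(in_channels, out_channels):
        # assumed helper, not the lesson's exact code
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_channels),  # normalizes each channel across the batch, then rescales with learned parameters
            nn.ReLU(),
        )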
- assumes you have mastered all techniques introduced in Part 1
- has same level of intensity as Part 1
- people who did well in Part 2 last year watched each of the videos at least 3 times
- make sure you get to the point where you can recreate the notebooks without watching the videos
- try and recreate the notebooks using different datasets
- keep up with the forum; recent papers, advances
- you'll find less of it is mysterious; makes more sense; there will always be stuff you don't understand
- Lessons 1 and 2 of Part 1 may seem trivial
- people who succeed are those who keep working at it
- hope to see you all in March
- see you in the Forum