live 11-Dec-2017
Notebooks:
- WILD ML RNN Tutorial - http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
- Chris Olah on LSTM http://colah.github.io/posts/2015-08-Understanding-LSTMs/
- More from Olah and others - https://distill.pub/
- BatchNorm paper
- Laptop recommendation: Surface Book 2, 15 inch
- classification and regression with deep learning
- identifying best practices
- here are 3 lines of code for image classification
- first 4 lessons covered NLP, structured data, collaborative filtering
- last 3 lessons covered those topics in more detail, with more detailed code
- generative modeling
- creating a sentence, image captioning, neural translation
- creating an image, style transfer
- moving from best practices to speculative practices
- how to read a paper and implement from scratch
- does not assume a particular math background, but be prepared to dig through notation and convert to code
- not so different
- they are like a fully connected network
bs=64
means the data is split into 64 chunks of data.
NOT batches of size 64!
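- a minimal sketch of that splitting (the idea only, not fastai's exact code), assuming the corpus is a 1-D NumPy array of token ids:

    import numpy as np

    bs = 64
    tokens = np.arange(640_000)              # hypothetical 1-D stream of token ids standing in for the corpus
    n = len(tokens) // bs                    # length of each of the 64 chunks
    chunks = tokens[:n * bs].reshape(bs, n)  # row i is one contiguous chunk of the stream
    # a mini-batch is then a slice of a few consecutive columns of `chunks`,
    # so each of the 64 rows carries on from where its previous batch left off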
- JH can't talk about that; doesn't know a good way
- JH will do further study on that
- well-known dataset in academia: https://www.cs.toronto.edu/~kriz/cifar.html
- small datasets are much more interesting than ImageNet
- often, we're looking at 32x32 pixels (example: lung cancer image)
- often, it's more challenging, and more interesting
- we can run algorithms much more quickly, and it's still challenging
- you can get the data by:
wget http://pjreddie.com/media/files/cifar.tgz
- (provided in the form we need)
- the stats are the mean and SD per channel; try to replicate them on your own
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
stats = (np.array([ 0.4914 , 0.48216, 0.44653]), np.array([ 0.24703, 0.24349, 0.26159]))
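- a minimal sketch of how to replicate those stats, assuming the tgz above has been extracted so the training PNGs sit under data/cifar/train/ (the path is an assumption):

    import numpy as np
    from PIL import Image
    from pathlib import Path

    # assumed location of the extracted training images, one 32x32 RGB PNG per image
    files = list(Path('data/cifar/train').glob('*.png'))
    imgs = np.stack([np.array(Image.open(f), dtype=np.float32) / 255.0 for f in files])  # (N, 32, 32, 3)
    means = imgs.mean(axis=(0, 1, 2))  # one mean per colour channel
    stds = imgs.std(axis=(0, 1, 2))    # one standard deviation per colour channel
    print(means, stds)                 # should come out close to the stats tuple above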
- Kerem's notebook on how different optimizers work: https://github.com/KeremTurgutlu/deeplearning/blob/master/Exploring%20Optimizers.ipynb
- to improve model, we'll next replace our fully connected model (with 1 hidden layer) with a CNN
nn.Conv2d(layers[i], layers[i + 1], kernel_size=3, stride=2)
- layers[i]: number of features coming in
- layers[i + 1]: number of features coming out
- stride=2 makes this a "stride 2 convolution"; it has a similar effect to maxpooling and reduces the size of the layers
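- a minimal sketch of how a stack of these stride-2 convs shrinks the grid (the feature counts here are illustrative, not the lesson's exact values):

    import torch
    import torch.nn as nn

    layers = [3, 20, 40, 80]   # RGB input, then three conv layers
    convs = [nn.Conv2d(layers[i], layers[i + 1], kernel_size=3, stride=2)
             for i in range(len(layers) - 1)]
    x = torch.randn(64, 3, 32, 32)   # a batch of 64 CIFAR-sized images
    for conv in convs:
        x = conv(x)
        print(x.shape)   # 32x32 -> 15x15 -> 7x7 -> 3x3: each stride-2 conv roughly halves the grid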
self.pool = nn.AdaptiveMaxPool2d(1)
- standard now for state-of-the-art algorithms
- I'm not going to tell you how big an area to pool over; I will tell you how big a resolution to create
- starting with 28x28: a 14x14 adaptive maxpool is the same as a 2x2 maxpool, which also gives a 14x14 output
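- a quick check of that equivalence (the tensor here is just a random stand-in for a 28x28 activation map):

    import torch
    import torch.nn as nn

    x = torch.randn(1, 1, 28, 28)            # hypothetical 28x28 activation map
    adaptive = nn.AdaptiveMaxPool2d(14)(x)   # "give me a 14x14 output"
    fixed = nn.MaxPool2d(2)(x)               # "pool over 2x2 areas"
    print(torch.equal(adaptive, fixed))      # True: on a 28x28 input they coincide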
- a couple of years old now
- makes it easier to train deeper networks
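- assuming these two points refer to batch normalization (the BatchNorm paper listed above), a minimal sketch of a conv block with a BatchNorm layer added:

    import torch.nn as nn

    def conv_bn(in_channels, out_channels):
        # assumed helper, not the lesson's exact code
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_channels),  # normalizes each channel across the batch, then rescales with learned parameters
            nn.ReLU(),
        )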
- assumes you have mastered all techniques introduced in Part 1
- has same level of intensity as Part 1
- people who did well in Part 2 last year watched each of the videos at least 3 times
- make sure you get to the point where you can recreate the notebooks without watching the videos
- try and recreate the notebooks using different datasets
- keep up with the forum; recent papers, advances
- you'll find less of it is mysterious; makes more sense; there will always be stuff you don't understand
- Lessons 1 and 2 of Part 1 may seem trivial
- people who succeed are those who keep working at it
- hope to see you all in March
- see you in the Forum