---
title: "Introduction to Deep Learning in R - Part 1"
author: "D-Lab"
date: "11/24/2018 (updated: `r Sys.Date()`)"
output: html_document
---
## Introduction
[Review the slides to get onboarded](https://dlab-berkeley.github.io/Deep-Learning-in-R/slides.html#1)
## Install packages
Run this chunk manually, once, to install the packages. Note the `eval=FALSE` option in the chunk header: it means the chunk is skipped when one clicks "knit" or "run all".
```{r install, eval=FALSE}
# Install keras package from CRAN.
install.packages("keras")
# If the most recent version is desired:
# devtools::install_github("rstudio/keras")
# Then run install_keras() to install Anaconda Python, TensorFlow, and Keras.
keras::install_keras()
# Or use one of these instead (for Mac):
# keras::install_keras(method = "virtualenv")
# keras::install_keras(method = "conda")
# Or if you have a GPU and have followed these instructions:
# https://tensorflow.rstudio.com/tools/local_gpu.html
# NOTE: at the time of writing, tensorflow seemed to require CUDA 9.0; 9.2, for example, did not work.
# keras::install_keras(tensorflow = "gpu")
```
Confirm that Keras is installed successfully within Python. The check below returns `TRUE` if R can detect the Python installation correctly. If it returns `FALSE`, adjust your `keras::install_keras()` call and rerun it until the check returns `TRUE`.
```{r confirm_keras}
# This function needs to return TRUE.
keras::is_keras_available()
# If it returns FALSE, review the Python setup that R has automatically detected.
reticulate::py_config()
tensorflow::tf_config()
```
Also install some other helper packages. Run this line manually if needed:
```{r install_helpers, eval=FALSE}
install.packages(c("cowplot", "dplyr", "ggplot2"))
```
## Load packages
Now that we have installed the necessary packages, load them with `library()` so that we can use their functions without specifying the package name every time.
```{r load_packages}
library(cowplot)
library(keras)
library(dplyr)
library(tensorflow)
library(ggplot2)
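# If R picks up the wrong Python environment, point reticulate at the right one:
# reticulate::use_virtualenv("r-tensorflow")
# This should return TRUE
is_keras_available()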
```
## MNIST handwritten digit example
Jump in! The first example walks through the Keras vignette [located here](https://cran.r-project.org/web/packages/keras/vignettes/getting_started.html).
Let's look at 70,000 handwritten digit images from the [Modified National Institute of Standards and Technology database](https://en.wikipedia.org/wiki/MNIST_database) (MNIST).
```{r load_data}
# This line requires a working internet connection to download the data the first time.
mnist = dataset_mnist()
# How are the data stored?
str(mnist)
```
```{r assign_vars}
# Define our x and y variables for the training and test sets.
x_train = mnist$train$x
y_train = mnist$train$y
x_test = mnist$test$x
y_test = mnist$test$y
# Note the 3D array structure of the raw features
str(x_train)
str(x_test)
```
## Reshape, rescale
### Reshape
The `array_reshape` function lets us reshape a three-dimensional array, like the image arrays in our `mnist` dataset, into a matrix. Each 28x28 pixel image becomes one row of that matrix: a vector of length $28 \times 28 = 784$.
```{r reshape}
# In R, L denotes that the value is an integer, rather than a floating point value.
height = 28L
width = 28L
# Reshape
x_train = array_reshape(x_train, c(nrow(x_train), height * width))
x_test = array_reshape(x_test, c(nrow(x_test), height * width))
# Check the new 2D feature dimensions
str(x_train)
str(x_test)
summary(x_train[, 500:550])
x_train[1, ]
```
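One detail worth knowing: `array_reshape()` fills in row-major (C-style) order by default, which differs from base R's column-major `dim<-`. The sketch below reproduces the row-major flattening in base R on a made-up pair of 2 x 3 "images". All `toy_` names are hypothetical and not part of the workshop code.

```{r sketch_reshape}
# Two toy "images", each 2 rows x 3 columns, stored as [image, row, col].
toy_images = array(0, dim = c(2, 2, 3))
toy_images[1, , ] = matrix(1:6, nrow = 2, byrow = TRUE)
toy_images[2, , ] = matrix(7:12, nrow = 2, byrow = TRUE)

# Row-major flattening: swap the row and column axes, then let R's
# column-major fill lay each image out one pixel row after another.
toy_flat = matrix(aperm(toy_images, c(1, 3, 2)), nrow = dim(toy_images)[1])

dim(toy_flat)   # one row per image, 2 * 3 = 6 columns
toy_flat[1, ]   # pixels of image 1, read left to right, top to bottom

# Base R's own reshape is column-major, so the pixel order differs.
toy_colmajor = toy_images
dim(toy_colmajor) = c(2, 6)
toy_colmajor[1, ]
```

For a dense network the pixel order does not matter much, as long as the training and test sets are flattened the same way. It matters if you later reshape rows back into images for plotting.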
### Rescale
Pixel values range from 0 (the black end of the grayscale) to 255 (the white end). We divide by 255 to rescale them to values between 0 and 1. Keeping the inputs on a small, common scale tends to help the neural network optimize its weights.
```{r rescale}
x_train = x_train / 255
x_test = x_test / 255
# Confirm the max is now no more than 1 for each pixel.
summary(x_train[, 500:520])
# Confirm it explicitly by finding the max of each pixel column, then taking the max of that vector.
max(apply(x_train, MARGIN = 2, max))
```
## Define the model architecture
Now we can define the model. We want to build a sequential stack of layers. The `units` parameter defines how many nodes (neurons) we should have in each layer. `input_shape` allows us to define the image dimensions in the initial input layer. The `activation` parameter allows us to pass in the name of an activation function as the argument.
See `?keras_model_sequential` to learn more.
```{r}
model = keras_model_sequential()
model %>%
  # INPUT LAYER + 1st HIDDEN LAYER.
  # layer_dense allows us to add a hidden layer.
  # The input_shape argument is what actually specifies the input layer;
  # "units = " and "activation = " define the first hidden layer.
  # Alternatively we could have a separate layer_input() before layer_dense().
  layer_dense(units = 64, activation = 'relu', input_shape = 784) %>%
  layer_dropout(rate = 0.4) %>%
  # HIDDEN LAYER (2nd)
  layer_dense(units = 16, activation = 'relu') %>%
  layer_dropout(rate = 0.3) %>%
  # OUTPUT LAYER
  layer_dense(units = 10, activation = 'softmax')
summary(model)
```
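As a sanity check on the output of `summary(model)`, the parameter counts can be worked out by hand: a dense layer has one weight per input-output pair plus one bias per unit, and dropout layers add no parameters. A minimal sketch in base R, where `dense_params` is a hypothetical helper and not a Keras function:

```{r sketch_param_count}
# Parameters in a dense layer: weights (inputs x units) plus one bias per unit.
dense_params = function(n_inputs, n_units) n_inputs * n_units + n_units

layer_inputs = c(784, 64, 16)  # inputs feeding each of the three dense layers
layer_units = c(64, 16, 10)    # units in each of the three dense layers

params_per_layer = dense_params(layer_inputs, layer_units)
params_per_layer       # 50240 1040 170
sum(params_per_layer)  # 51450
```

These numbers should line up with the "Param #" column and the total reported by `summary(model)` for the architecture defined above.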
## Compile model with loss function, optimizer, and metrics
Recall that the loss function tells us how wrong our predictions are, and the optimizer uses that signal to decide how to update the weights. The two work in tandem.
We use "sparse_categorical_crossentropy" as our loss function since we are dealing with multiclass classification (i.e. a categorical outcome) and our labels are integer-coded digits (0 to 9) rather than one-hot vectors. We use "optimizer_rmsprop()" as our optimizer because it might perform a little bit better than gradient descent with momentum. What does the `lr` parameter do? We also select "accuracy" as our metric to report the proportion of observations classified correctly.
```{r model_optimizer}
model %>% compile(
loss = 'sparse_categorical_crossentropy',
# loss = "mean_squared_error",
  # Newer keras releases name this argument learning_rate instead of lr.
  optimizer = optimizer_rmsprop(lr = 0.001),
metrics = c('accuracy')
)
```
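To make the loss concrete, here is a small base R sketch of what the softmax output layer and sparse categorical cross-entropy compute for a single observation. It is an illustration under simplifying assumptions, not Keras's actual implementation: the `toy_` functions and the scores are made up, and Keras additionally averages the per-observation loss over each batch.

```{r sketch_loss}
# Softmax turns raw scores into probabilities that sum to 1.
toy_softmax = function(z) {
  e = exp(z - max(z))  # subtract the max for numerical stability
  e / sum(e)
}

# Sparse categorical cross-entropy for one observation:
# the negative log of the probability assigned to the true class.
# Labels are integers 0-9, so add 1 for R's 1-based indexing.
toy_sparse_cce = function(probs, label) -log(probs[label + 1])

toy_scores = c(0, 0, 0, 3, 0, 0, 0, 0, 0, 0)  # made-up scores favoring digit 3
toy_probs = toy_softmax(toy_scores)

sum(toy_probs)                # 1, up to floating point error
toy_sparse_cce(toy_probs, 3)  # small loss: the true digit got high probability
toy_sparse_cce(toy_probs, 7)  # larger loss: the true digit got low probability
```

The loss only looks at the probability of the true class, which is why the integer label is enough and no one-hot encoding is needed.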
## Train and evaluate
Now we can train the model using `fit`! We pass in our X and Y training data along with the training settings: the number of epochs, the batch size, and the validation split.
Watch the model train epoch by epoch. An **epoch** is one full pass through the training data, processed here in batches of 128 observations.
```{r model_fit}
(history = model %>% fit(
x_train, y_train,
epochs = 45, batch_size = 128,
validation_split = 0.2
))
```
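The arithmetic behind these settings can be sketched in a few lines of base R. The `toy_` variables simply mirror the values passed to `fit` above, and the count of 60,000 training images is read off the MNIST training set.

```{r sketch_batches}
n_total = 60000             # MNIST training images
toy_validation_split = 0.2
toy_batch_size = 128
toy_epochs = 45

n_validation = round(n_total * toy_validation_split)  # held out, not used for weight updates
n_fit = n_total - n_validation                        # observations actually trained on
batches_per_epoch = ceiling(n_fit / toy_batch_size)

c(n_fit = n_fit, n_validation = n_validation,
  batches_per_epoch = batches_per_epoch,
  total_updates = batches_per_epoch * toy_epochs)
```

So each epoch consists of 375 weight updates on 48,000 images, and the remaining 12,000 images are used only to compute the validation metrics at the end of each epoch.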
### Interpreting the plot
**loss**: the training loss, averaged over the batches within each epoch. We expect the loss in earlier epochs to be higher than in later epochs because the model _should_ be learning over time.
**acc**: the training accuracy (newer Keras versions label it **accuracy**).
**val_loss** and **val_acc** are the loss and accuracy on the validation data: the 20% of the training set held out by `validation_split = 0.2`. They are not computed on the test data.
## Plot history via ggplot
```{r}
plot(history) + theme_minimal()
# ggsave()
```
## Evaluate performance on the test data
```{r model_eval}
# The model works pretty well!
model %>% evaluate(x_test, y_test)
# Generate predictions on the test data, without explicit evaluation.
preds = model %>% predict(x_test)
dim(preds)
head(round(preds, 4))
glimpse(preds)
```
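Each row of `preds` holds ten class probabilities, one per digit. A common next step is to turn those rows into predicted digits and compare them with the true labels. The sketch below does this on a made-up probability matrix; the `toy_` objects are hypothetical stand-ins for `preds` and `y_test`.

```{r sketch_pred_classes}
# Made-up probabilities for 3 observations over 10 classes (columns = digits 0-9).
toy_preds = matrix(0.01, nrow = 3, ncol = 10)
toy_preds[1, 8] = 0.91   # most mass on digit 7
toy_preds[2, 3] = 0.91   # most mass on digit 2
toy_preds[3, 2] = 0.91   # most mass on digit 1
toy_truth = c(7, 2, 4)   # hypothetical true labels

# which.max gives a 1-based column index; subtract 1 to get the digit.
toy_classes = apply(toy_preds, 1, which.max) - 1
toy_classes

# Accuracy and a confusion table of predicted vs. true digits.
mean(toy_classes == toy_truth)
table(predicted = toy_classes, truth = toy_truth)
```

Applying the same `apply(..., 1, which.max) - 1` step to `preds` and comparing against `y_test` should give an accuracy close to the one reported by `evaluate`.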
## Challenge
1. Change the number of units and dropout in the layers. Can you change them to get even better (or worse) predictive performance on the test set?
2. Write down the steps you followed to run this model from start to finish. What does each part do?