01-mnist.Rmd

---
title: "Introduction to Deep Learning in R - Part 1"
author: "D-Lab"
date: "11/24/2018 (updated: `r Sys.Date()`)"
output: html_document
---

## Introduction

[Review the slides to get onboarded](https://dlab-berkeley.github.io/Deep-Learning-in-R/slides.html#1)

## Install packages

Run this chunk manually to install once. Note the "eval=FALSE" option in the settings for this block. This means that the block will not be run when one clicks "knit" or "run all".

```{r install, eval=FALSE}
# Install keras package from CRAN.
install.packages("keras")

# If the most recent version is desired:
# devtools::install_github("rstudio/keras")

# Then run install_keras() function to install anaconda python, tensorflow, and keras.
keras::install_keras()
# Or one of these (for Mac):
keras::install_keras(method = "virtualenv")
keras::install_keras(method = "conda")
# Or if you have a GPU and have followed these instructions:
# https://tensorflow.rstudio.com/tools/local_gpu.html
# NOTE: tensorflow seems to require CUDA 9.0 currently; 9.2 for example will not work.
# keras::install_keras(tensorflow = "gpu")
```

Confirm that Keras is installed successfully within python. This will return "TRUE" if R can detect the python installation correctly. If it returns FALSE you will need to edit your keras::install_keras() line until it returns "TRUE".

```{r confirm_keras}
# This function needs to return TRUE.
keras::is_keras_available()

# If it returns FALSE, review the Python setup that R has automatically detected.
reticulate::py_config()
tensorflow::tf_config()
```

Also install some other helper packages. Run this line manually if needed:

```{r install_helpers, eval=FALSE}
install.packages(c("cowplot", "dplyr", "ggplot2"))
```

## Load packages

Now that we have installed the necessary packages, library them so that we can use the available functions without specifying the package name every time.

```{r load_packages}
library(cowplot)
library(keras)
library(dplyr)
library(tensorflow)
library(ggplot2)

#reticulate::use_virtualenv("r-tensorflow")
# This should return TRUE
```

## MNIST handwritten digit example

Jump in! The first example will consist of a walkthrough of the Keras vignette [located here](https://cran.r-project.org/web/packages/keras/vignettes/getting_started.html).  

Let's look at 70,000 handwritten digit images from the [Modified National Institute of Standards and Technology database](https://en.wikipedia.org/wiki/MNIST_database) (MNIST).  

```{r load_data}
# This line requires a working internet connection to download the data the first time.
mnist = dataset_mnist()

# How are the data stored? 
str(mnist)
```

```{r assign_vars}
# Define our x and y variables for the training and test sets.
x_train = mnist$train$x
y_train = mnist$train$y
x_test = mnist$test$x
y_test = mnist$test$y

# Note the 3D array structure of the raw features
str(x_train)
str(x_test)
```

## Reshape, rescale

### Reshape
The `array_reshape` function allows us to reshape a three-dimensional array like those found in our `mnist` dataset into matrices. Our 28x28 pixel images will become arrays/vectors with length $28*28 = 784$. 

```{r reshape}
# In R, L denotes that the value is an integer, rather than a floating point value. 
height = 28L
width = 28L

# Reshape
x_train = array_reshape(x_train, c(nrow(x_train), height * width))
x_test = array_reshape(x_test, c(nrow(x_test), height * width))

# Check the new 2D feature dimensions
str(x_train)
str(x_test)
summary(x_train[, 500:550])
x_train[1, ]
```

### Rescale

Pixel values that range from between 0 (black end of the color spectrum) to 255 (white end of the color spectrum) are scaled to values between 0 and 1. This can help the neural network optimize its weights by removing the scale factor of the pixel intensities.

```{r rescale}
x_train = x_train / 255
x_test = x_test / 255

# Confirm the max is now no more than 1 for each pixel.
summary(x_train[, 500:520])

# Confirm it explicitly by finding the max of each pixel column, then taking the max of that vector.
max(apply(x_train, MARGIN = 2, max))
```

## Define the model architecture

Now we can define the model. We want to build a sequential stack of layers. The `units` parameter defines how many nodes (neurons) we should have in each layer. `input_shape` allows us to define the image dimensions in the initial input layer. The `activation` parameter allows us to pass in the name of an activation function as the argument.  

See `?keras_model_sequential` to learn more.

```{r}
model = keras_model_sequential() 

model %>%
  
  # INPUT LAYER + 1st HIDDEN LAYER.
  # layer_dense allows us to add a hidden layer.
  # The input_shape argument is what actually specifies the input layer; "units = " and "activation = " define the first hidden layer. 
  # Alternatively we could have a separate layer_input() before layer_dense().
  
  layer_dense(units = 64, activation = 'relu', input_shape = 784) %>% 
  
  layer_dropout(rate = 0.4) %>% 
  
  # HIDDEN LAYER  (2nd)
  layer_dense(units = 16, activation = 'relu') %>%
  
  layer_dropout(rate = 0.3) %>%
  
  # OUTPUT LAYER
  layer_dense(units = 10, activation = 'softmax')

summary(model)
```

## Compile model with loss function, optimizer, and metrics

Recall that loss and optimizer functions work in tandem to tell us how wrong our predictions are.  

We use "sparse_categorical_crossentropy" as our loss function since we are dealing with multiple classification (i.e. a categorical variable), and "optimizer_rmsprop()" as our optimizer because it might perform a little bit better than gradient descent with momentum. What does the `lr` parameter do? We also select "accuracy" as our metric to produce simple classification rates for our outcomes.

```{r model_optimizer}
model %>% compile(
  loss = 'sparse_categorical_crossentropy',
  # loss = "mean_squared_error",
  optimizer = optimizer_rmsprop(lr = 0.001),
  metrics = c('accuracy')
)
```

## Train and evaluate

Now we can train the model using `fit`! Here, we can just pass in our X and Y variables along with the other hyperparameters.  

Watch the model build epoch by epoch. An **epoch** is one iteration through all of the training data, which is accomplished through batches of 128 observations here.

```{r model_fit}
(history = model %>% fit(
  x_train, y_train, 
  epochs = 45, batch_size = 128, 
  validation_split = 0.2
))
```

### Interpreting the plot

**loss**: loss is the mean of the average loss across each batch of training data. We expect that the loss for earlier batches is higher than that for later batches because the model _should_ be learning over time. We hope that the later batches of data have lower losses.  

**acc**: is the training accuracy.  

**val_loss** and **val_acc** are the loss and accuracy for the test data. 

## Plot history via ggplot

```{r}
plot(history) + theme_minimal()
# ggsave()
```

## Evaluate performance on the test data

```{r model_eval}
# The model works pretty well! 
model %>% evaluate(x_test, y_test)

# Generate predictions on the test data, without explicit evaluation.
preds = model %>% predict(x_test)
dim(preds)
head(round(preds, 4))
glimpse(preds)
```

## Challenge

1. Change the number of units and dropout in the layers. Can you change them to get even better (or worse) predictive performance on the test set? 

2. Write down the steps you followed to run this model from start to finish. What does each part do?