In this project, we trained a variational autoencoder (VAE) to generate MNIST digits. VAEs are a powerful class of generative models that learn to represent and generate data by encoding it into a latent space and decoding it back into the original space. For a detailed explanation of VAEs, see *Auto-Encoding Variational Bayes* by Kingma & Welling.
The encoder consists of two fully-connected layers: the first has 14×14 = 196 inputs and 128 outputs (with a tanh nonlinearity), and the second has 128 inputs and 16 outputs (with no nonlinearity), 8 output neurons for the mean and 8 more for the standard deviation. The latent factor is therefore z ∈ R⁸. The decoder takes z ∈ R⁸ as input, pushes it through one layer with 128 outputs (with a tanh nonlinearity) and then another layer with 196 output neurons (with a sigmoid nonlinearity).
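The following is a minimal sketch of this architecture, assuming a PyTorch implementation; the class names, the choice to predict log σ rather than σ directly, and other details are illustrative rather than taken from the notebook.

```python
import torch
import torch.nn as nn

LATENT_DIM = 8       # z ∈ R⁸
INPUT_DIM = 14 * 14  # 196-pixel input images

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(INPUT_DIM, 128)       # 196 -> 128, tanh
        self.fc2 = nn.Linear(128, 2 * LATENT_DIM)  # 128 -> 16, no nonlinearity

    def forward(self, x):
        h = torch.tanh(self.fc1(x))
        # 8 outputs for the mean, 8 for the (log) standard deviation
        mu, log_sigma = self.fc2(h).chunk(2, dim=-1)
        return mu, log_sigma

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(LATENT_DIM, 128)      # 8 -> 128, tanh
        self.fc2 = nn.Linear(128, INPUT_DIM)       # 128 -> 196, sigmoid

    def forward(self, z):
        h = torch.tanh(self.fc1(z))
        return torch.sigmoid(self.fc2(h))          # per-pixel probabilities in (0, 1)
```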
Let the parameters of the encoder be u and the parameters of the decoder be v. We maximize the variational lower bound (ELBO)

$$\mathcal{L}(u, v; x) \;=\; \mathbb{E}_{z \sim p_u(z \mid x)}\!\left[\log p_v(x \mid z)\right] \;-\; D_{\mathrm{KL}}\!\left(p_u(z \mid x) \,\|\, p(z)\right),$$

where p(z) = N(0, I) is the standard normal prior over the latent factor.
We model p_u(z | x) as a Gaussian distribution, and the neural network predicts the mean µ_u(x) and the diagonal of the covariance σ_u(x). The KL-divergence between this Gaussian and the standard normal prior then has the closed form

$$D_{\mathrm{KL}}\!\left(p_u(z \mid x) \,\|\, \mathcal{N}(0, I)\right) \;=\; \frac{1}{2} \sum_{i=1}^{8} \left( \sigma_{u,i}^2(x) + \mu_{u,i}^2(x) - 1 - \log \sigma_{u,i}^2(x) \right).$$
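Continuing the PyTorch sketch above, the closed-form KL term can be computed directly from the predicted mean and log standard deviation; this is a sketch rather than the notebook's exact code.

```python
def kl_divergence(mu, log_sigma):
    # KL( N(mu, diag(sigma²)) || N(0, I) )
    #   = ½ Σᵢ ( σᵢ² + µᵢ² − 1 − log σᵢ² ),  with σ = exp(log_sigma)
    return 0.5 * torch.sum(mu.pow(2) + torch.exp(2 * log_sigma) - 1 - 2 * log_sigma, dim=-1)
```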
We use a Bernoulli distribution to model p_v(x | z) because we are using the binarized MNIST dataset, so the reconstruction term is the Bernoulli log-likelihood

$$\log p_v(x \mid z) \;=\; \sum_{i=1}^{196} \left[ x_i \log \hat{x}_i(z) + (1 - x_i) \log\!\left(1 - \hat{x}_i(z)\right) \right],$$

where x̂(z) is the sigmoid output of the decoder, interpreted as per-pixel probabilities.
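In the same assumed PyTorch setting, the reconstruction term can be sketched as below; the clamping of the decoder output is an assumed numerical-stability detail, not something stated in the project description.

```python
def bernoulli_log_likelihood(x, x_hat, eps=1e-7):
    # log p_v(x | z) = Σᵢ [ xᵢ log x̂ᵢ + (1 − xᵢ) log(1 − x̂ᵢ) ]
    x_hat = x_hat.clamp(eps, 1 - eps)  # avoid log(0)
    return torch.sum(x * torch.log(x_hat) + (1 - x) * torch.log(1 - x_hat), dim=-1)
```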
The auto-encoder is trained using the re-parametrization trick to compute the gradient of the objective with respect to the encoder parameters u, and standard back-propagation to compute the gradient of log p_v(x | z) with respect to the decoder parameters v.
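Putting the pieces together, a single training step might look like the sketch below, which reuses the helper functions from the earlier sketches; the function names and training-loop structure are assumptions, not the notebook's actual code.

```python
def reparameterize(mu, log_sigma):
    # z = µ + σ·ε with ε ~ N(0, I): sampling is moved outside the computation graph,
    # so gradients flow to the encoder parameters u through µ and σ
    eps = torch.randn_like(mu)
    return mu + torch.exp(log_sigma) * eps

def training_step(encoder, decoder, x, optimizer):
    optimizer.zero_grad()
    mu, log_sigma = encoder(x)
    z = reparameterize(mu, log_sigma)
    x_hat = decoder(z)
    # negative ELBO = KL term − reconstruction log-likelihood, averaged over the batch
    loss = (kl_divergence(mu, log_sigma) - bernoulli_log_likelihood(x, x_hat)).mean()
    loss.backward()   # standard back-prop handles the decoder gradient (w.r.t. v)
    optimizer.step()
    return loss.item()
```

An optimizer such as `torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))` could then be passed in; this choice is an assumption, since the notebook's training configuration is not described here.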
- Open the file `VAE_Digit_Recognition.ipynb` in Google Colab, Jupyter Notebook, or another editor that supports `.ipynb` files.
- Run the cells sequentially.
The results of the trained VAE are demonstrated by the following plots: