Chapter 4.2 How to Develop Stacked LSTMs
The goal of this lesson is to learn how to develop and evaluate stacked LSTM models. After completing this lesson, you will know:
- The motivation for creating a multilayer LSTM and how to develop Stacked LSTM models in Keras.
- The damped sine wave prediction problem and how to prepare examples for fitting LSTM models.
- How to develop, fit, and evaluate a Stacked LSTM model for the damped sine wave prediction problem.
This lesson is divided into 7 parts; they are:
- The Stacked LSTM.
- Damped Sine Wave Prediction Problem.
- Define and Compile the Model.
- Fit the Model.
- Evaluate the Model.
- Make Predictions With the Model.
- Complete Example.
The Stacked LSTM is a model that has multiple hidden LSTM layers where each layer contains multiple memory cells. We will refer to it as a Stacked LSTM here to differentiate it from the unstacked LSTM (Vanilla LSTM) and a variety of other extensions to the basic LSTM model.
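Before working through the damped sine wave example, a minimal sketch of stacking two hidden LSTM layers in Keras is shown below. The layer sizes, input shape (20 timesteps, 1 feature), and single-output regression head are illustrative placeholders rather than the configuration used later in this lesson. The key detail is that every LSTM layer except the last must return the full sequence of outputs (return_sequences=True) so that the next LSTM layer receives the 3D input it expects.

```python
# A minimal Stacked LSTM sketch in Keras; sizes and shapes are illustrative.
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# First hidden LSTM layer: return_sequences=True emits one output per timestep,
# giving the 3D [samples, timesteps, features] input the next LSTM layer expects.
model.add(LSTM(10, return_sequences=True, input_shape=(20, 1)))
# Second hidden LSTM layer: returns only its final output vector by default.
model.add(LSTM(10))
# Single-value regression output.
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.summary()
```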
7.1.1 Why Increase Depth?
Stacking LSTM hidden layers makes the model deeper, more accurately earning the description as a deep learning technique. It is the depth of neural networks that is generally attributed to the success of the approach on a wide range of challenging prediction problems.

[the success of deep neural networks] is commonly attributed to the hierarchy that is introduced due to the several layers. Each layer processes some part of the task we wish to solve, and passes it on to the next. In this sense, the DNN can be seen as a processing pipeline, in which each layer solves a part of the task before passing it on to the next, until finally the last layer provides the output.

-- Training and Analyzing Deep Recurrent Neural Networks, 2013
Additional hidden layers can be added to a Multilayer Perceptron neural network to make it deeper. The additional hidden layers are understood to recombine the learned representation from prior layers and create new representations at high levels of abstraction. For example, from lines to shapes to objects.
A sufficiently large single hidden layer Multilayer Perceptron can be used to approximate most functions. Increasing the depth of the network provides an alternate solution that requires fewer neurons and trains faster. Ultimately, adding depth is a type of representational optimization.
Deep learning is built around a hypothesis that a deep, hierarchical model can be exponentially more efficient at representing some functions than a shallow one.

-- How to Construct Deep Recurrent Neural Networks, 2013