Chapter 5.1 The Encoder-Decoder LSTM
Sequence prediction often involves forecasting the next value in a real-valued sequence or outputting a class label for an input sequence. This is often framed as either one input time step to one output time step (one-to-one) or multiple input time steps to one output time step (many-to-one) type sequence prediction problems.
There is a more challenging type of sequence prediction problem that takes a sequence as input and requires a sequence prediction as output. These are called sequence-to-sequence prediction problems, or seq2seq for short. One modeling concern that makes these problems challenging is that the length of the input and output sequences may vary. Given that there are multiple input time steps and multiple output time steps, this form of problem is referred to as a many-to-many type sequence prediction problem.
One approach to seq2seq prediction problems that has proven very effective is called the Encoder-Decoder LSTM. This architecture is comprised of two models: one for reading the input sequence and encoding it into a fixed-length vector, and a second for decoding the fixed-length vector and outputting the predicted sequence. The use of the models in concert gives the architecture its name of Encoder-Decoder LSTM, designed specifically for seq2seq problems.
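As a concrete illustration of the two-model idea, the sketch below wires an encoder LSTM and a decoder LSTM together in Keras. It is a minimal sketch, not the definitive implementation: the layer sizes, the numbers of input and output time steps (`n_steps_in`, `n_steps_out`) and the feature count are illustrative assumptions, and the `RepeatVector` layer is one common way of presenting the fixed-length encoding to the decoder once per output time step.

```python
# Minimal sketch of an Encoder-Decoder LSTM in Keras.
# Shapes and layer sizes are assumptions chosen for illustration only.
from keras.models import Sequential
from keras.layers import LSTM, Dense, RepeatVector, TimeDistributed

n_steps_in, n_steps_out, n_features = 5, 3, 1

model = Sequential()
# Encoder: read the input sequence and encode it into a fixed-length vector.
model.add(LSTM(100, input_shape=(n_steps_in, n_features)))
# Repeat the encoded vector once for each required output time step.
model.add(RepeatVector(n_steps_out))
# Decoder: interpret the fixed-length vector and output a sequence.
model.add(LSTM(100, return_sequences=True))
# One output value per output time step.
model.add(TimeDistributed(Dense(n_features)))
model.compile(optimizer='adam', loss='mse')
model.summary()
```

The `RepeatVector` plus `TimeDistributed` wrapper pattern keeps the decoder independent of the encoder's input length, which is what allows input and output sequences of different lengths.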
... RNN Encoder-Decoder, consists of two recurrent neural networks (RNN) that act as an encoder and a decoder pair. The encoder maps a variable-length source sequence to a fixed-length vector, and the decoder maps the vector representation back to a variable-length target sequence.

-- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, 2014.
The Encoder-Decoder LSTM was developed for natural language processing problems where it demonstrated state-of-the-art performance, specifically in the area of text translation called statistical machine translation. The innovation of this architecture is the use of a fixed-sized internal representation in the heart of the model that input sequences are read to and output sequences are read from. For this reason, the method may be referred to as sequence embedding. In one of the first applications of the architecture to English-to-French translation, the internal representation of the encoded English phrases was visualized. The plots revealed a qualitatively meaningful learned structure of the phrases harnessed for the translation task.
The proposed RNN Encoder-Decoder naturally generates a continuous-space representation of a phrase. [...] From the visualization, it is clear that the RNN Encoder-Decoder captures both semantic and syntactic structures of the phrases.

-- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, 2014.
On the task of translation, the model was found to be more effective when the input sequence was reversed. Further, the model was shown to be effective even on very long input sequences.
We were able to do well on long sentences because we reversed the order of words in the source sentence but not the target sentences in the training and test set. By doing so, we introduced many short term dependencies that made the optimization problem much simpler. [...] The simple trick of reversing the words in the source sentence is one of the key technical contributions of this work.

-- Sequence to Sequence Learning with Neural Networks, 2014.
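The reversal trick itself amounts to flipping the source sequences along the time axis while leaving the targets untouched. A small NumPy illustration is shown below; the array shapes are assumptions made for the example.

```python
# Reverse only the time axis of the source sequences; targets keep their order.
import numpy as np

# X: (samples, time steps, features) source sequences; y: target sequences.
X = np.arange(24).reshape((2, 4, 3))
y = np.arange(12).reshape((2, 2, 3))

X_reversed = X[:, ::-1, :]

print(X[0, :, 0])           # [0 3 6 9]  original time order
print(X_reversed[0, :, 0])  # [9 6 3 0]  reversed time order fed to the encoder
```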
This approach has also been applied to image inputs, where a Convolutional Neural Network acts as a feature extractor on the input images and its output is then read by a decoder LSTM.
... we propose to follow this elegant recipe, replacing the encoder RNN by a deep convolution neural network (CNN). [...] it is natural to use a CNN as an image "encoder", by first pre-training it for an image classification task and using the last hidden layer as an input to the RNN decoder that generates sentences.

-- Show and Tell: A Neural Image Caption Generator, 2014.
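One simple way to sketch this CNN-encoder/LSTM-decoder pairing in Keras is shown below. It is a hedged simplification: the feature size, vocabulary size, caption length and layer widths are assumptions, and the repeat-and-decode wiring is used to keep the sketch short, whereas the paper feeds the image features to the decoder as an initial input rather than at every time step.

```python
# Sketch of a CNN-encoder / LSTM-decoder, with the pre-trained CNN's
# fixed-length feature vector standing in for the image encoder output.
# All sizes below are illustrative assumptions.
from keras.models import Model
from keras.layers import Input, Dense, LSTM, RepeatVector, TimeDistributed

feature_size = 4096   # assumed size of the CNN's last hidden layer output
vocab_size = 5000     # assumed vocabulary size
max_len = 20          # assumed maximum caption length

# Encoder side: project the image feature vector to the decoder's width.
image_features = Input(shape=(feature_size,))
encoded = Dense(256, activation='relu')(image_features)

# Decoder side: repeat the encoding and generate one word distribution per step.
repeated = RepeatVector(max_len)(encoded)
decoded = LSTM(256, return_sequences=True)(repeated)
outputs = TimeDistributed(Dense(vocab_size, activation='softmax'))(decoded)

model = Model(inputs=image_features, outputs=outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()
```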
The list below highlights some interesting applications of the Encoder-Decoder LSTM architecture.
- Machine Translation, e.g. English to French translation of phrases.
- Learning to Execute, e.g. calculate the outcome of small programs.
- Image Captioning, e.g. generating a text description for images.
- Conversational Modeling, e.g. generating answers to textual questions.
- Movement Classification, e.g. generating a sequence of commands from a sequence of gestures.