
Chapter 6.1 The Bidirectional LSTM


10.1.1 Architecture

We have seen the benefit of reversing the order of input sequences for LSTMs, as discussed in the introduction of Encoder-Decoder LSTMs.

We were surprised by the extent of the improvement obtained by reversing the words in the source sentences.

-- Sequence to Sequence Learning with Neural Networks, 2014.

Bidirectional LSTMs focus on the problem of getting the most out of the input sequence by stepping through input time steps in both the forward and backward directions. In practice, this architecture involves duplicating the first recurrent layer in the network so that there are now two layers side-by-side, then providing the input sequence as-is as input to the first layer and providing a reversed copy of the input sequence to the second. This approach was developed some time ago as a general approach for improving the performance of Recurrent Neural Networks (RNNs).

To overcome the limitations of a regular RNN ... we propose a bidirectional recurrent neural network (BRNN) that can be trained using all available input information in the past and future of a specific time frame. ... The idea is to split the state neurons of a regular RNN in a part that is responsible for the positive time direction (forward states) and a part for the negative time direction (backward states).

-- Bidirectional Recurrent Neural Networks, 1997.

This approach has been used to great effect with LSTM Recurrent Neural Networks. Providing the entire sequence both forwards and backwards is based on the assumption that the whole sequence is available. This is generally a requirement in practice when using vectorized inputs. Nevertheless, it may raise a philosophical concern, as ideally time steps would be provided in order and just-in-time. Providing the input sequence bidirectionally was justified in the domain of speech recognition because there is evidence that in humans, the context of the whole utterance is used to interpret what is being said rather than a linear interpretation.

... relying on knowledge of the future seems at first sight to violate causality. How can we base our understanding of what we've heard on something that hasn't been said yet? However, human listeners do exactly that. Sounds, words, and even whole sentences that at first mean nothing are found to make sense in the light of future context. What we must remember is the distinction between tasks that are truly online - requiring an output after every input - and those where outputs are only needed at the end of some input segment.

-- Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures, 2005.

Although Bidirectional LSTMs were developed for speech recognition, the use of bidirectional input sequences is now a staple of sequence prediction with LSTMs, used as an approach for lifting model performance.

10.1.2 Implementation

The LSTM layer in Keras allows you to specify the direction in which the input sequence is processed. This can be done by setting the go_backwards argument to True (it defaults to False).

model = Sequential()
model.add(LSTM(..., input_shape=(...), go_backwards=True))
...
Listing 10.1: Example of a Vanilla LSTM model with backward input sequences.
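
As a concrete illustration, below is a minimal sketch of a complete model using go_backwards, assuming an input of 10 time steps with 1 feature and a single binary output; the layer size, loss, and optimizer are arbitrary choices for the example and are not part of the listing above.

from keras.models import Sequential
from keras.layers import LSTM, Dense

# assumed problem shape: 10 time steps, 1 feature per step
model = Sequential()
# process the input sequence in reverse order
model.add(LSTM(20, input_shape=(10, 1), go_backwards=True))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')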

Bidirectional LSTMs are a small step on top of this capability. Specifically, Bidirectional LSTMs are supported in Keras via the Bidirectional layer wrapper that essentially merges the output from two parallel LSTMs, one processing the input sequence forwards and one processing it backwards. This wrapper takes a recurrent layer (e.g. the first hidden LSTM layer) as an argument.

model = Sequential()
model.add(Bidirectional(LSTM(...), input_shape=(...)))
...
Listing 10.2: Example of a Bidirectional wrapped LSTM layer.

The Bidirectional wrapper layer also allows you to specify the merge mode; that is, how the forward and backward outputs should be combined before being passed on to the next layer. The options are:

  • 'sum': The outputs are added together.
  • 'mul': The outputs are multiplied together.
  • 'concat': The outputs are concatenated together (the default), providing double the number of outputs to the next layer.
  • 'ave': The average of the outputs is taken.

The default mode is to concatenate, and this is the method often used in studies of bidirectional LSTMs. In general, it might be a good idea to test each of the merge modes on your problem to see if you can improve upon the concatenate default option.
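
For example, below is a minimal sketch of setting the merge mode via the merge_mode argument of the Bidirectional wrapper; the choice of 'sum', the layer size, and the input shape are arbitrary values for illustration only.

from keras.models import Sequential
from keras.layers import Dense, LSTM, Bidirectional

# assumed problem shape: 10 time steps, 1 feature per step
model = Sequential()
# combine the forward and backward outputs by element-wise addition
model.add(Bidirectional(LSTM(20), merge_mode='sum', input_shape=(10, 1)))
model.add(Dense(1, activation='sigmoid'))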

10.2 Cumulative Sum Prediction Problem

We will define a simple sequence classification problem, called the cumulative sum prediction problem, to explore bidirectional LSTMs. This section is divided into the following parts:

  1. Cumulative Sum.
  2. Sequence Generation.
  3. Generate Multiple Sequences.

10.2.1 Cumulative Sum

The problem is defined as a sequence of random values between 0 and 1. This sequence is taken as input for the problem, with each number provided once per time step. A binary label (0 or 1) is associated with each input. The output values are all 0 until the cumulative sum of the input values in the sequence exceeds a threshold, at which point the output value flips from 0 to 1. A threshold of one quarter (1/4) of the sequence length is used. For example, below is a sequence of 10 input time steps (X):

0.63144003 0.29414551 0.91587952 0.95189228 0.32195638 0.60742236 0.83895793 0.18023048
0.84762691 0.29165514
Listing 10.3: Example input sequence of random real values.

The corresponding classification output (y) would be:

0 0 0 1 1 1 1 1 1 1
Listing 10.4: Example output sequence of cumulative sum values.

We will frame the problem to make the best use of the Bidirectional LSTM architecture. The output sequence will be produced after the entire input sequence has been fed into the model. Technically, this is a sequence-to-sequence prediction problem that requires a many-to-many prediction model. The input and output sequences also have the same number of time steps (length).
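
As a minimal sketch of what this framing implies for the data, the example values from Listings 10.3 and 10.4 could be reshaped into the 3D layout that Keras recurrent layers expect; the [samples, timesteps, features] convention is the standard Keras one, while the specific shapes are assumptions for this illustration.

from numpy import array

# input and output sequences from Listings 10.3 and 10.4
X = array([0.63144003, 0.29414551, 0.91587952, 0.95189228, 0.32195638,
           0.60742236, 0.83895793, 0.18023048, 0.84762691, 0.29165514])
y = array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1])

# reshape for a many-to-many model:
# X as [samples, timesteps, features], y as [samples, timesteps, outputs]
X = X.reshape(1, 10, 1)
y = y.reshape(1, 10, 1)
print(X.shape, y.shape)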

10.2.2 Sequence Generation

We can implement this in Python. The first step is to generate a sequence of random values. We can use the random() function from the random module.

# create a sequence of random numbers in [0,1]
X = array([random() for _ in range(10)])
Listing 10.5: Example creating an input sequence of random real values.

We can define the threshold as one-quarter the length of the input sequence.

# calculate cut-off value to change class values
limit = 10/4.0
Listing 10.6: Example of calculating the cumulative sum threshold.

The cumulative sum of the input sequence can be calculated using the cumsum() NumPy function. This function returns a sequence of cumulative sum values, e.g.:

pos1, pos1+pos2, pos1+pos2+pos3, ...
Listing 10.7: Example of calculating a cumulative sum output sequence.
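
As a quick sanity check, the snippet below applies cumsum() to a short, arbitrarily chosen sequence; the values are illustrative only.

from numpy import array, cumsum

# cumulative sum of a short example sequence
print(cumsum(array([0.1, 0.4, 0.3])))
# expected to print approximately [0.1 0.5 0.8]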

We can then calculate the output sequence by checking whether each cumulative sum value exceeds the threshold.

# determine the class outcome for each item in cumulative sequence
y = array([0 if x < limit else 1 for x in cumsum(X)])
Listing 10.8: Example of calculating the output values from the cumulative sum threshold.

The function below, named get_sequence(), draws all of this together, taking the length of the sequence as input and returning the X and y components of a new problem case.

# create a sequence classification instance
def get_sequence(n_timesteps):
    # create a sequence of random numbers in [0,1]
    X = array([random() for _ in range(n_timesteps)])
    # calculate cut-off value to change class values
    limit = n_timesteps / 4.0
    # determine the class outcome for each item in cumulative sequence
    y = array([0 if x < limit else 1 for x in cumsum(X)])
    return X, y
Listing 10.9: Function to create a random input and output sequence.

We can test this function with a new 10-step sequence as follows:

from random import random
from numpy import array
from numpy import cumsum
# create a cumulative sum sequence
def get_sequence(n_timesteps):
    # create a sequence of random numbers in [0,1]
    X = array([random() for _ in range(n_timesteps)])
    # calculate cut-off value to change class values
    limit = n_timesteps / 4.0
    # determine the class outcome for each item in cumulative sequence
    y = array([0 if x < limit else 1 for x in cumsum(X)])
    return X, y
X, y = get_sequence(10)
print(X)
print(y)
Listing 10.10: Example of generating a random input and output sequence.

Running the example first prints the generated input sequence followed by the matching output sequence.

[ 0.22228819 0.26882207 0.069623 0.91477783 0.02095862 0.71322527
0.90159654 0.65000306 0.88845226 0.4037031 ]
[0 0 0 0 0 0 1 1 1 1]
Listing 10.11: Example output from generating a random input and output sequence.