
Chapter 7.3.1 Finalize an LSTM Model


13.1 Finalize an LSTM Model

In this section, you will discover how to finalize your LSTM model.

13.1.1 What Is a Final LSTM Model?

A final LSTM model is one that you use to make predictions on new data. That is, given new examples of input data, you want to use the model to predict the expected output. This may be a classification (assign a label) or a regression (a real value). The goal of your sequence prediction project is to arrive at a final model that performs the best, where best is defined by:

  • Data: the historical data that you have available.
  • Time: the time you have to spend on the project.
  • Procedure: the data preparation steps, algorithm or algorithms, and the chosen algorithm configurations.

In your project, you gather the data, spend the time you have, and discover the data preparation procedures, algorithm to use, and how to configure it. The final model is the pinnacle of this process, the end you seek in order to start actually making predictions. There is no such thing as a perfect model. There is only the best model that you were able to discover.

13.1.2 What is the Purpose of Using Train/Test Sets?

Creating a train and test split of your dataset is one method to quickly evaluate the performance of an algorithm on your problem. The training dataset is used to prepare a model, to train it. We pretend the test dataset is new data where the output values are withheld from the algorithm. We gather predictions from the trained model on the inputs from the test dataset and compare them to the withheld output values of the test set.

Comparing the predictions to the withheld outputs in the test dataset allows us to compute a performance measure for the model on the test dataset. This is an estimate of the skill of the algorithm trained on the problem when making predictions on unseen data. Using k-fold cross-validation is a more robust and more computationally expensive way of calculating this same estimate. We use the estimate of the skill of our LSTM model on a training dataset as a proxy for estimating what the skill of the model will be in practice when making predictions on new data.

This is quite a leap and requires that:

  • The procedure that you use is sufficiently robust that the estimate of skill is close to what we actually expect on unseen data.
  • The choice of performance measure accurately captures what we are interested in measuring in predictions on unseen data.
  • The choice of data preparation is well understood and repeatable on new data, and reversible if predictions need to be returned to their original scale or related to the original input values.
  • The choice of model architecture and configuration makes sense for its intended use and operational environment (e.g. complexity).

A lot rides on the estimated skill of the whole procedure on the test set or the k-fold cross-validation procedure.
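Below is a minimal sketch of the train/test evaluation described above, using placeholder random data and a toy LSTM. The shapes, layer sizes, and training settings are illustrative assumptions, not recommendations from the text.

from numpy import random
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import LSTM, Dense

# placeholder data: 100 samples, 10 time steps, 1 feature
X = random.rand(100, 10, 1)
y = random.rand(100, 1)

# hold back a third of the data as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

# fit the model on the training set only
model = Sequential()
model.add(LSTM(10, input_shape=(10, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train, y_train, epochs=10, batch_size=10, verbose=0)

# estimate skill by comparing predictions to the withheld test outputs
mse = model.evaluate(X_test, y_test, verbose=0)
print('Test MSE: %.3f' % mse)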

13.1.3 How to Finalize an LSTM Model?

You finalize a model by applying the chosen LSTM architecture and configuration to all of your data. There is no train and test split and no cross-validation folds. Put all of the data back together into one large training dataset and fit your model, as in the sketch after the list below. That's it. With the finalized model, you can:

  • Save the model for later or operational use.
  • Load the model and make predictions on new data.
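A minimal sketch of finalizing a model follows: the chosen architecture and configuration are re-fit on all of the available data, with no split and no folds. The data, layer sizes, and training settings below are placeholders only.

from numpy import random
from keras.models import Sequential
from keras.layers import LSTM, Dense

# all available data, train and test combined (placeholder shapes)
X = random.rand(100, 10, 1)
y = random.rand(100, 1)

# the chosen architecture and configuration
model = Sequential()
model.add(LSTM(10, input_shape=(10, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

# fit on all of the data; no train/test split, no cross-validation folds
model.fit(X, y, epochs=10, batch_size=10, verbose=0)

# the finalized model can now be saved or used to predict new inputs
yhat = model.predict(X[0:1])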

Why Not Keep the Best Trained Model?

It is possible that your LSTM model takes many days or weeks to prepare. In that case, you may want to keep the model fit on the train dataset without fitting it on the combination of train and test sets. This is a trade-off between the possible benefits of training the model on the additional data and the time and computational cost of fitting a new model.

Won't the Performance of the Final Model Be Different?

The whole idea of using a robust test harness was to estimate the skill of the final model. Ideally, the difference in skill between the estimate and what is observed in the final model is minor to the point of measurement error, or the skill is lifted as a function of the number of training examples used to fit the model. You can test both of these assumptions by performing a sensitivity analysis of model skill versus the number of training examples.
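One way such a sensitivity analysis might look is sketched below: the same architecture is re-fit on increasing numbers of training examples and scored on a fixed test set. The data, sizes, and helper function are illustrative assumptions.

from numpy import random
from keras.models import Sequential
from keras.layers import LSTM, Dense

def fit_and_score(X_train, y_train, X_test, y_test):
    # fit the chosen configuration and return the test loss
    model = Sequential()
    model.add(LSTM(10, input_shape=(10, 1)))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(X_train, y_train, epochs=10, batch_size=10, verbose=0)
    return model.evaluate(X_test, y_test, verbose=0)

# placeholder train and test sets
X_train, y_train = random.rand(100, 10, 1), random.rand(100, 1)
X_test, y_test = random.rand(30, 10, 1), random.rand(30, 1)

# score the model with 25, 50, 75 and 100 training examples
for n in [25, 50, 75, 100]:
    mse = fit_and_score(X_train[:n], y_train[:n], X_test, y_test)
    print('n=%d, MSE=%.3f' % (n, mse))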

Won't the Final Model Be Different Each Time It Is Trained?

The estimate of skill that you used to choose the final model should be averaged over multiple runs. That way, you know that on average the chosen model architecture and configuration is skillful. You could try to control for the stochastic nature of the model by training multiple final models and using an ensemble or average of their predictions in practice. Again, you can design a sensitivity analysis to test whether this will result in a more stable set of predictions.
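A hedged sketch of that ensemble idea: fit several final models on all of the data and average their predictions for a new input. The helper function, data, and ensemble size below are placeholders, not prescriptions.

from numpy import array, mean, random
from keras.models import Sequential
from keras.layers import LSTM, Dense

def fit_final_model(X, y):
    # fit one copy of the chosen configuration on all of the data
    model = Sequential()
    model.add(LSTM(10, input_shape=(10, 1)))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(X, y, epochs=10, batch_size=10, verbose=0)
    return model

X, y = random.rand(100, 10, 1), random.rand(100, 1)
X_new = random.rand(1, 10, 1)

# train several final models; each run differs due to random initialization
members = [fit_final_model(X, y) for _ in range(5)]

# average the predictions of the ensemble members
yhats = array([m.predict(X_new) for m in members])
yhat = mean(yhats, axis=0)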

13.2 Save LSTM Models to File

Keras provides an API to allow you to save your model to file. There are two options:

  1. Save the model to a single file.
  2. Save the architecture and weights to separate files.

In both cases, the HDF5 file format is used, which efficiently stores large arrays of numbers on disk. You will need to confirm that you have the h5py Python library installed. It can be installed as follows:

sudo pip install h5py
Listing 13.1: Install the required h5py Python library.
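A minimal sketch of both save options, assuming a fitted Keras model named model as in the earlier sketches; the file names are placeholders.

from keras.models import load_model, model_from_json

# Option 1: save the architecture and weights together in a single HDF5 file
model.save('lstm_model.h5')
loaded = load_model('lstm_model.h5')

# Option 2: save the architecture (JSON) and the weights (HDF5) separately
with open('lstm_architecture.json', 'w') as f:
    f.write(model.to_json())
model.save_weights('lstm_weights.h5')

# reload: rebuild the architecture, then load the weights into it
with open('lstm_architecture.json', 'r') as f:
    loaded = model_from_json(f.read())
loaded.load_weights('lstm_weights.h5')

Note that a model rebuilt from the JSON description must be compiled again before it can be trained or evaluated.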