Pre-trained model before incremental training #478

occoder · 2021-02-20T07:22:07Z

occoder
Feb 20, 2021

First of all, thanks to the River team for turning such a brilliant idea into a slick package.
I just went quickly through the docs and have already started imagining how to apply it to my day-to-day ML work.

Batch training usually ends with a fitted model that naturally becomes the starting point for subsequent predictions.
In contrast, River's incremental training style seems to start from scratch and build the model along the way, so the model becomes more robust as the data stream flows in.
But in some cases, the starting point needs a relatively robust model to begin with. One possible approach might be to take a pre-trained or last-trained model as the starting point before incremental training.
Is this pattern already addressed within the scope of the River package?
If so, could you tell me more about it? Thanks.

Answered by raphaelsty


raphaelsty · 2021-02-20T16:48:49Z

raphaelsty
Feb 20, 2021
Maintainer

Glad to see that you share our philosophy. 😊

> But in some cases, the starting point needs a relatively robust model to begin with. One possible approach might be to take a pre-trained or last-trained model as the starting point before incremental training.

Of course, it may be appropriate to "warm up" a model before it is deployed. In practice, all you have to do is create your model, train it on the data of your choice, and save it, for example with the pickle library. You can then load the model in the production environment and keep updating it on the incoming stream.

Here's an example that I picked up from the docs. It aims to predict bike availability. I can warm up my model using past data, serialize it using pickle, and then load it in my production pipeline.

from river import datasets
from river import compose
from river import linear_model
from river import metrics
from river import evaluate
from river import preprocessing
from river import optim

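# Stream of (features, target) pairs used as the "past data" for the warm-up
X_y = datasets.Bikes()

# Pipeline: select features, standardize them, then fit a linear regression
model = compose.Select('clouds', 'humidity', 'pressure', 'temperature', 'wind')
model |= preprocessing.StandardScaler()
model |= linear_model.LinearRegression(optimizer=optim.SGD(0.001))

metric = metrics.MAE()

# Progressive validation: each sample is predicted first, then learned from
evaluate.progressive_val_score(X_y, model, metric, print_every=20_000)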

Serializing the pre-trained model as a pickle file:

import pickle

with open('model.pickle', 'wb') as file:
    pickle.dump(model, file)

Loading the pre-trained model:

import pickle

with open('model.pickle', 'rb') as file:
    pre_trained_model = pickle.load(file)
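
Once the model is loaded, incremental training simply continues from the saved state. Here is a minimal, dependency-free sketch of the whole warm-up, save, load, and keep-learning loop. Note that `TinyOnlineRegressor`, the `stream` helper, and the toy relation `y = 2 * t + 1` are all made up for illustration and are not part of River; the class only mirrors the `learn_one` / `predict_one` method names that River models expose, so treat it as a stand-in rather than as River's implementation.

```python
import pickle


class TinyOnlineRegressor:
    """Hypothetical stand-in for a River regressor (not part of River).

    It mirrors the learn_one / predict_one method names and fits
    y ~ w * t + b with plain stochastic gradient descent.
    """

    def __init__(self, lr=0.05):
        self.lr = lr
        self.w = 0.0
        self.b = 0.0
        self.n_seen = 0  # number of samples learned so far

    def predict_one(self, x):
        return self.w * x["t"] + self.b

    def learn_one(self, x, y):
        # One SGD step on the squared error of a single sample.
        error = self.predict_one(x) - y
        self.w -= self.lr * error * x["t"]
        self.b -= self.lr * error
        self.n_seen += 1


def stream(n, offset=0):
    """Yield (features, target) pairs for the toy relation y = 2 * t + 1."""
    for i in range(offset, offset + n):
        t = (i % 10) / 10
        yield {"t": t}, 2 * t + 1


# 1. Warm up on past data.
model = TinyOnlineRegressor()
for x, y in stream(500):
    model.learn_one(x, y)

# 2. Serialize the warmed-up model (in memory here; a file works the same way).
payload = pickle.dumps(model)

# 3. Load it in "production" and keep learning, one sample at a time.
pre_trained_model = pickle.loads(payload)
for x, y in stream(500, offset=500):
    y_pred = pre_trained_model.predict_one(x)  # predict first, then learn
    pre_trained_model.learn_one(x, y)
```

The loaded object carries the learned weights and the sample counter with it, so the production loop starts from the warmed-up state instead of from scratch. With a real River pipeline the loop body should look the same, assuming the pipeline is called through `predict_one` and `learn_one`.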

In the future, we plan to add a mini-batch mode to the algorithms to speed up this pre-training phase. This is a work in progress.

Raphaël
