Save a trained model for future use? #24

Open
EricKeenan opened this issue Feb 17, 2022 · 8 comments
Labels
enhancement New feature or request

Comments

@EricKeenan

Wow! Great project - thanks for your hard work.

Is there a way to save a trained model, load it into a new notebook, and run inference? Apologies if this is documented somewhere.

e.g.

rf_model = rf_model(X_train, Y_train)
rf_model.train()
rf_model.save("rf_model.pb") # <---- Is there anything like this?

and then in a new notebook

rf_model = esem.open("rf_model.pb") # <---- Is there anything like this?
@duncanwp
Owner

Thanks!

No, there currently isn't a way to do this, though it seems a very sensible thing to allow. I think it would require each type of model (GPflow, scikit-learn and Keras) to have save and load methods that the Emulator can then just use.
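To make the idea concrete, here is a minimal sketch of what per-backend save and load methods could look like. It is not ESEm code: `SavableModel`, `PickleBackedModel` and `EmulatorSketch` are hypothetical names invented for illustration, and a plain dict stands in for a trained model.

```python
import pickle
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class SavableModel(ABC):
    """Hypothetical interface: each backend wrapper persists itself."""

    @abstractmethod
    def save(self, path):
        ...

    @classmethod
    @abstractmethod
    def load(cls, path):
        ...


class PickleBackedModel(SavableModel):
    """Hypothetical wrapper for a backend whose model pickles cleanly."""

    def __init__(self, model):
        self.model = model

    def save(self, path):
        # pickle.dump needs an open binary file object, not a filename
        with open(path, "wb") as f:
            pickle.dump(self.model, f)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            return cls(pickle.load(f))


class EmulatorSketch:
    """Hypothetical emulator that delegates persistence to its wrapper."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def save(self, path):
        self.wrapped.save(path)

    @classmethod
    def load(cls, path, wrapper_cls):
        return cls(wrapper_cls.load(path))


# Round trip, using a plain dict as a stand-in for a trained model
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"
    EmulatorSketch(PickleBackedModel({"coef": [1.0, 2.0]})).save(path)
    reloaded = EmulatorSketch.load(path, PickleBackedModel)
```

A backend that does not pickle cleanly would supply its own `save` and `load` using that library's native format, while the emulator-level calls stay the same.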

I will flag this as a feature request and try to implement it when I have a chance, but I would also be very happy to review pull requests that implement it.

In the meantime, for the random forest model (only) I think you should be able to just use pickle:

import pickle
pickle.dump(rf_model, 'rf_model.pb')
rf_model2 = pickle.load('rf_model.pb')
rf_model2.predict(...)

@duncanwp duncanwp added the enhancement New feature or request label Feb 18, 2022
@EricKeenan
Author

Thanks for the reply.

That solution doesn't appear to work in my case

pickle.dump(rf_model, 'rf_model.pkl')

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_14674/1446057301.py in <module>
----> 1 pickle.dump(rf_model, 'rf_model.pkl')

TypeError: file must have a 'write' attribute

Likewise with

with open("rf_model.pkl","wb") as f:
    pickle.dump(rf_model, f)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_14674/851769721.py in <module>
      1 with open("rf_model.pkl","wb") as f:
----> 2     pickle.dump(rf_model, f)

TypeError: can't pickle tensorflow.python._pywrap_tfe.EagerContextThreadLocalData objects
    
  

@duncanwp
Owner

Ah OK, that's because of some of the TensorFlow functions on the Emulator... This will need a bit more thought, sorry.
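The failure mode can be reproduced without TensorFlow. The sketch below is only an illustration under stated assumptions: a `threading.Lock` stands in for the unpicklable TensorFlow context, and a `SimpleNamespace` holding a plain dict stands in for the emulator and its inner model. Pickling the whole object fails, while pickling only the inner model works.

```python
import pickle
import threading
from types import SimpleNamespace

# Stand-in for a wrapper object: a picklable "model" plus runtime state that
# cannot be pickled (a lock here; in this thread it was a TensorFlow context).
wrapper = SimpleNamespace(
    model={"n_estimators": 100, "weights": [0.1, 0.2, 0.3]},
    runtime_state=threading.Lock(),
)

# Pickling the whole wrapper walks every attribute and hits the lock
try:
    pickle.dumps(wrapper)
    whole_object_pickled = True
except TypeError as err:
    whole_object_pickled = False
    print(f"Pickling the whole wrapper failed: {err}")

# Pickling only the inner model avoids the unpicklable runtime state
payload = pickle.dumps(wrapper.model)
restored_model = pickle.loads(payload)

# Rebuild the wrapper around the restored model with fresh runtime state
rebuilt = SimpleNamespace(model=restored_model, runtime_state=threading.Lock())
```

The general pattern is to persist only the part that serialises cleanly and to recreate everything else when loading.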

@EricKeenan
Author

@duncanwp It seems like the esem random forest is a simple implementation of the scikit-learn random forest. In the meantime, would you recommend training my emulator with scikit-learn and saving the model using pickle?

@duncanwp
Owner

duncanwp commented Feb 23, 2022

Yes, you're absolutely right.

It's also possible (but currently a little convoluted) to wrap the loaded model back into ESEm:

import pickle

# Save the sklearn model held internally in the esem wrapper
with open("rf_model.pkl","wb") as f:
    pickle.dump(esem_rf_model.model.model, f)

# Load it again
with open("rf_model.pkl","rb") as f:
    skmodel=pickle.load(f)

# Wrap the loaded model

from esem.wrappers import wrap_data
from esem.data_processors import Flatten
from esem.model_adaptor import SKLearnModel
from esem.emulator import Emulator
from sklearn.ensemble import RandomForestRegressor

wrapped_skmodel = SKLearnModel(skmodel)

# Note that we need to reload the data separately (it is used internally for post-processing)
data = wrap_data(y_train, data_processors=[Flatten()])
loaded_esem_rf_model = Emulator(wrapped_skmodel, x_train, data)

loaded_esem_rf_model.predict(...)

I made a full example here: https://gist.github.com/duncanwp/e4b96690da5bb0bf2505bb94d5450001

@EricKeenan
Author

Thanks @duncanwp ! I managed to save a RF model. I'll leave this issue open in case you want this as documentation for a feature request. Otherwise, feel free to close. Thanks again!

@adrifoster

Hi there @duncanwp, thanks for that workaround. Just bumping this thread to say that this would be great to have! I was able to use your workaround for a GPflow model, but it would be great to have this as a method of an Emulator object too!

@duncanwp
Owner

duncanwp commented Aug 6, 2024

Hi @adrifoster, part of the reason this stalled was that it didn't seem to work for TensorFlow-based emulators. Did you find it worked OK for the GPflow one, though?

3 participants