Save and load kernel in GPy in a sparse gaussian process regression #535

nbeuchat opened this issue Aug 8, 2017 · 4 comments


nbeuchat commented Aug 8, 2017

Hi!

I hope this is the right place for this question. I have built and optimized a Sparse Gaussian Process Regression model using the GPy library. The documentation recommends saving the model as follows:

To save a model it is best to save the m.param_array of it to disk (using numpy’s np.save). Additionally, you save the script, which creates the model.

I am able to save the parameters of the model and recreate the model from them. However, I need to know in advance the kernel architecture that was used to build the model (defined in the function create_kernel below). To create and save the model, I do the following:

import numpy as np
import GPy

def create_kernel():
    # This function could change
    return GPy.kern.RBF(4, ARD=True) + GPy.kern.White(4)

# X, y are the training inputs and targets
gp = GPy.models.SparseGPRegression(X, y,
                                   num_inducing=100,
                                   kernel=create_kernel())

# optimization steps
# ...

# Save the model parameters and the training data
np.save('gp_params.npy', gp.param_array)
np.save('gp_y.npy', y)
np.save('gp_X.npy', X)

To load the model, I am doing the following at the moment. The problem is that I might not have access to the create_kernel function.

# Load the model
y_load = np.load('gp_y.npy')
X_load = np.load('gp_X.npy')
gp_load = GPy.models.SparseGPRegression(X_load, y_load,
                                        initialize=False,
                                        num_inducing=100,
                                        kernel=create_kernel())  # Kernel is problematic here

gp_load.update_model(False)            # do not call the underlying expensive algebra on load
gp_load.initialize_parameter()         # initialize the parameters (connect the parameters up)
gp_load[:] = np.load('gp_params.npy')  # load the parameters
gp_load.update_model(True)             # call the algebra only once

What is the best way to store the kernel for later use? The parameters of the kernel and the inducing inputs are stored in the gp_params.npy file, but the structure of the kernel is not. At the moment, I have to know which function was used to create the model, which will not always be the case.

Thanks a lot for your help!
Nicolas

@mzwiessele
Member

I believe this is being addressed by the new serialization framework mentioned in #547 (still in progress), namely the to_dict and from_dict functions (see the sketch below for the rough idea). @zhenwendai, is there more to this for now?
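As a rough sketch of the direction (the framework is not merged yet, so treat the exact names and signatures as assumptions rather than the final API), the idea is that a kernel can be serialized to a plain dict and rebuilt later without access to the original create_kernel function:

import json
import GPy

# Serialize the kernel structure and parameters to a JSON-friendly dict
# (assumes this kernel type implements to_dict in the new framework)
kern = GPy.kern.RBF(4, ARD=True) + GPy.kern.White(4)
with open('kernel.json', 'w') as f:
    json.dump(kern.to_dict(), f)

# Later: rebuild the same kernel without knowing how it was constructed
with open('kernel.json') as f:
    kern_restored = GPy.kern.Kern.from_dict(json.load(f))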

@blurLake

Can this be done for other models, like GPRegression, in a similar manner?

@Amir-Arsalan

@nbeuchat Were you eventually able to save your model/kernel, either with one of the new methods such as save_model() or with the old-school approach you showed here? The new methods seem straightforward to use, but I run into issues with them, as described here (a rough sketch of what I'm attempting is below). I also tried the old-school approach of saving the parameters as numpy arrays, but I get pickling errors, even though issues from about two years ago suggest this used to work. I would appreciate it if you could help me figure out where I'm making a mistake.
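For reference, the pattern I'm attempting with the new methods is roughly the following (a minimal sketch based on my reading of the docs; the exact load counterpart and its arguments are assumptions on my side):

import GPy

# m is an already-built and optimized model, e.g. a SparseGPRegression
m.save_model('gp_model', compress=True, save_data=True)  # writes gp_model.zip

# Later, in a fresh session, restore it without re-running the build script
m_loaded = GPy.models.SparseGPRegression.load_model('gp_model.zip')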

@mzwiessele I would appreciate it if you could take a look at my issue and give me a clue about what I might be doing wrong. Thanks!

@nbeuchat
Author

@Amir-Arsalan I haven't used the new methods at all, as I haven't used the framework in a while. However, back in August 2017 I could easily save the parameters as shown above, and I ended up creating a small module for that specific model containing just the create_kernel function (roughly as in the sketch below). Not ideal, but it worked for our use case back then.
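For what it's worth, the workaround looked roughly like this (a minimal sketch; the module and function names are illustrative, not the actual code):

# gp_model_def.py -- small module shipped alongside the saved .npy files
import numpy as np
import GPy

def create_kernel():
    # Must match the kernel used when the model was trained
    return GPy.kern.RBF(4, ARD=True) + GPy.kern.White(4)

def load_model(params_path, X_path, y_path, num_inducing=100):
    X = np.load(X_path)
    y = np.load(y_path)
    gp = GPy.models.SparseGPRegression(X, y,
                                       num_inducing=num_inducing,
                                       kernel=create_kernel(),
                                       initialize=False)
    gp.update_model(False)
    gp.initialize_parameter()
    gp[:] = np.load(params_path)
    gp.update_model(True)
    return gp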
