Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Delete all versions of a deleted model #6193

Draft
wants to merge 1 commit into
base: v2
Choose a base branch
from
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 15 additions & 0 deletions scheduler/pkg/agent/client.go
Original file line number Diff line number Diff line change
Expand Up @@ -621,6 +621,21 @@ func (c *Client) LoadModel(request *agent.ModelOperationMessage, timestamp int64
logger.Errorf("there was a problem getting the config for model: %s", modelName)
}

// we are going to clean-up all the older stale version of model
for _, v := range c.stateManager.modelVersions.getVersionsForAllModels() {
// skip if version is same as current one
if v.GetVersion() == modelVersion {
continue
}
c.logger.WithField("func", c.stateManager.modelVersions.getVersionsForAllModels()).Infof("----")

err := c.ModelRepository.RemoveModelVersion(util.GetVersionedModelName(modelName, v.GetVersion()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This flow is probably not the correct flow as we are not yet sure that we should be removing the old version of this model.

In progressive rollout we have to make sure that the new version of the model is up before removing an old version otherwise we risk inference requests not being served in this transition.

I dont think this is the correct place also to do the clean up as one agent (i.e. a server replica loading one model replica) doesnt have visibility with regards to the actual state of the new version of the model (e.g. that the envoy routes have been updated). Therefore I think that the clean up process should be done somewhere at the scheduler side and not on the agent.

if err != nil {
c.sendModelEventError(modelName, v.GetVersion(), agent.ModelEventMessage_UNLOAD_FAILED, err)
return err
}
}

// TODO: consider whether we need the actual protos being sent to `LoadModelVersion`?
modifiedModelVersionRequest := getModifiedModelVersion(
modelWithVersion,
Expand Down
Loading