feat(scheduler): account for multiple instances of a model per server when scheduling #6054

driev · 2024-11-12T09:36:15Z

What this PR does / why we need it:

A model may be loaded into memory multiple times, depending on the server's or the model's configuration. The scheduler and agent do not take this into account when scheduling and loading a model, respectively, which can lead to models being scheduled onto servers that do not have sufficient memory. This PR aims to gather the information necessary to make more accurate scheduling decisions.
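
To illustrate the problem described above, here is a minimal sketch of how a memory check might account for instance count. The type `modelRequirement`, its fields, and the function `fitsOnServer` are hypothetical names chosen for illustration; they are not the identifiers used in this PR, and the real scheduler logic may differ.

```go
package main

import "fmt"

// modelRequirement is a hypothetical summary of what a single model needs.
// It is not a type from the scheduler codebase.
type modelRequirement struct {
	memoryBytesPerInstance uint64 // memory needed by one loaded copy of the model
	instanceCount          uint32 // how many copies the server will load; 0 is treated as 1
}

// totalMemoryBytes returns the memory needed once every instance is loaded.
func (m modelRequirement) totalMemoryBytes() uint64 {
	n := uint64(m.instanceCount)
	if n == 0 {
		n = 1 // assume a single instance when no count is configured
	}
	return m.memoryBytesPerInstance * n
}

// fitsOnServer reports whether all instances of the model fit into the
// memory that the server still has available.
func fitsOnServer(m modelRequirement, availableBytes uint64) bool {
	return m.totalMemoryBytes() <= availableBytes
}

func main() {
	m := modelRequirement{memoryBytesPerInstance: 500, instanceCount: 4}

	// A check that ignores instanceCount would compare 500 against 1500
	// and wrongly conclude that the model fits.
	fmt.Println(m.totalMemoryBytes())  // 2000
	fmt.Println(fitsOnServer(m, 1500)) // false
	fmt.Println(fitsOnServer(m, 2000)) // true
}
```

The sketch ignores integer overflow and any per-server overhead; it only shows why multiplying by the instance count changes the scheduling decision.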

Which issue(s) this PR fixes:

Fixes # INFRA-1146

scheduler/pkg/agent/repository/mlserver/mlserver.go

scheduler/pkg/agent/repository/triton/triton_test.go

apis/mlops/agent/agent.proto

scheduler/pkg/agent/model_state.go

scheduler/pkg/agent/repository/triton/triton.go

lc525

The general layout of this change is solid, and I think merging this would reduce some surprises in terms of scheduling.

I've suggested some changes to the ModelConfig, given that this is a preliminary change made before we start taking into account things like GPU memory, GPU+CPU memory, etc., and make our CRDs more expressive. Please let me know what you think.

apis/mlops/agent/agent.proto

lc525

lgtm, some (hopefully minor) suggestions

commit 373df43
Author: Lucian Carata <[email protected]>
Date: Thu Dec 12 01:09:30 2024 +0000

feat(k6): add scenario with multiple stages ramping up/down RPS (SeldonIO#6031)

The added load test scenario allows one to configure an arbitrary number of stages, with each consisting of a linear ramp-up/down to the desired requests per second and a hold/plateau time.

Within each stage, the duration for which the inference RPS is held constant is configured via one element in the `CONSTANT_RATE_DURATIONS_SECONDS` environment variable (a vector of comma separated values), with the ramp-up/down duration preceding it being 1/3rd of the hold time.

commit 34cf313
Author: paulb-seldon <[email protected]>
Date: Wed Dec 11 16:59:20 2024 +0000

fix(docs): Docs on upgrading from 2.7 - 2.8 (SeldonIO#6143)

* Docs on upgrading from 2.7 - 2.8
* Wording update

commit 1c40f62
Author: Sherif Akoush <[email protected]>
Date: Wed Dec 11 14:32:40 2024 +0000

fix: Add timeout to contexts in client calls (SeldonIO#6125)

* add timeout context from infer call for modelgateway
* add timeout context to pipeline gateway
* set timeout context on process request
* add a test for grpc call timeout
* add agent k8s api call timeout
* add context timeout for shutting down services
* add timeout for controller k8s api calls
* add timeout for control plane context
* add timeout context to reconcile logic
* pr comments

commit 74032a4
Author: paulb-seldon <[email protected]>
Date: Tue Dec 10 17:17:14 2024 +0000

Format spaces in install docs (SeldonIO#6140)

commit 7e6c8f1
Author: Sherif Akoush <[email protected]>
Date: Tue Dec 10 16:32:37 2024 +0000

fix(docs): add a table for core 2 dependencies in docs (SeldonIO#6139)

* add table for core 2 deps in dosc
* review comments

commit c1d320e
Author: Niall D <[email protected]>
Date: Tue Dec 10 16:16:55 2024 +0000

feat(scheduler): account for multiple instances of a model per server when scheduling (SeldonIO#6054)

* just checking in whatever I have
* testing all the code
* remove comment
* linting
* document unused param
* changing the proto around
* use parallelWorkers instead of instanceCount for mlserver
* comma
* rename ModelConfig
* use modelWithVersion as param

commit a7bfb00
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Mon Dec 9 21:35:13 2024 +0000

Bump grafana/grafana from 11.3.1 to 11.4.0 in /scheduler (SeldonIO#6133)

Bumps grafana/grafana from 11.3.1 to 11.4.0.

---
updated-dependencies:
- dependency-name: grafana/grafana
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit f129bd1
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Mon Dec 9 21:33:47 2024 +0000

Bump envoyproxy/envoy from v1.32.1 to v1.32.2 in /scheduler (SeldonIO#6134)

Bumps envoyproxy/envoy from v1.32.1 to v1.32.2.

---
updated-dependencies:
- dependency-name: envoyproxy/envoy
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit 208791b
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Mon Dec 9 21:31:49 2024 +0000

Bump google.golang.org/grpc from 1.68.0 to 1.68.1 in /hodometer (SeldonIO#6136)

Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.68.0 to 1.68.1.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.68.0...v1.68.1)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit 2abeb80
Author: Rajakavitha Kodhandapani <[email protected]>
Date: Mon Dec 9 18:31:14 2024 +0530

fix(docs): first draft of the securing endpoints (SeldonIO#5991)

* first draft of the securing endpoints
* added the output
* updated the policy name
* added a note
* Added context, minor grammar edits
* Update docs-gb/models/securing-endpoints.md

Co-authored-by: Rajakavitha Kodhandapani <[email protected]>

* incorporate review suggestions
* fixing the links
* added an example for all models
* removed the example to create a vs for all models
* fixed formatting
* formatting changes
* Update securing-endpoints.md
* added a link to the services meshes main docs page

---------

Co-authored-by: Rakavitha Kodhandapani <[email protected]>
Co-authored-by: Paul Bridi <[email protected]>
Co-authored-by: paulb-seldon <[email protected]>

commit 4125273
Author: Niall D <[email protected]>
Date: Fri Dec 6 13:52:35 2024 +0000

refactor(envoy): moving envoy/resources headers to util (SeldonIO#6129)

* moving headers to util
* removing a newline
* lint

commit f284b4a
Author: Sherif Akoush <[email protected]>
Date: Fri Dec 6 09:45:15 2024 +0000

fix(cli): Kafka inspect output formatting (SeldonIO#6130)

* add kafka inspect consumer timeout (-d) as parameter
* add formatting

commit 6d89d57
Author: Lucian Carata <[email protected]>
Date: Fri Dec 6 01:51:54 2024 +0000

feat(docs): improve HPA documentation (SeldonIO#6091)

* highlight constraints and limitations of a HPA-based approach
* remove note on statefulsets being created sequentially - we are specifically configuring k8s to allow for parallel creation of statefulset pods.
* highlight importance of the `metrics-relist-interval` setting
* simplify config example to no longer use regex metric matches
* clarify example using HPA label selectors
* clarify the need to use the `AverageValue` target type
* clarify the relation between query rate window size and prometheus scrape interval

Merge branch 'v2' into INFRA-1420/add-clusters-before-updating-routes-part-2

just checking in whatever I have

11bdcbf

driev requested review from sakoush and lc525 as code owners November 12, 2024 09:36

driev changed the title ~~just checking in whatever I have~~ feat(scheduler): account for multiple instances of a model per server when shceduling Nov 12, 2024

driev marked this pull request as draft November 12, 2024 09:37

driev added 3 commits November 12, 2024 14:46

testing all the code

d520b92

remove comment

446fe9b

linting

b0b774a

driev marked this pull request as ready for review November 12, 2024 15:05

driev changed the title ~~feat(scheduler): account for multiple instances of a model per server when shceduling~~ feat(scheduler): account for multiple instances of a model per server when scheduling Nov 19, 2024

driev added the v2 label Nov 22, 2024