feat(scheduler): account for multiple instances of a model per server when scheduling #6054
The general layout of this change is solid, and I think merging this would reduce some surprises in terms of scheduling.
I've suggested some changes to the ModelConfig, given that this is a preliminary change made before we start taking into account things like GPU memory, GPU+CPU memory, etc., and make our CRDs more expressive. Please let me know what you think.
lgtm, some (hopefully minor) suggestions
commit 373df43
Author: Lucian Carata <[email protected]>
Date: Thu Dec 12 01:09:30 2024 +0000

    feat(k6): add scenario with multiple stages ramping up/down RPS (SeldonIO#6031)

    The added load test scenario allows one to configure an arbitrary number of stages, with each consisting of a linear ramp-up/down to the desired requests per second and a hold/plateau time.

    Within each stage, the duration for which the inference RPS is held constant is configured via one element in the `CONSTANT_RATE_DURATIONS_SECONDS` environment variable (a vector of comma separated values), with the ramp-up/down duration preceding it being 1/3rd of the hold time.

commit 34cf313
Author: paulb-seldon <[email protected]>
Date: Wed Dec 11 16:59:20 2024 +0000

    fix(docs): Docs on upgrading from 2.7 - 2.8 (SeldonIO#6143)

    * Docs on upgrading from 2.7 - 2.8
    * Wording update

commit 1c40f62
Author: Sherif Akoush <[email protected]>
Date: Wed Dec 11 14:32:40 2024 +0000

    fix: Add timeout to contexts in client calls (SeldonIO#6125)

    * add timeout context from infer call for modelgateway
    * add timeout context to pipeline gateway
    * set timeout context on process request
    * add a test for grpc call timeout
    * add agent k8s api call timeout
    * add context timeout for shutting down services
    * add timeout for controller k8s api calls
    * add timeout for control plane context
    * add timeout context to reconcile logic
    * pr comments

commit 74032a4
Author: paulb-seldon <[email protected]>
Date: Tue Dec 10 17:17:14 2024 +0000

    Format spaces in install docs (SeldonIO#6140)

commit 7e6c8f1
Author: Sherif Akoush <[email protected]>
Date: Tue Dec 10 16:32:37 2024 +0000

    fix(docs): add a table for core 2 dependencies in docs (SeldonIO#6139)

    * add table for core 2 deps in dosc
    * review comments

commit c1d320e
Author: Niall D <[email protected]>
Date: Tue Dec 10 16:16:55 2024 +0000

    feat(scheduler): account for multiple instances of a model per server when scheduling (SeldonIO#6054)

    * just checking in whatever I have
    * testing all the code
    * remove comment
    * linting
    * document unused param
    * changing the proto around
    * use parallelWorkers instead of instanceCount for mlserver
    * comma
    * rename ModelConfig
    * use modelWithVersion as param

commit a7bfb00
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Mon Dec 9 21:35:13 2024 +0000

    Bump grafana/grafana from 11.3.1 to 11.4.0 in /scheduler (SeldonIO#6133)

    Bumps grafana/grafana from 11.3.1 to 11.4.0.

    ---
    updated-dependencies:
    - dependency-name: grafana/grafana
      dependency-type: direct:production
      update-type: version-update:semver-minor
    ...

    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit f129bd1
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Mon Dec 9 21:33:47 2024 +0000

    Bump envoyproxy/envoy from v1.32.1 to v1.32.2 in /scheduler (SeldonIO#6134)

    Bumps envoyproxy/envoy from v1.32.1 to v1.32.2.

    ---
    updated-dependencies:
    - dependency-name: envoyproxy/envoy
      dependency-type: direct:production
    ...

    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit 208791b
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Mon Dec 9 21:31:49 2024 +0000

    Bump google.golang.org/grpc from 1.68.0 to 1.68.1 in /hodometer (SeldonIO#6136)

    Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.68.0 to 1.68.1.
    - [Release notes](https://github.com/grpc/grpc-go/releases)
    - [Commits](grpc/grpc-go@v1.68.0...v1.68.1)

    ---
    updated-dependencies:
    - dependency-name: google.golang.org/grpc
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...

    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit 2abeb80
Author: Rajakavitha Kodhandapani <[email protected]>
Date: Mon Dec 9 18:31:14 2024 +0530

    fix(docs): first draft of the securing endpoints (SeldonIO#5991)

    * first draft of the securing endpoints
    * added the output
    * updated the policy name
    * added a note
    * Added context, minor grammar edits
    * Update docs-gb/models/securing-endpoints.md

    Co-authored-by: Rajakavitha Kodhandapani <[email protected]>

    * incorporate review suggestions
    * fixing the links
    * added an example for all models
    * removed the example to create a vs for all models
    * fixed formatting
    * formatting changes
    * Update securing-endpoints.md
    * added a link to the services meshes main docs page

    ---------

    Co-authored-by: Rakavitha Kodhandapani <[email protected]>
    Co-authored-by: Paul Bridi <[email protected]>
    Co-authored-by: paulb-seldon <[email protected]>

commit 4125273
Author: Niall D <[email protected]>
Date: Fri Dec 6 13:52:35 2024 +0000

    refactor(envoy): moving envoy/resources headers to util (SeldonIO#6129)

    * moving headers to util
    * removing a newline
    * lint

commit f284b4a
Author: Sherif Akoush <[email protected]>
Date: Fri Dec 6 09:45:15 2024 +0000

    fix(cli): Kafka inspect output formatting (SeldonIO#6130)

    * add kafka inspect consumer timeout (-d) as parameter
    * add formatting

commit 6d89d57
Author: Lucian Carata <[email protected]>
Date: Fri Dec 6 01:51:54 2024 +0000

    feat(docs): improve HPA documentation (SeldonIO#6091)

    * highlight constraints and limitations of a HPA-based approach
    * remove note on statefulsets being created sequentially - we are specifically configuring k8s to allow for parallel creation of statefulset pods.
    * highlight importance of the `metrics-relist-interval` setting
    * simplify config example to no longer use regex metric matches
    * clarify example using HPA label selectors
    * clarify the need to use the `AverageValue` target type
    * clarify the relation between query rate window size and prometheus scrape interval

Merge branch 'v2' into INFRA-1420/add-clusters-before-updating-routes-part-2
What this PR does / why we need it:
A model may be loaded into memory multiple times, depending on the server's or the model's configuration. The scheduler and agent do not take this into account when scheduling and loading a model respectively, which can lead to models being scheduled onto servers that do not have sufficient memory. This PR aims to collect the information needed to make more accurate scheduling decisions.
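To illustrate the idea, here is a minimal sketch in Go of how a per-server instance count could feed into a memory check. It is not the code from this PR: the `ModelRequirements` and `Server` types, their fields, and both functions are hypothetical names chosen for the example, and it assumes every in-memory copy of a model needs the same amount of memory.

```go
package main

import "fmt"

// ModelRequirements is a hypothetical stand-in for the per-model information
// a scheduler would need. Field names are illustrative only.
type ModelRequirements struct {
	Name          string
	MemoryBytes   uint64 // memory needed by one in-memory copy of the model
	InstanceCount uint32 // copies loaded per server replica; 0 is treated as 1
}

// Server is a hypothetical view of a server replica and its free memory.
type Server struct {
	Name                 string
	AvailableMemoryBytes uint64
}

// EffectiveMemoryBytes returns the memory one server replica needs to hold
// every copy of the model (per-copy memory times instance count).
// Overflow is not handled in this sketch.
func EffectiveMemoryBytes(m ModelRequirements) uint64 {
	instances := uint64(m.InstanceCount)
	if instances == 0 {
		instances = 1
	}
	return m.MemoryBytes * instances
}

// FilterBySufficientMemory keeps only the servers that can hold all copies.
func FilterBySufficientMemory(servers []Server, m ModelRequirements) []Server {
	required := EffectiveMemoryBytes(m)
	var fits []Server
	for _, s := range servers {
		if s.AvailableMemoryBytes >= required {
			fits = append(fits, s)
		}
	}
	return fits
}

func main() {
	model := ModelRequirements{Name: "example", MemoryBytes: 500, InstanceCount: 4}
	servers := []Server{
		{Name: "small", AvailableMemoryBytes: 1000},
		{Name: "large", AvailableMemoryBytes: 4000},
	}
	// The model needs 4 * 500 = 2000 bytes, so only "large" qualifies.
	for _, s := range FilterBySufficientMemory(servers, model) {
		fmt.Println(s.Name)
	}
}
```

Without the instance count, a check against `MemoryBytes` alone would also accept the `small` server, which is the kind of misplacement described above.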
Which issue(s) this PR fixes:
Fixes # INFRA-1146