[Multi Kueue] Client Connection Failures with removed integrations #3582

Open
Bobbins228 opened this issue Nov 18, 2024 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.


Bobbins228 commented Nov 18, 2024

What happened:
I set up MultiKueue following the documentation.
I removed "jobset.x-k8s.io/jobset" from the list of integrations in the kueue-manager-config ConfigMap because I did not intend to use JobSets. This change was part of the Kueue manifests I applied.

After setting up my MultiKueue environment, I ran the example test commands and got the following errors:

  • ClusterQueue (CQ): Active: False, Reason: AdmissionCheckInactive, Message: Can't admit new workloads: references inactive AdmissionCheck(s): [sample-multikueue].
  • AdmissionCheck (AC): Active: False, Reason: NoUsableClusters, Message: Inactive clusters: [multikueue-test-worker1]
  • MultiKueueCluster (MC): Active: False, Reason: ClientConnectionFailed, Message: no matches for kind "JobSet" in version "jobset.x-k8s.io/v1alpha2"

What you expected to happen:
I expected that disabling an integration in the Kueue manager config would also be reflected in MultiKueue, so that users would not need to install CRDs for job types they do not intend to use.
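To make the expected behaviour concrete, here is a minimal Go sketch of the filtering being asked for: the MultiKueue worker-cluster client would only require the kinds belonging to integrations that are enabled in the manager config. This is hypothetical and not Kueue's actual code; the `frameworkKinds` map, the `requiredKinds` function, and the kind strings are assumptions for illustration (only the JobSet group/version is taken from the error message above).

```go
package main

import "fmt"

// Hypothetical mapping from Kueue integration (framework) names to the
// kinds a MultiKueue worker-cluster client would need to be able to watch.
// The kind/version strings are illustrative assumptions.
var frameworkKinds = map[string]string{
	"batch/job":              "Job.batch/v1",
	"jobset.x-k8s.io/jobset": "JobSet.jobset.x-k8s.io/v1alpha2",
	"kubeflow.org/mpijob":    "MPIJob.kubeflow.org/v2beta1",
}

// requiredKinds returns the kinds to require on worker clusters for the
// enabled integrations only, in the order they are listed. Unknown names
// are skipped, so a disabled integration (e.g. JobSet) is never required.
func requiredKinds(enabled []string) []string {
	kinds := []string{}
	for _, name := range enabled {
		if kind, ok := frameworkKinds[name]; ok {
			kinds = append(kinds, kind)
		}
	}
	return kinds
}

func main() {
	// JobSet has been removed from the enabled integrations.
	enabled := []string{"batch/job", "kubeflow.org/mpijob"}
	fmt.Println(requiredKinds(enabled))
}
```

With JobSet removed from the list, the sketch prints `[Job.batch/v1 MPIJob.kubeflow.org/v2beta1]`, so a missing JobSet CRD on the worker would not matter.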

How to reproduce it (as minimally and precisely as possible):

  • Set up your clusters for MultiKueue (ensure JobSet is removed from the list of integrations in the manager ConfigMap).
  • Do not install the JobSet controller or CRDs.
  • Remove the JobSet ClusterRoles for the multikueue-sa.
  • Examine your MultiKueueCluster CR for the failure message.

Anything else we need to know?:
The same behaviour would occur if the MPI Operator were the one not installed.
See this Slack thread for more info.
Environment:

  • Kubernetes version (use kubectl version): v1.28
  • OpenShift version: 4.15.37
  • Kueue version (use git describe --tags --dirty --always): v0.9.0
  • Cloud provider or hardware configuration: AWS
  • Install tools: kubectl
  • Others:
    • MPI Operator: v0.6.0
    • KubeFlow Training Operator: v1.8.1
Bobbins228 added the kind/bug label (Categorizes issue or PR as related to a bug.)
mimowo (Contributor) commented Nov 18, 2024

/assign @mszadkow

mimowo (Contributor) commented Nov 18, 2024

cc @mwielgus @tenzen-y

mimowo (Contributor) commented Nov 18, 2024

cc @mbobrovskyi
