[Bug] can not set submitter pod restart times #2393

Open
1 of 2 tasks
xiaoyu1095 opened this issue Sep 20, 2024 · 2 comments
Labels
bug Something isn't working triage

Comments

@xiaoyu1095

Search before asking

  • I searched the issues and found no similar issues.

KubeRay Component

ray-operator

What happened + What you expected to happen

In my scenario, a failed execution should be treated as a final failure, with no retries. However, I cannot find a way to disable retries of the submitter pod entirely.

Reproduction script

submitterPodTemplate:
  spec:
    restartPolicy: Never
    containers:
      - image: harbor.thupx.cn/docker.io/rayproject/ray:2.34.0
        imagePullPolicy: IfNotPresent
        name: ray-job-submitter

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@xiaoyu1095 xiaoyu1095 added bug Something isn't working triage labels Sep 20, 2024
@xiaoyu1095
Author

xiaoyu1095 commented Sep 20, 2024

apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: test
spec:
  entrypoint: python main.py
  jobId: "123"
  submitterPodTemplate:
    spec:
      restartPolicy: Never
      containers:
        - image: rayproject/ray:2.34.0
          imagePullPolicy: IfNotPresent
          name: ray-job-submitter
  shutdownAfterJobFinishes: true
  rayClusterSpec:
    headGroupSpec:
      rayStartParams:
        dashboard-host: 0.0.0.0
      template:
        spec:
          containers:
            - image: rayproject/ray:2.34.0
              imagePullPolicy: IfNotPresent
              name: ray-master
              ports:
                - containerPort: 6379
                  name: gcs-server
                  protocol: TCP
                - containerPort: 8265
                  name: dashboard
                  protocol: TCP
                - containerPort: 10001
                  name: client
                  protocol: TCP
                - containerPort: 8000
                  name: serve
                  protocol: TCP
              volumeMounts:
                - mountPath: /tmp/ray/
                  name: log-volume
    rayVersion: 2.34.0
    workerGroupSpecs:
      - groupName: small-group
        maxReplicas: 1
        minReplicas: 1
        replicas: 1
        rayStartParams: {}
        template:
          spec:
            containers:
              - image: rayproject/ray:2.34.0
                name: ray-worker
                resources:
                  limits:
                    cpu: 500m
                    memory: 8Gi
                    nvidia.com/gpu: 0
                  requests:
                    cpu: 500m
                    memory: 8Gi
                    nvidia.com/gpu: 0

@andrewsykim
Collaborator

Try setting submitterConfig.backoffLimit = 0
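
For reference, a minimal sketch of how that suggestion might be applied to the RayJob posted above. It assumes the installed KubeRay version's RayJob CRD includes the `submitterConfig` field and that its `backoffLimit` is passed through to the Kubernetes Job that runs the submitter pod. Only the relevant fields are shown.

```yaml
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: test
spec:
  entrypoint: python main.py
  # Assumption: submitterConfig is supported by the installed CRD version.
  # backoffLimit: 0 asks the submitter Job not to create a replacement pod
  # after the first failure.
  submitterConfig:
    backoffLimit: 0
  submitterPodTemplate:
    spec:
      restartPolicy: Never
      containers:
        - image: rayproject/ray:2.34.0
          imagePullPolicy: IfNotPresent
          name: ray-job-submitter
  # shutdownAfterJobFinishes, rayClusterSpec and the remaining fields
  # stay as in the manifest posted earlier in this thread.
```

With `restartPolicy: Never` alone, the container is not restarted in place, but a Kubernetes Job can still create new pods until its backoff limit is reached, which would be consistent with the retries described in the report.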
