Error: No nodes replied within time constraint #12

Open
rrrnld opened this issue Sep 4, 2024 · 1 comment

Comments

@rrrnld

rrrnld commented Sep 4, 2024

We're self-hosting Saleor and running into issues with our celery deployment, where the worker appears to get stuck after a while. We're deploying to k8s and run celery workers like this:

```shell
celery -A saleor --app=saleor.celeryconf:app worker --loglevel=info --beat
```

This is taken from the config that was removed here: saleor/saleor#13777

I can see that the worker processes are running. It's also the command this repo uses to deploy Saleor:

```yaml
containers:
  - name: "{{ $fullName }}-celery"
    {{- if .Values.image.imageName }}
    image: "{{ lower .Values.image.imageName }}"
    {{- else }}
    image: "{{ lower .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
    {{- end }}
    imagePullPolicy: {{ .Values.image.pullPolicy }}
    env:
      {{- range .Values.global.env }}
      - name: {{ .name }}
        value: {{ .value | quote }}
      {{- end }}
      - name: ALLOWED_HOSTS
        value: {{ .Values.global.allowedHosts }}
      - name: ALLOWED_CLIENT_HOSTS
        value: {{ .Values.global.allowedHosts }}
    envFrom:
      - secretRef:
          name: {{ include "saleor-helm.fullname" . }}
    args:
      - celery
      - --app=saleor
      - --app=saleor.celeryconf:app
      - worker
      - --loglevel=INFO
      - --beat
```

Is this the correct way to run it? I'm asking because `celery -A saleor --app=saleor.celeryconf:app` is redundant, for example: `-A` and `--app` set the same option. Also, shelling into the container and trying to inspect the worker via `celery -A saleor --app=saleor.celeryconf:app inspect active` or `celery -A saleor --app=saleor.celeryconf:app status` both fail, and the lifetime check in this repo does not seem to be working at all:

```
Error: No nodes replied within time constraint
```
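One thing worth ruling out first is the reply deadline: `celery inspect` waits only about a second for replies by default, so a slow or busy worker can trigger exactly this error. A sketch of the same checks with a longer `--timeout` (only the deduplicated `--app` flag is assumed, everything else is as above):

```shell
# Inside the worker container. -A and --app are the same flag, so one suffices.
# --timeout raises the reply deadline (default ~1s) to rule out slow replies.
celery --app=saleor.celeryconf:app inspect ping --timeout 10
celery --app=saleor.celeryconf:app inspect active --timeout 10
```

If these still return no nodes with a generous timeout, the worker is likely genuinely unresponsive rather than just slow to reply.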

Any idea what might be wrong with our health checks / lifetime checks?
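For context, the kind of exec-based liveness probe such a lifetime check typically performs could be sketched like this (a hypothetical sketch, not the chart's actual probe; the `-d` destination and all timing values are assumptions):

```yaml
livenessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      # Ping only this pod's worker node, with a generous reply deadline.
      - celery --app=saleor.celeryconf:app inspect ping -d "celery@$HOSTNAME" --timeout 10
  initialDelaySeconds: 60
  periodSeconds: 120
  timeoutSeconds: 30
```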

@JannikZed
Contributor

@rrrnld we honestly haven't used the Helm chart with the most recent Saleor versions, as we moved to the cloud deployment, but it did work before. So I currently don't have the capacity to test that again, but we will most likely try the self-hosted deployment again in the future.
We added these liveness checks to make really sure that the workers are alive and that the Redis connection is still active, and that used to work fine. What does the stuck state look like on your end?
