Fix celery healthcheck memory leak
There was a strange bug where, over a few months, celery's
idle CPU usage kept increasing. It seems this may have been related
to the healthcheck being killed by docker without cleaning up after
itself. This led to hundreds of thousands of 'celery.pidbox' keys
being left behind on redis, which slowed redis down.

See celery/celery#6089
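
To check whether a redis instance is affected, the leftover keys can be counted. The following is a minimal sketch, not part of this commit: the helper name `count_pidbox_keys` is hypothetical, and the `*celery.pidbox*` pattern is an assumption based only on the key name mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: reads key names on stdin (one per line) and prints
# how many of them contain 'celery.pidbox'.
count_pidbox_keys() {
    # grep -c exits non-zero when the count is 0, so mask that exit status.
    grep -c 'celery\.pidbox' || true
}

# Assumed usage against a reachable redis (not run here). --scan iterates
# with SCAN instead of KEYS, so it does not block the server:
#   redis-cli --scan --pattern '*celery.pidbox*' | count_pidbox_keys
```

A count that keeps growing while the worker is idle would be consistent with the leak described here.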
DeD1rk committed Mar 6, 2024
1 parent 2432642 commit 3aaf851
Showing 2 changed files with 15 additions and 11 deletions.
4 changes: 3 additions & 1 deletion infra/concrexit/worker-entrypoint.sh
@@ -13,6 +13,8 @@ done

 exec runuser -u appuser -- celery --app thaliawebsite worker \
     --loglevel INFO \
-    --concurrency 4 \
+    --concurrency 2 \
+    --without-gossip \
+    --heartbeat-interval 10 \
     --beat \
     --schedule /volumes/worker/celery-beat-schedule
22 changes: 12 additions & 10 deletions infra/docker-compose.yml
@@ -41,9 +41,9 @@ services:
     restart: always
     healthcheck:
       test: [ "CMD", "pg_isready", "--username", "concrexit" ]
-      interval: 2s
-      timeout: 2s
-      retries: 30
+      interval: 10s
+      timeout: 5s
+      retries: 5

   redis:
     image: redis:7.2-alpine
@@ -53,9 +53,9 @@ services:
     restart: always
     healthcheck:
       test: [ "CMD-SHELL", "redis-cli ping | grep PONG" ]
-      interval: 2s
-      timeout: 2s
-      retries: 10
+      interval: 10s
+      timeout: 5s
+      retries: 5

   worker:
     image: ghcr.io/svthalia/concrexit:${TAG:-development}
@@ -75,10 +75,12 @@ services:
       concrexit:
         condition: service_started
     healthcheck:
-      test: [ "CMD", "celery", "--app", "thaliawebsite", "status" ]
-      interval: 10s
-      timeout: 3s
-      retries: 10
+      # The celery status timeout needs to be smaller than docker's timeout.
+      # Otherwise, celery will fail to clean up after itself on redis.
+      test: [ "CMD", "celery", "--app", "thaliawebsite", "status", "--timeout=5" ]
+      interval: 30s
+      timeout: 10s
+      retries: 5

 volumes:
   concrexit-media:
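
The comment on the worker healthcheck states an invariant: `celery status` must give up before docker kills it. A small sanity check of that invariant could look like the sketch below. The function name `check_healthcheck_timeouts` is hypothetical and not part of this commit, and both values are assumed to be whole seconds.

```shell
#!/bin/sh
# Hypothetical helper: check_healthcheck_timeouts CELERY_TIMEOUT DOCKER_TIMEOUT
# Succeeds only when celery's own timeout is strictly smaller than docker's,
# so that `celery status` can exit and clean up before docker kills it.
check_healthcheck_timeouts() {
    celery_timeout=$1
    docker_timeout=$2
    if [ "$celery_timeout" -lt "$docker_timeout" ]; then
        echo "ok: celery gives up after ${celery_timeout}s, docker after ${docker_timeout}s"
    else
        echo "bad: celery timeout (${celery_timeout}s) is not below docker timeout (${docker_timeout}s)" >&2
        return 1
    fi
}

# Values taken from the worker healthcheck in this diff: --timeout=5 and timeout: 10s.
check_healthcheck_timeouts 5 10
```

With the values from this diff the check passes, leaving a 5 second margin between the two timeouts.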