Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix: correctly shutdown even without replication #554

Merged

Conversation

carlomion
Copy link
Contributor

Fixes #553
start.sh: avoid killing replication process if it is not active.

GitHub CI:

  • Add test to check for correct shutdown of container
  • Use tail | grep to check for streaming replication logs
  • Remove unnecessary sleeps (the python functions used for the tests use a retry internally)

Copy link

what-the-diff bot commented Apr 16, 2024

PR Summary

  • Improved Sleep Time Control for Docker Containers
    Changes in .github/workflows/ci.yml provide enhanced management of sleep time for several Docker containers. This optimization helps improve the efficiency of our workflows.

  • Graceful Shutdown Enhancement in Replication Process
    Updates in 4.3/start.sh now allow for smoother halting of the replication process. In addition to this, Docker is now capable of receiving signals when shutdown is successful. This enhancement ensures the better reliability of our systems.

  • Improved Build Performance Through Caching
    By implementing caching for Apt and Pip in 4.4/Dockerfile, we have managed to enhance build performance. This feature cuts down on build time, ensuring faster integration and deployment process.

  • Enhanced Shutdown Process in 4.4
    The script 4.4/start.sh also saw improvements in its shutdown process. These changes parallel those in 4.3/start.sh with the aim of maintaining consistency across versions and improving the overall reliability of system shutdowns.

@leonardehrenfried
Copy link
Collaborator

This is also a good pull request. Let's merge #552 first, rebase on top of it and get this merged as well.

@carlomion carlomion force-pushed the bugfix/shutdown_no_replication branch from 17559cf to 04466a2 Compare April 17, 2024 08:47
@@ -8,9 +8,10 @@ ENV LANG=C.UTF-8

WORKDIR /app

RUN true \
RUN \
--mount=type=cache,target=/var/cache/apt \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain what is happening here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a feature introduces (relatively) recently by Docker buildkit: https://docs.docker.com/build/cache/#use-the-dedicated-run-cache.
It instructs the image build process to mount that directory as a volume inside the container, only for the duration of the build. This way subsequent invocation of docker build will find the cached file locally, without leaving them inside the final image, thus reducing its final size.

@@ -57,7 +57,7 @@ jobs:
-p 8001:8080 \
--name nominatim \
nominatim &
sleep 120
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried a few times to get rid of these sleeps but always failed. I'm going to run the tests a few times to see if there is any difference now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the python script, it should automatically retry failed connection with an exponential backoff.
Hard-coding a forced sleep may prolong the test more than necessary, while the container may be already ready before the timeout. IMHO a polling HTTP call done periodically by the script sounds more snappy than the forced sleep.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still wasn't enough though. I will run it a few times and see.

Copy link
Collaborator

@leonardehrenfried leonardehrenfried left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the tests 5 times and they always worked without the sleep so therefore I approve.

@philipkozeny philipkozeny merged commit b24d9e0 into mediagis:master Apr 17, 2024
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot shutdown container if replication is not active
3 participants