Fix: terminate hanging jobs #187
base: main
Everything else looks good.
ed005b1 to dab3b03
```bash
set -x

echo "Starting terminate_starting_and_started_runs in background to terminate orphaned ones."
python /opt/dagster/app/scripts/terminate_starting_and_started_runs.py &
```
The background job (subprocess) started here might fail for any reason, including `/opt/dagster/app/scripts/terminate_starting_and_started_runs.py` being missing. I'm afraid that this will cause further headaches down the road, at some inconvenient moment, in production. I strongly suggest that we change the script to crash the container (exit with a non-zero status) if this happens.
It seems very hard to get the following behaviour solely with a Bash script:
1. If the background job fails, also quit the main process.
2. If the main process exits, don't leave the background job running.
3. If both run through cleanly, exit as usual.
After experimenting for too long (with many variants of 1, 2 & 3), I propose just using tini in combination with `kill "-$$"` as a workaround.
```bash
#!/bin/bash
set -eo pipefail
echo 'Running terminate_starting_and_started_runs.py as a background job.' >&2
set -x
# In case the background job fails, we kill the entire shell ($$ has its PID) and all its children
# (negating the PID signals the whole process group, and `exec` below keeps that PID and group).
# This *does not* work if `$$` evaluates to 1 (our shell is the init process), so we *must* run this script with an "external" init command.
python /opt/dagster/app/scripts/terminate_starting_and_started_runs.py \
  || kill -TERM -- "-$$" &
exec "$@"
```
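The `kill -TERM -- "-$$"` trick relies on process groups: a negative PID addresses every process whose process group ID equals that number. It assumes the shell leads its own process group, which tini arranges for the child it starts. The following is a minimal Python sketch of the same mechanism, for illustration only; `start_in_new_group`, `terminate_group` and `refuse_to_run_as_init` are hypothetical helpers, not part of this PR.

```python
import os
import signal
import subprocess


def start_in_new_group(cmd: list[str]) -> subprocess.Popen:
    # start_new_session=True calls setsid() in the child, so the child leads a new
    # process group whose PGID equals its PID (the property the Bash script relies
    # on when it uses $$ as a group ID).
    return subprocess.Popen(cmd, start_new_session=True)


def terminate_group(proc: subprocess.Popen) -> int:
    # Python counterpart of `kill -TERM -- "-$PGID"`: SIGTERM goes to every process
    # in the group (the leader and any children it spawned), not just the leader.
    os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
    return proc.wait()


def refuse_to_run_as_init() -> None:
    # As PID 1, "-$$" becomes "-1", and `kill -- -1` means "every process I may
    # signal" rather than "process group 1"; hence the need for an external init.
    if os.getpid() == 1:
        raise SystemExit("run this under an init such as tini, not as PID 1")
```

Because `exec "$@"` replaces the shell without changing its PID or process group, the group kill in the script also reaches the command started by `exec`.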
```python
def get_run_ids_of_runs(status: list[str], timeout: int = 20) -> list[str]:
```
I think it's likely that we'll want to change this soon, so let's make it configurable via an environment variable.
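A minimal sketch of the suggested change, assuming a hypothetical environment variable named `RUN_QUERY_TIMEOUT_SECONDS` (the PR does not name one) and keeping the current default of 20 seconds:

```python
import os

# Hypothetical variable name; not defined anywhere in this PR.
RUN_QUERY_TIMEOUT_ENV = "RUN_QUERY_TIMEOUT_SECONDS"
DEFAULT_RUN_QUERY_TIMEOUT = 20


def get_timeout_from_env(default: int = DEFAULT_RUN_QUERY_TIMEOUT) -> int:
    """Read the timeout (in seconds) from the environment, falling back to the default."""
    raw = os.environ.get(RUN_QUERY_TIMEOUT_ENV)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = int(raw)
    except ValueError as exc:
        # Fail loudly on a typo instead of silently using the default.
        raise ValueError(f"{RUN_QUERY_TIMEOUT_ENV} must be an integer, got {raw!r}") from exc
    if value <= 0:
        raise ValueError(f"{RUN_QUERY_TIMEOUT_ENV} must be positive, got {value}")
    return value
```

`get_run_ids_of_runs` could then default `timeout` to `None` and call `get_timeout_from_env()` when no explicit value is passed. Reading the variable at call time rather than at import time keeps it easy to override in tests.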
```
@@ -27,6 +27,10 @@ LABEL org.opencontainers.image.licenses="(EUPL-1.2)"

EXPOSE 3000

COPY scripts/ /opt/dagster/app/scripts/
```
(See the comment about background jobs in the startup script.)
Suggested change:

```diff
-COPY scripts/ /opt/dagster/app/scripts/
+ARG TARGETARCH
+ENV TINI_VERSION=v0.19.0
+ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-${TARGETARCH} /tini
+RUN chmod +x /tini && /tini --version
+COPY scripts/ /opt/dagster/app/scripts/
+ENTRYPOINT [ "/tini", "--", "/opt/dagster/app/scripts/start_runs_termination_script_in_background.sh" ]
```
I'm afraid that this change will cause more headaches than it solves. Effectively, we have two race conditions here:
This PR