Skip to content
This repository has been archived by the owner on Oct 22, 2021. It is now read-only.

Improve drain script support #10

Merged
merged 19 commits into from
May 12, 2021
Merged

Improve drain script support #10

merged 19 commits into from
May 12, 2021

Conversation

jandubois
Copy link
Member

@jandubois jandubois commented May 10, 2021

This PR tries to address the remaining issues in the Quarks operator regarding drain script support.

Many thanks to @univ0298 for testing the changes and @andrew-edgar for updating the tests!

History

The first identified issue was: POD draining does not work because individual containers terminate prematurely instead of waiting for the completion of all drain/preStops on all containers.

It was supposed to be fixed by: Wait for all drain scripts to finish

That fix was incomplete because containers whose main process exits will get killed immediately and not participate in the drain script waiting process. The first idea how to fix this did not work out: Make sure terminated containers are considered as already drained

Further problems are identified in: Many Cloud Foundry drain scripts do not work in KubeCF due to no monit and no bpm

Fixes

This pull request attempts to address the remaining issues with:

  • keep container-run running until it receives a SIGTERM signal. That way PreStop hooks should not get killed prematurely.

  • provide an emultation bpm script that implements the common bpm stop command used by many of the drain scripts in the CF releases. In addition it provides a mechanism to implement patches for those drain scripts that directly use the pid files to signal other processes.

In addition this PR fixes the way subprocesses are stopped via SIGKILL, orphaning their child processes and generally not allowing things to shut down gracefully. This PR mirrors the bpm stop functionality and sends a SIGTERM, followed 20s later with a SIGKILL if the child processes are still running at that time.

BPM commands

The bpm script implements the following commands from the bpm-release:

bpm start job [-p process]
bpm stop job [-p process ]

The bpm stop command will wait up to 30s for the process to actually stop (it normally should be killed via SIGKILL within 20s if it doesn't respong to SIGTERM).

The following extensions are also provided:

bpm term job [-p process]
bpm quit job [-p process]
bpm running job [-p process]
  • bpm term is the same as bpm stop, except it doesn't wait for the process to terminate.

  • bpm quit will send a SIGQUIT signal to the process (e.g. to tell nginx to shut down gracefully).

  • bpm running will return with a 0 exit code if the process is still running, or a 1 if it isn't. It will also print "yes" or "no" to stdout, if that is connected to a tty.

Manual testing

I compiled the program on Linux (using nc with UDP to Unix sockets doesn't seem to work on macOS):

go build -o container-run cmd/main.go

I use the following test program for all tests:

# cat main.sh 
#!/usr/bin/env bash

SIGQUIT=""
function quit {
    echo SIGQUIT
    SIGQUIT=1
}
trap quit SIGQUIT

function term {
    echo SIGTERM
    if [ -n "$SIGQUIT" ]; then
        echo "Exiting because already received SIGQUIT"
        exit 0
    fi
}
trap term SIGTERM

echo Start
for i in $(seq 10); do
    echo Loop $i
    sleep 3
done
echo End
exit 0

It runs for 30s and then exits on its own, printing a progress message every 3 seconds. It acknowledges SIGQUIT and SIGTERM signals, but only exits early if the SIGTERM has been preceeded by a SIGQUIT. It is always run via:

# ./container-run --job-name main --process-name main $PWD/main.sh; echo $?
Start
Loop 1
[...]
Loop 10
End

The main.sh process runs until completion, but container-run remains running. From a second shell main.sh can be restarted:

# alias bpm=/var/vcap/jobs/bpm/bin/bpm
# bpm start main

and the process runs again from start to finish. Running pkill container-run once the script has finished will terminate container-run immediately now.

Starting container-run again and running bpm stop main from the second terminal shows that the signal is received (once the current sleep returns), but the process doesn't abort right away; it runs for another 20s before being killed:

# time bpm stop main

real    0m20.040s

Output:

# ./container-run --job-name main --process-name main $PWD/main.sh; echo $?
Start
Loop 1
Loop 2
SIGTERM
Loop 3
Loop 4
Loop 5
Loop 6
Loop 7
Loop 8

Using bpm term main works the same as bpm stop main, but returns immediately and doesn't wait for the process to stop.

Sending a SIGQUIT before the stop command lets the script terminate right away, but container-run will keep running as expected:

# bpm start main; sleep 3; bpm quit main; bpm stop main

Output:

[...]
Loop 8
Start
Loop 1
Loop 2
SIGQUIT
SIGTERM
Exiting because already received SIGQUIT

Sending SIGTERM to container-run while main.sh is running will send SIGTERM to the script as well. It will continue running because it hasn't received a SIGQUIT first. After 20s the script will be killed via SIGKILL and container-run exits:

# bpm start main; sleep 3; pkill container-run

Output:

[...]
Exiting because already received SIGQUIT
Start
Loop 1
Loop 2
SIGTERM
Loop 3
Loop 4
Loop 5
Loop 6
Loop 7
Loop 8
0
#

The bpm running command can be used to check if the process is still running or not:

# bpm start main; bpm term main; while bpm running main; do sleep 5; done
yes
yes
yes
yes
no

Open issues

In addition to a new operator release using the updated container-run version we also need at least 2 patches at the kubecf level:

  • The cloud controller drain.sh script calls shutdown_drain.rb using cloud_controller_ng/drain.rb. It should be possible to replace the drain script with something like:

    bpm quit nginx
    for i in $(seq 30); do
        test bpm running nginx && sleep 1
    done
    bpm term nginx
    bpm term cloud_controller_ng
  • The garden process invokes garden_start via tini. Sine we don't run tini using pid 1, we need to pass the -s option to enable the TINI_SUBREAPER.

jandubois and others added 16 commits May 6, 2021 18:06
…ss fails

To avoid terminating early and killing a still running drain script it is
necessary to keep the container-run process running even if the main
process has already exited.

Once container-run receives the SIGTERM signal it will pass it on to the
child processes and then wait until all direct children have exited before
exiting itself. This should allow the processes to terminate cleanly as
long as the grace period has not yet expired.

container-run will also terminate immediately if the main process terminates
with an error. This allows the container to fail and be restarted by k8s.

Signed-off-by: Jan Dubois <[email protected]>
This is used to trigger graceful shutdown of nginx.

Signed-off-by: Jan Dubois <[email protected]>
…ning

Can be used to wait for a process to stop after sending the STOP command.

Signed-off-by: Jan Dubois <[email protected]>
It only supports the "start" and "stop" commands from the real bpm,
but has additional "quit" and "running" commands to help reimplementing
drain scripts that make use of the pid files.

Signed-off-by: Jan Dubois <[email protected]>
Signed-off-by: Jan Dubois <[email protected]>
It is important to process errors synchronously because their handling
depends on the value of the 'active' flag. To avoid deadlocks any code
sending errors to the channel must run in a blockable goroutine.

Signed-off-by: Jan Dubois <[email protected]>
(and give them a chance to notify their child processes). If
the processes haven't terminated in 20s, send SIGKILL.

Signed-off-by: Jan Dubois <[email protected]>
That way it also starts the timeout trigger if the process
doesn't stop within 20s.

Signed-off-by: Jan Dubois <[email protected]>
The rest are just confusing the debug output, and passing
on e.g. SIGCHLD to the child processes is neither correct
nor useful.

Signed-off-by: Jan Dubois <[email protected]>
container-run should send a SIGKILL if the process is still alive
after 20s, so if it isn't gone after 30, then it probably will not
stop.

The `bpm running` subcommand will also now print "yes" or "no" in
addition to setting the exit code.

Signed-off-by: Jan Dubois <[email protected]>
(but they no longer pass because they have not been updated
with the new application logic).

Signed-off-by: Jan Dubois <[email protected]>
Also modify `bpm running` to only print yes/no when stdout is a tty.

Signed-off-by: Jan Dubois <[email protected]>
@jandubois jandubois requested a review from mook-as May 10, 2021 22:41
@jandubois
Copy link
Member Author

  • we need to pass the -s option to enable the TINI_SUBREAPER

cloudfoundry-incubator/kubecf#1729

@jandubois
Copy link
Member Author

This new nginx drain logic was added in capi-release v1.110.0 via cloudfoundry/capi-release@30690f3.

kubecf is currently using capi-release v1.107.0, so this does not yet apply.

Copy link
Contributor

@mook-as mook-as left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really sure my review is comprehensive, but it's something at least I guess?

pkg/containerrun/bpm.go Show resolved Hide resolved
pkg/containerrun/bpm.go Show resolved Hide resolved
pkg/containerrun/containerrun.go Outdated Show resolved Hide resolved
pkg/containerrun/containerrun.go Show resolved Hide resolved
pkg/containerrun/containerrun.go Outdated Show resolved Hide resolved
pkg/containerrun/containerrun.go Outdated Show resolved Hide resolved
pkg/containerrun/containerrun.go Show resolved Hide resolved
pkg/containerrun/containerrun.go Outdated Show resolved Hide resolved
Signed-off-by: Jan Dubois <[email protected]>
@jandubois jandubois requested a review from mook-as May 11, 2021 16:31
The errors channel is unbuffered, so stopProcesses cannot deliver
errors if it is running from the same goroutine as the receiver.

Signed-off-by: Jan Dubois <[email protected]>
Signed-off-by: Jan Dubois <[email protected]>
@manno
Copy link
Member

manno commented May 12, 2021

That looks great.

One thing I can think of, if any of the goroutines in the tests gives a problem (flakieness, etc) maybe defer GinkgoRecover() helps.

@manno manno merged commit 76ca394 into master May 12, 2021
@manno manno deleted the drain-support branch May 12, 2021 07:52
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants