Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: directly signal session action processes using CAP_KILL #196

Draft
wants to merge 6 commits into
base: mainline
Choose a base branch
from

Conversation

jusiskin
Copy link
Contributor

What was the problem/requirement? (What/Why)

Background:

The Session.cancel_action() method needs to send signals to the session action subprocess. On POSIX systems, processes can only send signals to processes owned by the same user unless the process is root, or on Linux if the process has the CAP_KILL Linux capability (see capabilities(7) man page).

To overcome this restriction when the Session was provided with a different user than the host process, the Session.cancel_action() method would launch a subprocess using sudo to send the signal to the target process.

Problem:

Under low memory conditions, there may not be sufficient memory to launch the subprocesses in order to send the signals and cancel the session action.

What was the solution? (How)

  1. When sending the SIGTERM signal, a simplification was made to always send a signal to the subprocess since sudo will forward signals that it can intercept, including SIGTERM, to its child process.

  2. When the CAP_KILL Linux capability is present, a process can send OS signals to processes belonging to other users. The code was modified to detect when CAP_KILL is in the process' permitted/effective capability set(s) and, if so, send signals directly to the process.

    The implementation falls back to using the sudo and bash approach for signalling cross-user if the direct signaling is not possible or fails.

    If CAP_KILL is in the permitted capability set, but not the effective capability set, then openjd will use libcap via ctypes to temporarily promote CAP_KILL from permitted → effective and undo this after signalling.

Along the way, a simplification of the process tree was made. Before, openjd-sessions would leave an intermediate bash process. This bash script had code to handle signal forwarding. This implementation turned out to be unnecessary. To simplify the process tree, the bash script was modified to use the exec bash built-in which replaces the process with the session action command. This means one less process in the process tree and we do not need signal forwarding. The process tree change is illustrated in the diagram below:

flowchart TD
    sa([Session action])
    style bash fill:red,stroke-dasharray: 5 5
    openjd --> sudo
    sudo --> bash
    bash --> sa
Loading

What is the impact of this change?

  1. On Linux, when the process has the CAP_KILL capability, the Session.cancel_action() method is more robust under low-memory conditions.
  2. On POSIX, the process tree created by openjd-sessions is simplified and there is no longer an intermediate bash process kept around and forwarding OS signals

How was this change tested?

See DEVELOPMENT.md for information on running tests.

  • The integration tests and cross-user Linux test containers were augmented with tests for direct signalling and setup two conditions:
    1. CAP_KILL is in the permitted and effective capability set
    2. CAP_KILL is in the permitted capability set only
      • Asserts that CAP_KILL is temporarily added to the effective capability set and restored after signalling
  • The integration tests were augmented to test when CAP_KILL is not present and that the original implementation is used as a fallback

Was this change documented?

  • Are relevant docstrings in the code base updated?
    Yes

Is this a breaking change?

A breaking change is one that modifies a public contract in a way that is not backwards compatible. See the
Public Interfaces section
of the DEVELOPMENT.md for more information on the public contracts.

No, the public APIs and functional behaviours are unchanged.

Does this change impact security?

  • Does the change need to be threat modeled? For example, does it create or modify files/directories that must only be readable by the process owner?
    • If so, then please label this pull request with the "security" label. We'll work with you to analyze the threats.

Yes, the change temporarily elevates privileges in Session.cancel_action() for sending cross-user OS signals.

openjd-sessions is now "capability aware". If CAP_KILL is in the process thread's permitted capability set but not in its effective capability set, it will:

  1. temporarily add CAP_KILL to the thread's effective capability set
  2. send the cross-user OS signal
  3. remove CAP_KILL from the thread's effective capability set

A integration test has been added to assert that CAP_KILL is removed from the effective capability set.


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@jusiskin jusiskin added the security Pull requests that could impact security label Nov 15, 2024
@jusiskin jusiskin force-pushed the linux_direct_signal_with_cap_kill branch 6 times, most recently from f4ba9ba to a1eeb1d Compare November 15, 2024 06:29
@jusiskin jusiskin force-pushed the linux_direct_signal_with_cap_kill branch from a1eeb1d to 621232c Compare November 15, 2024 06:33
Copy link

sonarcloud bot commented Nov 15, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
2 Security Hotspots

See analysis details on SonarQube Cloud

# https://man7.org/linux/man-pages/man3/cap_set_proc.3.html
libcap.cap_set_proc.restype = ctypes.c_int
libcap.cap_set_proc.argtypes = [
ctypes.POINTER(Cap),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: You have this defined above as cap_t

break

# If we did not find any child processes yet, sleep for some time and retry
time.sleep(min(0.05, now - start))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now - start ? Do you mean timeout_seconds - (now-start) ? i.e. Sleep for the lesser of 0.05s or the time remaining before timeout.

As written, this'll sleep for 0s the first time and then 0.05s every time after that.

if not sys.platform.startswith("linux"):
raise OSError(f"libcap is only available on Linux, but found platform: {sys.platform}")

libcap = ctypes.CDLL(find_library("cap"), use_errno=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if libcap is not installed/available? I notice that you had to install the lib into the docker container, for instance. If we get an exception, then that'll probably end up unhandled in the LoggingSubprocess?

Related: We should also update the README to let folk know that having this library available is beneficial.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request security Pull requests that could impact security
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants