Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

windows: fix inefficient gathering of task processes #20619

Merged
merged 2 commits into from
May 17, 2024
Merged

Conversation

shoenig
Copy link
Member

@shoenig shoenig commented May 16, 2024

This PR overhauls the way processes are scanned on Windows for the exec driver(s).

A regression was introduced in Nomad 1.7 where we would invoke an expensive syscall for every descendant of a process to build a tree of the processes associated with a task. Instead, we restore the behavior of scanning the process table once and rebuild the process tree manually.

Fixes #20042

@shoenig shoenig added backport/ent/1.7.x+ent Changes are backported to 1.7.x+ent backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent and removed stage/needs-rebase This PR needs to be rebased on main before it can be backported to pick up new BPA workflows labels May 17, 2024
@shoenig shoenig marked this pull request as ready for review May 17, 2024 13:03
Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Shows what a leetcode problem this is that I immediately started thinking about memoizing by checking if the candidate.PPid() was already in family... but for that to be at all meaningful the individual tasks would have to have so many processes.


all, err := processes()
if err != nil {
return set.New[ProcessID](0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we at least return the executor PID here? Or is the thinking that if this call fails we can't trust any of the results?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good idea, let me add that

@tgross
Copy link
Member

tgross commented May 17, 2024

@shoenig you don't need a 1.8.x backport label on this, as we'll cut the RC release from main

@shoenig shoenig removed the backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent label May 17, 2024
@shoenig
Copy link
Member Author

shoenig commented May 17, 2024

but for that to be at all meaningful the individual tasks would have to have so many processes.

Yeah we could definitely add base cases for:

  • if the parent is already known to be in the family
  • if the parent is already known to not be in the family

But at that point you're optimizing microseconds when the original problem is this horrible process table snapshot syscall that takes 10's of milliseconds. I'd rather the code just be readable.

@shoenig shoenig merged commit 7d00a49 into main May 17, 2024
19 checks passed
@shoenig shoenig deleted the process-stats-pace branch May 17, 2024 14:46
tgross added a commit that referenced this pull request Oct 14, 2024
In #20619 we overhauled how we were gathering stats for Windows
processes. Unlike in Linux where we can ask for processes in a cgroup, on
Windows we have to make a single expensive syscall to get all the processes and
then build the tree ourselves. Our algorithm to do so is recursive and quadratic
in both steps and space with the number of processes on the host. For busy hosts
this hit the stack limit and panic the Nomad client.

We already build a map of parent PID to PID, so modify this to be a map of
parent PID to slice of children and then traverse that tree only from the root
we care about (the executor PID). This moves the allocations to the heap but
makes the stats gathering linear in steps and space required.

Fixes: #23984
tgross added a commit that referenced this pull request Oct 14, 2024
…ar space (#24182)

In #20619 we overhauled how we were gathering stats for Windows
processes. Unlike in Linux where we can ask for processes in a cgroup, on
Windows we have to make a single expensive syscall to get all the processes and
then build the tree ourselves. Our algorithm to do so is recursive and quadratic
in both steps and space with the number of processes on the host. For busy hosts
this hits the stack limit and panics the Nomad client.

We already build a map of parent PID to PID, so modify this to be a map of
parent PID to slice of children and then traverse that tree only from the root
we care about (the executor PID). This moves the allocations to the heap but
makes the stats gathering linear in steps and space required.

This changeset also moves as much of this code as possible into an area
 not conditionally-compiled by OS, as the tagged test file was not being run in CI.

Fixes: #23984
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/ent/1.7.x+ent Changes are backported to 1.7.x+ent
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nomad (v1.7.5) executor processes consume a lot of CPU when job spawns child processes on Windows Server
2 participants