Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

windows: trade heap for stack to build process tree for stats in linear space #24182

Merged
merged 2 commits into from
Oct 14, 2024

Conversation

tgross
Copy link
Member

@tgross tgross commented Oct 11, 2024

In #20619 we overhauled how we were gathering stats for Windows processes. Unlike in Linux where we can ask for processes in a cgroup, on Windows we have to make a single expensive syscall to get all the processes and then build the tree ourselves. Our algorithm to do so is recursive and quadratic in both steps and space with the number of processes on the host. For busy hosts this hit the stack limit and panics the Nomad client.

We already build a map of parent PID to PID, so modify this to be a map of parent PID to slice of children and then traverse that tree only from the root we care about (the executor PID). This moves the allocations to the heap but makes the stats gathering linear in steps and space required.

This changeset also moves as much of this code as possible into an area not conditionally-compiled by OS, as the tagged test file was not being run in CI.

Fixes: #23984


Note to reviewers: I've got 2 commits here. One has the old and new code side-by-side in a test for comparison. The second commit removes the old code. Test results:

$ go test -v -count=1 ./drivers/shared/executor/procstats -run Test_list
=== RUN   Test_list
=== RUN   Test_list/minimal
    list_test.go:120: NEW total: 14 -> examined: 16 (heap: 9192 stack: 0)
    list_test.go:136: OLD total: 14 -> examined: 61 (heap: 7064 stack: 0)
=== RUN   Test_list/small_needles_small_haystack
    list_test.go:120: NEW total: 207 -> examined: 212 (heap: 55784 stack: 0)
    list_test.go:136: OLD total: 207 -> examined: 20112 (heap: 27920 stack: 32768)
=== RUN   Test_list/small_needles_large_haystack
    list_test.go:120: NEW total: 1012 -> examined: 1022 (heap: 288040 stack: 0)
    list_test.go:136: OLD total: 1012 -> examined: 500522 (heap: 150888 stack: 98304)
=== RUN   Test_list/moderate_needles_giant_haystack
    list_test.go:120: NEW total: 2022 -> examined: 2042 (heap: 606408 stack: 0)
    list_test.go:136: OLD total: 2022 -> examined: 2001042 (heap: 332040 stack: 229376)
--- PASS: Test_list (0.32s)
    --- PASS: Test_list/minimal (0.00s)
    --- PASS: Test_list/small_needles_small_haystack (0.01s)
    --- PASS: Test_list/small_needles_large_haystack (0.06s)
    --- PASS: Test_list/moderate_needles_giant_haystack (0.25s)
PASS
ok      github.com/hashicorp/nomad/drivers/shared/executor/procstats    0.353s

@tgross tgross changed the title windows: trade memory for time in building process tree for stats windows: trade memory to build process tree for stats in linear time Oct 11, 2024
@tgross tgross changed the title windows: trade memory to build process tree for stats in linear time windows: trade heap for stack to build process tree for stats in linear time Oct 14, 2024
@tgross tgross changed the title windows: trade heap for stack to build process tree for stats in linear time windows: trade heap for stack to build process tree for stats in linear space Oct 14, 2024
In #20619 we overhauled how we were gathering stats for Windows
processes. Unlike in Linux where we can ask for processes in a cgroup, on
Windows we have to make a single expensive syscall to get all the processes and
then build the tree ourselves. Our algorithm to do so is recursive and quadratic
in both steps and space with the number of processes on the host. For busy hosts
this hit the stack limit and panic the Nomad client.

We already build a map of parent PID to PID, so modify this to be a map of
parent PID to slice of children and then traverse that tree only from the root
we care about (the executor PID). This moves the allocations to the heap but
makes the stats gathering linear in steps and space required.

Fixes: #23984
@tgross tgross added theme/platform-windows theme/metrics type/bug backport/ent/1.7.x+ent Changes are backported to 1.7.x+ent backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent backport/1.9.x backport to 1.9.x release line labels Oct 14, 2024
@tgross tgross modified the milestones: 1.9.0, 1.9.x Oct 14, 2024
@tgross tgross marked this pull request as ready for review October 14, 2024 13:44
Copy link
Member

@shoenig shoenig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@tgross tgross merged commit fec91d1 into main Oct 14, 2024
44 checks passed
@tgross tgross deleted the windows-process-list branch October 14, 2024 15:26
@tgross tgross modified the milestones: 1.9.x, 1.9.1 Oct 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/ent/1.7.x+ent Changes are backported to 1.7.x+ent backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent backport/1.9.x backport to 1.9.x release line theme/metrics theme/platform-windows type/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Infinite recursion on list_windows.go
2 participants