
bpf: Investigate the *best* value for wakeup_data_size #1660

Open
dave-tucker opened this issue Aug 1, 2024 · 6 comments
Labels: bug (Something isn't working), kind/feature (New feature or request)

Comments

Collaborator

dave-tucker commented Aug 1, 2024

What would you like to be added?

This constant:
https://github.com/sustainable-computing-io/kepler/blob/main/bpf/kepler.bpf.c#L70

declares how often we wake up to read the ring buffer.

The current math was as follows:

  1. My system (on average) processes around 600-700 context switches per second
  2. The sample period in Kepler is once every 3 seconds
  3. We need to read at least one batch of ringbuf events within that 3 second interval

So a value of 1000 should have me reading roughly every 1.4-1.7 seconds 😄

Why is this needed?

When Kepler wakes up to read events it consumes CPU. Right now that shows up as roughly 1-3% mean CPU usage over time.
We should consider whether there is a better formula we could use to compute this magic number of 1000.

It could relate to the sample rate.

e.g. `500 * SampleRate`, and perhaps even the 500 could be derived from something better than an educated guess.

@dave-tucker dave-tucker added the kind/feature New feature or request label Aug 1, 2024
Contributor

rootfs commented Aug 1, 2024

Kepler CPU usage under normal and stress workloads needs to be investigated in parallel. The latest stress-test results point to a divergence that needs to be fixed.

@rootfs rootfs added the bug Something isn't working label Aug 1, 2024
Contributor

rootfs commented Aug 2, 2024

Kepler CPU usage is now 20% with no load running.
[asciicast recording]

Contributor

rootfs commented Aug 2, 2024

Test results posted on the original PR #1628

dave-tucker (Collaborator, Author) commented:

> The current Kepler CPU usage is now 20% without running load. asciicast

How, and on what machine, can I reproduce this result?

Contributor

rootfs commented Aug 5, 2024

@dave-tucker load the latest kepler image and keep it running for a day.

dave-tucker (Collaborator, Author) commented:

> Test results posted on the original PR #1628

Responded:
#1628 (comment)
