
bpf: Investigate the *best* value for wakeup_data_size #1660

Open
dave-tucker opened this issue Aug 1, 2024 · 6 comments
Labels: bug (Something isn't working), kind/feature (New feature or request)

Comments

Collaborator

dave-tucker commented Aug 1, 2024

What would you like to be added?

This constant:
https://github.com/sustainable-computing-io/kepler/blob/main/bpf/kepler.bpf.c#L70

declares how often we wake up to read the ring buffer.

The current math was as follows:

  1. My system (on average) processes around 600-700 context switches per second
  2. The sample period in Kepler is once every 3 seconds
  3. We need to read at least one batch of ringbuf events within that 3 second interval

So a value of 1000 should have me reading roughly every 1.4-1.7 seconds 😄

Why is this needed?

When Kepler wakes up to read events it consumes CPU. Right now that shows up as roughly 1-3% mean CPU usage over time.
We should consider whether there is a better formula we could use to compute this magic number of 1000.

It could relate to the sample rate.

e.g. `500 * SampleRate`, and perhaps even the 500 could be derived from something better than an educated guess.

@dave-tucker dave-tucker added the kind/feature New feature or request label Aug 1, 2024
Contributor

rootfs commented Aug 1, 2024

Kepler CPU usage under normal and stress workloads needs to be investigated in parallel. The latest stress-test results point to a divergence that needs to be fixed.

@rootfs rootfs added the bug Something isn't working label Aug 1, 2024
Contributor

rootfs commented Aug 2, 2024

Kepler CPU usage is now 20% with no load running.
[asciicast recording]

Contributor

rootfs commented Aug 2, 2024

Test results posted on the original PR #1628

dave-tucker (Collaborator, Author) commented:

> The current Kepler CPU usage is now 20% without running load. asciicast

How, and on what machine, can I reproduce this result?

Contributor

rootfs commented Aug 5, 2024

@dave-tucker load the latest kepler image and keep it running for a day.

dave-tucker (Collaborator, Author) commented:

> Test results posted on the original PR #1628

Responded:
#1628 (comment)
