Dynamically probe system for availability of larger (1M) BPF instruction limit #2040
Labels
area/datacollector
Issues related to Stirling (datacollector)
Comments
This was referenced Oct 9, 2024
I have a crude test on this branch that inserts more than 4096 BPF asm instructions. The test works on kernels older than 5.2, but it also appears to pass on the 6.11 kernel, which is later restricted to 4096 instructions when the socket tracer program is loaded. More investigation is needed to see how this dynamic probe can trigger the 6.11 instruction limit restriction.
ddelnano added a commit that referenced this issue Oct 11, 2024
…er kernels (#2041)

Summary: Upgrade bcc and libbpf to fix BPF program compilation on 6.10 and later kernels

Bcc provides some "[virtual](https://github.com/iovisor/bcc/blob/cb1ba20f4800f556dc940682ba7016c50bd0a3ac/src/cc/exported_files.cc#L28-L48)" includes to BPF programs. The `compat/linux/virtual_bpf.h` file in particular needs to be kept in sync with libbpf and matches the [header guard](https://github.com/iovisor/bcc/blob/cb1ba20f4800f556dc940682ba7016c50bd0a3ac/src/cc/compat/linux/virtual_bpf.h#L9) of the `include/uapi/linux/bpf.h` file. This means that while our linux headers were updated, our older bcc install was inserting an older copy of the `uapi/linux/bpf.h` file -- one that didn't contain the `bpf_wq` declaration.

```
include/linux/bpf.h:348:10: error: invalid application of 'sizeof' to an incomplete type 'struct bpf_wq'
  return sizeof(struct bpf_wq);
         ^     ~~~~~~~~~~~~~~~
include/linux/bpf.h:348:24: note: forward declaration of 'struct bpf_wq'
  return sizeof(struct bpf_wq);
                       ^
include/linux/bpf.h:377:10: error: invalid application of '__alignof' to an incomplete type 'struct bpf_wq'
  return __alignof__(struct bpf_wq);
         ^          ~~~~~~~~~~~~~~~
include/linux/bpf.h:377:29: note: forward declaration of 'struct bpf_wq'
  return __alignof__(struct bpf_wq);
```

Note: while this fixes the 6.10 compilation issue, our 6.10 qemu build fails without disabling [this logic](https://github.com/pixie-io/pixie/blob/3c41d554215528e688328aef94192e696db617dc/src/stirling/source_connectors/socket_tracer/socket_trace_connector.cc#L464-L472). 6.10 kernels added BPF token support. This changes the BPF permission model slightly and causes the BPF instruction limit to be dependent on the permissions of the BPF syscall caller ([linux source](https://elixir.bootlin.com/linux/v6.11.1/source/kernel/bpf/syscall.c#L2757)). This new BPF token logic, coupled with our qemu setup, causes our 6.10 build to fall back to the 4096 instruction limit. I'll be addressing this in #2040 and #2042. Those issues shouldn't block this change since that loop limit code can be bypassed at runtime with our current cli flags.

Relevant Issues: Closes #2035

Type of change: /kind bugfix

Test Plan: Built 6.10 and 6.11 kernels and the associated linux headers from #2036 and verified that a local qemu build passes
- [x] Verify `#ci:bpf-build-all-kernels` build passes

Changelog Message: Upgraded bcc and libbpf to support kernels 6.10 and later

---------

Signed-off-by: Dom Del Nano <[email protected]>
Describe the bug
In #1795, we introduced the ability for kernels that support the 1M BPF program limit to raise certain tunables used to restrict program size. Many upstream distros and companies backport features, which means that the kernel version is somewhat meaningless (older kernels could have the feature you are checking for). We knew this was a possibility, but thought it was safe to assume that any version > 5.1 would be guaranteed to have the functionality.
Linux 6.10 introduced BPF tokens, a newer way to allow less privileged processes to access BPF functionality. This changes the BPF permission model: the instruction limit now depends on the permissions of the caller, so a process with fewer permissions can be forced back to the lower 4096 instruction limit. PEMs running with these reduced permissions fail to start, because the version check still raises the tunables that restrict program size even though the kernel only accepts 4096 instructions.
This was detected while fixing #2035, where our ad hoc 6.10 qemu builds failed with instruction limit errors.
We should change this logic to be based on a dynamic probe of the system. `bpftool` is able to identify whether a kernel is capable of the higher instruction limit by inserting a test program above the 4096 limit. This would address this bug and properly identify any backported kernels that should have the higher limits applied.
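
A rough sketch of what such a probe could look like, written in Python only for illustration (it is not Pixie's or bpftool's code). It builds a program of 4096 `r0 = 1` instructions followed by `exit` and tries to load it with a raw `bpf(2)` call. The function names, the choice of `BPF_PROG_TYPE_SOCKET_FILTER`, and the two-entry syscall number table are assumptions made for this example. As noted in the comment on this issue, a probe like this may still succeed on 6.11 while the socket tracer is later limited, so it would presumably need to run with the same privileges and context as the real program load.

```python
import ctypes
import errno
import os
import platform
import struct

BPF_MAXINSNS = 4096              # legacy per-program instruction limit
BPF_PROG_LOAD = 5                # bpf(2) command number
BPF_PROG_TYPE_SOCKET_FILTER = 1  # assumption: simplest type to load for a probe

# bpf(2) syscall numbers; only these two architectures are covered here.
SYS_BPF = {"x86_64": 321, "aarch64": 280}


def build_probe_insns(count=BPF_MAXINSNS + 1):
    """Return `count` encoded instructions: (count - 1) x `r0 = 1`, then `exit`."""
    # struct bpf_insn layout: u8 code, u8 regs, s16 off, s32 imm (8 bytes total).
    mov_r0_1 = struct.pack("=BBhi", 0xB7, 0, 0, 1)   # BPF_ALU64 | BPF_MOV | BPF_K
    exit_insn = struct.pack("=BBhi", 0x95, 0, 0, 0)  # BPF_JMP | BPF_EXIT
    return mov_r0_1 * (count - 1) + exit_insn


def probe_large_insn_limit():
    """True if the oversized program loads, False on E2BIG, None if inconclusive."""
    nr = SYS_BPF.get(platform.machine())
    if nr is None:
        return None  # syscall number not listed above for this architecture
    insns = build_probe_insns()
    insns_buf = ctypes.create_string_buffer(insns, len(insns))
    license_buf = ctypes.create_string_buffer(b"GPL")
    # Leading fields of union bpf_attr for BPF_PROG_LOAD: prog_type, insn_cnt,
    # insns, license, log_level, log_size, log_buf, kern_version, prog_flags.
    attr = struct.pack(
        "=IIQQIIQI4x",
        BPF_PROG_TYPE_SOCKET_FILTER,
        len(insns) // 8,
        ctypes.addressof(insns_buf),
        ctypes.addressof(license_buf),
        0, 0, 0, 0,
    )
    attr_buf = ctypes.create_string_buffer(attr, len(attr))
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    fd = libc.syscall(ctypes.c_long(nr), ctypes.c_int(BPF_PROG_LOAD),
                      attr_buf, ctypes.c_uint(len(attr)))
    if fd >= 0:
        os.close(fd)
        return True
    # E2BIG means the kernel applied the 4096 limit to this caller; any other
    # errno (for example EPERM) says nothing about the limit itself.
    return False if ctypes.get_errno() == errno.E2BIG else None


if __name__ == "__main__":
    print("large instruction limit available:", probe_large_insn_limit())
```

Returning `None` for anything other than success or `E2BIG` keeps a permission failure from being mistaken for a small limit; a caller would likely fall back to the conservative 4096-instruction tunables in that case.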