-
Notifications
You must be signed in to change notification settings - Fork 441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
safekeeper: add CPU profiling #9764
base: main
Are you sure you want to change the base?
Conversation
static ALLOWLIST_ROUTES: Lazy<HashSet<Uri>> = Lazy::new(|| { | ||
["/v1/status", "/metrics"] | ||
["/v1/status", "/metrics", "/pprof/profile"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't put this behind auth, since it's pretty inconvenient. The perf impact is minor, and only one profile can run at a time, so it's not a DoS vector. It only exposes function and module names, so it's not a data leak risk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably give this endpoint as well as the metrics endpoint actual JWT authentication eventually. I've filed a private issue https://github.com/neondatabase/cloud/issues/20372 as it is relevant to security.
For now I agree with you though.
5490 tests run: 5247 passed, 0 failed, 243 skipped (full report)Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
e28059c at 2024-11-18T11:02:32.872Z :recycle: |
The CDDL-1.0 license used by the |
#9764, which adds profiling support to Safekeeper, pulls in the dependency [`inferno`](https://crates.io/crates/inferno) via [`pprof-rs`](https://crates.io/crates/pprof). This is licenced under the [Common Development and Distribution License 1.0](https://spdx.org/licenses/CDDL-1.0.html), which is not allowed by `cargo-deny`. This patch allows the CDDL-1.0 license. It is a derivative of the Mozilla Public License, which we already allow, but avoids some issues around European copyright law that the MPL had. As such, I don't expect this to be problematic.
55768c9
to
e28059c
Compare
Problem
We don't have a convenient way to gather CPU profiles from a running binary, e.g. during production incidents or end-to-end benchmarks, nor during microbenchmarks (particularly on macOS).
We would also like to have continuous profiling in production, likely using Grafana Cloud Profiles. We may choose to use either eBPF profiles or pprof profiles for this (pending testing and discussion with SREs), but pprof profiles appear useful regardless for the reasons listed above. See https://github.com/neondatabase/cloud/issues/14888.
This PR is intended as a proof of concept, to try it out in staging and drive further discussions about profiling more broadly.
Touches #9534.
Touches https://github.com/neondatabase/cloud/issues/14888.
Summary of changes
Adds a HTTP route
/profile/cpu
that takes a CPU profile and returns it. Defaults to an SVG flamegraph over a 5-second duration, but can also emit pprof Protobuf profiles for use with e.g.pprof
or Grafana Alloy. Query parameters:format
: output format (pprof
orsvg
)frequency
: sampling frequency in microseconds (default 100)seconds
: number of seconds to profile (default 5)Also integrates pprof profiles into Criterion benchmarks, such that flamegraph reports can be taken with
cargo bench ... --profile-duration <seconds>
. Output undertarget/criterion/*/profile/flamegraph.svg
.Example profiles:
pprof
): profile.pb.gzpprof -http :6060 profile.pb.gz