Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fine grained CPU profiling #804

Closed
sandreim opened this issue Jun 24, 2022 · 4 comments
Closed

Fine grained CPU profiling #804

sandreim opened this issue Jun 24, 2022 · 4 comments

Comments

@sandreim
Copy link
Contributor

sandreim commented Jun 24, 2022

There have been efforts to solve this earlier like paritytech/polkadot#4871 but the results we got were not providing enough insights due to the low sampling rate/storage - https://pyroscope.io/docs/storage-design/ . We should continue the effort to implement something that works better for our usecase. We need as fine grained as possible CPU profiling (100us) with visualisation tooling to increase accuracy and decrease the scope of debugging when dealing with node performance issues or optimization work.

The solution should also consider this must also work easily with Zombienet to test performance regression in the CI pipeline.

@ordian
Copy link
Member

ordian commented Aug 19, 2022

Just to clarity, the profiler frequency used in pyroscope is configurable. The problem is that it accumulates the profiling info into segments of 10s, which is hardcoded in their server in many places. Cf grafana/pyroscope#901.
Thus the critical section we want to profile gets lost in the noise of other tasks such as networking.

Also worth mentioning that we don't need to run it on every node. One validator and one collator per parachain would be fine.

We could try another tool in the same category of continuous profilers or try to fork their server.

@alindima
Copy link
Contributor

After my latest experiences re-enabling pyroscope and pprof-rs, I can confirm that we need a different solution because:

  • pprof-rs has a high overhead, since it's doing profiling being based on signal handlers (and pyroscope unregisters/registers the signal handlers every 10 seconds).
  • libunwind mechanism is buggy and causes SIGABORTs: SIGABORT when profiling with pyroscope-rs tikv/pprof-rs#219
  • pyroscope-rs repo does not seem under a lot of active maintenance. I also discovered a bug there: fix spurious exit when epoll_wait is interrupted by a signal grafana/pyroscope-rs#125
  • even with frame pointer unwinding and the above fix, after a couple of hours of running the validator with pyroscope enabled, most of the nodes tasks are killed, including the pyroscope agent (I assume there's a memory leak that triggers an OOM somewhere), which leaves the node in an inconsistent but still running state.

Ideally, we'd want to use perf on linux to profile, since it's very mature and flexible. However, I don't know how feasible it is considering that I couldn't find a continuous profiler with flamegraph visualisations that uses perf as a data source.

And there is also the storage problem. Profiling spits out a ton of data. If the frequency is too small, the data is not representative. If the frequency is too high, it can overwhelm the server and become unmanageable.

@petethepig
Copy link

@alindima FWIW it's possible to send perf or eBPF profling data to pyroscope in collapsed (or folded) format, we have documentation on how to upload that data to Pyroscope using an HTTP API here.

@Sophia-Gold Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023
helin6 pushed a commit to boolnetwork/polkadot-sdk that referenced this issue Feb 5, 2024
* Add evm-chain-id pallet

Signed-off-by: koushiro <[email protected]>

* Add some doc

Signed-off-by: koushiro <[email protected]>
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 8, 2024
* Remove unused Config types from `pallet-finality-verifier`

* Remove unused AncestryChecker trait

* Remove ancestry proof parameter from relayer calls

* Update docs to reflect current state of pallet

* Remove mock ancestry checker

* Remove unused error

* Write headers outside of function used for authority set changes

* Move justification verification into helper function

* Add documentation suggestions

Co-authored-by: Tomasz Drwięga <[email protected]>

* Clean up module level documentation a bit

Co-authored-by: Tomasz Drwięga <[email protected]>
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 8, 2024
* Remove unused Config types from `pallet-finality-verifier`

* Remove unused AncestryChecker trait

* Remove ancestry proof parameter from relayer calls

* Update docs to reflect current state of pallet

* Remove mock ancestry checker

* Remove unused error

* Write headers outside of function used for authority set changes

* Move justification verification into helper function

* Add documentation suggestions

Co-authored-by: Tomasz Drwięga <[email protected]>

* Clean up module level documentation a bit

Co-authored-by: Tomasz Drwięga <[email protected]>
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 8, 2024
* Remove unused Config types from `pallet-finality-verifier`

* Remove unused AncestryChecker trait

* Remove ancestry proof parameter from relayer calls

* Update docs to reflect current state of pallet

* Remove mock ancestry checker

* Remove unused error

* Write headers outside of function used for authority set changes

* Move justification verification into helper function

* Add documentation suggestions

Co-authored-by: Tomasz Drwięga <[email protected]>

* Clean up module level documentation a bit

Co-authored-by: Tomasz Drwięga <[email protected]>
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 8, 2024
* Remove unused Config types from `pallet-finality-verifier`

* Remove unused AncestryChecker trait

* Remove ancestry proof parameter from relayer calls

* Update docs to reflect current state of pallet

* Remove mock ancestry checker

* Remove unused error

* Write headers outside of function used for authority set changes

* Move justification verification into helper function

* Add documentation suggestions

Co-authored-by: Tomasz Drwięga <[email protected]>

* Clean up module level documentation a bit

Co-authored-by: Tomasz Drwięga <[email protected]>
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 9, 2024
* Remove unused Config types from `pallet-finality-verifier`

* Remove unused AncestryChecker trait

* Remove ancestry proof parameter from relayer calls

* Update docs to reflect current state of pallet

* Remove mock ancestry checker

* Remove unused error

* Write headers outside of function used for authority set changes

* Move justification verification into helper function

* Add documentation suggestions

Co-authored-by: Tomasz Drwięga <[email protected]>

* Clean up module level documentation a bit

Co-authored-by: Tomasz Drwięga <[email protected]>
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 9, 2024
* Remove unused Config types from `pallet-finality-verifier`

* Remove unused AncestryChecker trait

* Remove ancestry proof parameter from relayer calls

* Update docs to reflect current state of pallet

* Remove mock ancestry checker

* Remove unused error

* Write headers outside of function used for authority set changes

* Move justification verification into helper function

* Add documentation suggestions

Co-authored-by: Tomasz Drwięga <[email protected]>

* Clean up module level documentation a bit

Co-authored-by: Tomasz Drwięga <[email protected]>
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 9, 2024
* Remove unused Config types from `pallet-finality-verifier`

* Remove unused AncestryChecker trait

* Remove ancestry proof parameter from relayer calls

* Update docs to reflect current state of pallet

* Remove mock ancestry checker

* Remove unused error

* Write headers outside of function used for authority set changes

* Move justification verification into helper function

* Add documentation suggestions

Co-authored-by: Tomasz Drwięga <[email protected]>

* Clean up module level documentation a bit

Co-authored-by: Tomasz Drwięga <[email protected]>
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 9, 2024
* Remove unused Config types from `pallet-finality-verifier`

* Remove unused AncestryChecker trait

* Remove ancestry proof parameter from relayer calls

* Update docs to reflect current state of pallet

* Remove mock ancestry checker

* Remove unused error

* Write headers outside of function used for authority set changes

* Move justification verification into helper function

* Add documentation suggestions

Co-authored-by: Tomasz Drwięga <[email protected]>

* Clean up module level documentation a bit

Co-authored-by: Tomasz Drwięga <[email protected]>
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 9, 2024
* Remove unused Config types from `pallet-finality-verifier`

* Remove unused AncestryChecker trait

* Remove ancestry proof parameter from relayer calls

* Update docs to reflect current state of pallet

* Remove mock ancestry checker

* Remove unused error

* Write headers outside of function used for authority set changes

* Move justification verification into helper function

* Add documentation suggestions

Co-authored-by: Tomasz Drwięga <[email protected]>

* Clean up module level documentation a bit

Co-authored-by: Tomasz Drwięga <[email protected]>
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 9, 2024
* Remove unused Config types from `pallet-finality-verifier`

* Remove unused AncestryChecker trait

* Remove ancestry proof parameter from relayer calls

* Update docs to reflect current state of pallet

* Remove mock ancestry checker

* Remove unused error

* Write headers outside of function used for authority set changes

* Move justification verification into helper function

* Add documentation suggestions

Co-authored-by: Tomasz Drwięga <[email protected]>

* Clean up module level documentation a bit

Co-authored-by: Tomasz Drwięga <[email protected]>
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 10, 2024
* Remove unused Config types from `pallet-finality-verifier`

* Remove unused AncestryChecker trait

* Remove ancestry proof parameter from relayer calls

* Update docs to reflect current state of pallet

* Remove mock ancestry checker

* Remove unused error

* Write headers outside of function used for authority set changes

* Move justification verification into helper function

* Add documentation suggestions

Co-authored-by: Tomasz Drwięga <[email protected]>

* Clean up module level documentation a bit

Co-authored-by: Tomasz Drwięga <[email protected]>
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 10, 2024
* Remove unused Config types from `pallet-finality-verifier`

* Remove unused AncestryChecker trait

* Remove ancestry proof parameter from relayer calls

* Update docs to reflect current state of pallet

* Remove mock ancestry checker

* Remove unused error

* Write headers outside of function used for authority set changes

* Move justification verification into helper function

* Add documentation suggestions

Co-authored-by: Tomasz Drwięga <[email protected]>

* Clean up module level documentation a bit

Co-authored-by: Tomasz Drwięga <[email protected]>
bkchr pushed a commit that referenced this issue Apr 10, 2024
* Remove unused Config types from `pallet-finality-verifier`

* Remove unused AncestryChecker trait

* Remove ancestry proof parameter from relayer calls

* Update docs to reflect current state of pallet

* Remove mock ancestry checker

* Remove unused error

* Write headers outside of function used for authority set changes

* Move justification verification into helper function

* Add documentation suggestions

Co-authored-by: Tomasz Drwięga <[email protected]>

* Clean up module level documentation a bit

Co-authored-by: Tomasz Drwięga <[email protected]>
@alexggh
Copy link
Contributor

alexggh commented May 31, 2024

Superseeded by using subsystem benchmarks with pyroscope.

@alexggh alexggh closed this as completed May 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Completed
Development

No branches or pull requests

7 participants
@petethepig @ordian @alindima @alexggh @sandreim @the-right-joyce and others