Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

introduce deserialize_backref_record() #381

Merged
merged 1 commit into from
Feb 15, 2024
Merged

introduce deserialize_backref_record() #381

merged 1 commit into from
Feb 15, 2024

Conversation

arvidn
Copy link
Contributor

@arvidn arvidn commented Feb 12, 2024

This new function/overload records all the nodes being back-referenced to. This helps inform, e.g. tree_hash() which nodes' hashes should be cached.

This feature enables performance improvements in chia_rs. Those improvements are reported below.

Hashing speed-up: 5.0 - 5.85

The speed-up of hashing a whole block generator, of a full block (50% full given our current farmer fill rate). These are real blocks from mainnet, picked out during the period of high mempool pressure, and full blocks.

Generator speed-up: 1.26 - 1.88

The speed up of running the block generator of a compressed block using run_block_generator2(), i.e. with the rust implementation of the generator ROM. This is what enables the improved hashing of the puzzles.

This speed-up won't be realized on chain until after the hard-fork.

benchmarks, MacOS M1

run_generator2

when taking advantage of this information exported from the deserializer, we can optimize tree_hash() and also optimize the tree hashing in run_block_generator2().

Since the cache takes advantage of the block being "compressed" (i.e. using clvm backrefs), it won't impact uncompressed blocks.

run_block_generator2 block-1ee588dc-compressed             1.00     15.6±0.06ms        ? ?/sec    1.76     27.5±0.33ms        ? ?/sec
run_block_generator2 block-225758-compressed               1.03      5.1±0.13ms        ? ?/sec    1.00      5.0±0.06ms        ? ?/sec
run_block_generator2 block-4671894-compressed              1.05    330.9±4.64ms        ? ?/sec    1.00    315.8±2.13ms        ? ?/sec                                                    
run_block_generator2 block-6fe59b24-compressed             1.00     18.8±0.22ms        ? ?/sec    1.68     31.6±0.65ms        ? ?/sec
run_block_generator2 block-834752-compressed               1.00      2.6±0.02ms        ? ?/sec    1.26      3.2±0.09ms        ? ?/sec
run_block_generator2 block-b45268ac-compressed             1.00     16.0±0.19ms        ? ?/sec    1.75     28.0±0.39ms        ? ?/sec
run_block_generator2 block-c2a8df0d-compressed             1.00     15.8±0.11ms        ? ?/sec    1.88     29.7±0.41ms        ? ?/sec
run_block_generator2 block-e5002df2-compressed             1.00     17.1±0.24ms        ? ?/sec    1.68     28.7±0.41ms        ? ?/sec
run_block_generator2 deep-recursion-plus-compressed        1.00       2.8±0.02s        ? ?/sec    1.00       2.9±0.06s        ? ?/sec
run_block_generator2 duplicate-coin-announce-compressed    1.00       3.1±0.03s        ? ?/sec    1.00       3.1±0.03s        ? ?/sec
run_block_generator2 recursion-pairs-compressed            1.00       2.5±0.02s        ? ?/sec    1.00       2.5±0.03s        ? ?/sec

tree_hash_from_bytes

The tree_hash_from_bytes() function, when updated to use a cache, has the following benchmarks. Again, only the compressed CLVM would improve.

tree-hash-from-stream block-1ee588dc-compressed            1.00      2.8±0.03ms        ? ?/sec    5.08     14.2±0.22ms        ? ?/sec
tree-hash-from-stream block-225758-compressed              1.00     61.0±0.59µs        ? ?/sec    1.00     61.1±0.39µs        ? ?/sec
tree-hash-from-stream block-4671894-compressed             1.00      2.3±0.02ms        ? ?/sec    1.13      2.6±0.03ms        ? ?/sec
tree-hash-from-stream block-6fe59b24-compressed            1.00      3.7±0.10ms        ? ?/sec    4.15     15.2±0.25ms        ? ?/sec
tree-hash-from-stream block-834752-compressed              1.00    254.2±8.39µs        ? ?/sec    3.64   926.5±35.90µs        ? ?/sec
tree-hash-from-stream block-b45268ac-compressed            1.00      2.7±0.02ms        ? ?/sec    5.00     13.6±0.14ms        ? ?/sec
tree-hash-from-stream block-c2a8df0d-compressed            1.00      2.8±0.02ms        ? ?/sec    5.85     16.3±0.28ms        ? ?/sec

profile

This is where we spend time in the run_generator test. This is test case block-1ee588dc run in Linux, AMD Threadripper. The test runs both the regular run_block_generator() (which uses the CLVM implememtation of the ROM) as well as run_block_generator2(), with the rust implementation of the ROM.

- chia::gen::test_generators::run_generator
   - 76,73% chia::gen::run_block_generator::run_block_generator
      - 74,78% clvmr::run_program::run_program
         - clvmr::run_program::RunProgramContext<D>::run_program (inlined)
            + 32,94% clvmr::run_program::RunProgramContext<D>::apply_op (inlined)
            + 22,69% clvmr::run_program::RunProgramContext<D>::swap_eval_op (inlined)
            + 10,51% clvmr::run_program::RunProgramContext<D>::cons_op (inlined)
            + 5,24% alloc::vec::Vec<T,A>::pop (inlined)
            + 0,84% clvmr::run_program::augment_cost_errors (inlined)
              0,73% core::slice::<impl [T]>::last (inlined)
      + 1,32% clvmr::serde::de_br::node_from_bytes_backrefs
        0,59% chia::gen::conditions::parse_spends
   - 17,18% chia::gen::run_block_generator::run_block_generator2
      - 8,76% clvmr::run_program::run_program
         - clvmr::run_program::RunProgramContext<D>::run_program (inlined)
            + 3,91% clvmr::run_program::RunProgramContext<D>::apply_op (inlined)
            + 2,66% clvmr::run_program::RunProgramContext<D>::swap_eval_op (inlined)
            + 0,99% clvmr::run_program::RunProgramContext<D>::cons_op (inlined)
            + 0,86% alloc::vec::Vec<T,A>::pop (inlined)
      - 7,17% clvm_utils::tree_hash::tree_hash_cached
         - 3,85% clvm_utils::tree_hash::tree_hash_pair
            + 2,06% <D as digest::digest::Digest>::finalize (inlined)
            + 1,73% <D as digest::digest::Digest>::update (inlined)
         - 2,65% clvm_utils::tree_hash::tree_hash_atom
            + 2,25% <D as digest::digest::Digest>::finalize (inlined)
      + 1,19% clvmr::serde::de_br::node_from_bytes_backrefs_record
   + 1,78% core::ptr::drop_in_place<clvmr::allocator::Allocator> (inlined)
   + 1,58% chia::gen::test_generators::print_conditions

uncompressed blocks

For reference, here are the benchmarks for the uncompressed blocks. These are not expected to gain a boost:

run_block_generator2 block-1ee588dc                        1.00     28.0±0.21ms        ? ?/sec    1.03     28.7±0.41ms        ? ?/sec
run_block_generator2 block-225758                          1.02      5.1±0.07ms        ? ?/sec    1.00      5.0±0.04ms        ? ?/sec
run_block_generator2 block-4671894                         1.04    329.0±2.27ms        ? ?/sec    1.00    315.3±2.22ms        ? ?/sec
run_block_generator2 block-6fe59b24                        1.00     31.8±0.25ms        ? ?/sec    1.04     33.2±0.34ms        ? ?/sec
run_block_generator2 block-834752                          1.04      3.4±0.09ms        ? ?/sec    1.00      3.2±0.02ms        ? ?/sec
run_block_generator2 block-b45268ac                        1.00     28.2±0.20ms        ? ?/sec    1.02     28.8±0.24ms        ? ?/sec
run_block_generator2 block-c2a8df0d                        1.00     30.9±0.20ms        ? ?/sec    1.06     32.8±1.47ms        ? ?/sec
run_block_generator2 block-e5002df2                        1.00     29.7±0.42ms        ? ?/sec    1.01     30.1±0.20ms        ? ?/sec
run_block_generator2 deep-recursion-plus                   1.02       2.8±0.02s        ? ?/sec    1.00       2.8±0.02s        ? ?/sec
run_block_generator2 duplicate-coin-announce               1.00       3.1±0.02s        ? ?/sec    1.02       3.1±0.06s        ? ?/sec
run_block_generator2 recursion-pairs                       1.00       2.5±0.01s        ? ?/sec    1.00       2.5±0.03s        ? ?/sec
tree-hash-from-stream block-1ee588dc                       1.05     16.0±0.12ms        ? ?/sec    1.00     15.3±0.20ms        ? ?/sec   
tree-hash-from-stream block-225758                         1.06     64.4±0.68µs        ? ?/sec    1.00     61.0±0.38µs        ? ?/sec   
tree-hash-from-stream block-4671894                        1.05      2.7±0.05ms        ? ?/sec    1.00      2.6±0.04ms        ? ?/sec   
tree-hash-from-stream block-6fe59b24                       1.03     17.1±0.37ms        ? ?/sec    1.00     16.5±0.33ms        ? ?/sec   
tree-hash-from-stream block-834752                         1.03  1025.1±13.01µs        ? ?/sec    1.00    993.8±9.76µs        ? ?/sec   
tree-hash-from-stream block-b45268ac                       1.04     15.5±0.24ms        ? ?/sec    1.00     15.0±0.31ms        ? ?/sec   
tree-hash-from-stream block-c2a8df0d                       1.03     18.3±0.24ms        ? ?/sec    1.00     17.8±0.47ms        ? ?/sec   

Copy link

coveralls-official bot commented Feb 12, 2024

Pull Request Test Coverage Report for Build 7907475742

Details

  • 0 of 67 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.06%) to 94.362%

Totals Coverage Status
Change from base Build 7907473600: 0.06%
Covered Lines: 5808
Relevant Lines: 6155

💛 - Coveralls

@arvidn arvidn marked this pull request as ready for review February 12, 2024 20:53
Rigidity
Rigidity previously approved these changes Feb 13, 2024
@arvidn arvidn mentioned this pull request Feb 14, 2024
src/more_ops.rs Outdated Show resolved Hide resolved
src/serde/de_br.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@richardkiss richardkiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for maybe the name of the visitor function, looks good.

…ing back-referenced to. This helps inform, e.g. tree_hash() which nodes' hashes should be cached.
@arvidn arvidn merged commit 3f824fe into main Feb 15, 2024
47 of 48 checks passed
@arvidn arvidn deleted the deserialize-record branch February 15, 2024 00:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants