-
Notifications
You must be signed in to change notification settings - Fork 881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
7582: Add naive RLP caching for BlockHeader, Transaction, and Withdrawal #7988
7582: Add naive RLP caching for BlockHeader, Transaction, and Withdrawal #7988
Conversation
Signed-off-by: Matilda Clerke <[email protected]>
…ng-during-sync # Conflicts: # ethereum/core/src/main/java/org/hyperledger/besu/ethereum/core/BlockHeader.java
Signed-off-by: Matilda Clerke <[email protected]>
I see the potential benefit of the caching, but since this is a tradeoff between compute and memory, have you done benchmarks to quantify the performance benefit of the change, and the possible impact on memory? |
Fair point. I modified the BlockHeadersMessageTest locally to produce 10000 headers and timed the looped writeTo calls in BlockHeadersMessage. The original code performed the 10000 writeTo calls in 18 to 23ms, while the updated code performed the 10000 writeTo calls in 13 to 18ms. In terms of memory, it's actually just using indexes to the original underlying array, so it should only be a small amount of extra memory used. |
So the micro benchmark test confirm that there is a gain on the compute side, and that is worth exploring more, for that I suggest to run some real sync on mainnet with this change and compare the results against some control instances. |
Agreed, but those references will keep strong links to the underlying byte arrays preventing it from garbage collection. I would like to have a rationale on what this feature is going to improve. It is an interesting improvement but is there is real use case for it ? and if so, we need to evaluate, how much is the latency improvement for how much overhead in terms of memory ? |
On the implementation side, have you thought about different approaches to the caching? |
From the micro benchmark, we'd expect a time saving of around 10 seconds for 10 million block headers. Realistically, not a noticeable gain on our current 20+ hour sync. However, it should give us a small saving of CPU utilisation. Memory breakdown is as follows: Regarding garbage collection: If the headers are currently getting garbage collected (I'd expect they are, but not sure), these references won't prevent the underlying array from being garbage collected. For this PR, I was looking just for some potential quick wins while continuing to focus mainly on my refactoring of peer tasks. @pinges is working on a more compilcated scheme which manages to avoid decoding the RLP for large portions of the block body in addition to avoiding re-encoding as we're doing here. |
Signed-off-by: Matilda-Clerke <[email protected]>
Signed-off-by: Matilda Clerke <[email protected]>
Signed-off-by: Matilda Clerke <[email protected]>
Backed out the withdrawal changes as we discussed and don't believe the changes will ever be used. |
Consider using JOL (Java Object Layout) or a heap dump to have more accurate numbers. With CompressedOOPs enabled, each Java object has a header of 12 Bytes if the JVM heap < 32 GiB or 16 Bytes if JVM heap >= 32 GiB. Depending on the total number of bytes for each object, there is internal and external padding to make the size a multiple of 8.
I think we should focus on the RLP raw data, the underlying byte array, not the headers. The current calculation focuses more on the shallow size because it is deterministic, and takes into account only the headers and the references. With the RLP stored in the transaction, we need to evaluate the impact on the transactionPool in terms of memory usage and garbage collection. |
In this screenshot below, we can see that the block header is having a reference to a ~2 MBytes byte array (rlp) which I guess is the RLP of the block, but that would be a big block, as we can see the offset and the length inside that AbstractWrappingBytes.
I guess all the transactions of that specific block, and the header are referencing the same underlying byte array. Now, we need to evaluate the real memory overhead. To do that, we need to have the number of live blocks in memory during sync and the average of the size of the RLP. You can share a heap dump during sync with your PR. I will extract the numbers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a suggestion to init RLP during decoding the transaction from the RLP. I think overall, there is no much overhead when I look the heap dump as the underlying byte array is already referenced by the payload of the transaction, and to address.
return Transaction.builder() | ||
.copiedFrom(transaction) | ||
.rawRlp(Optional.of(transactionRlp.raw())) | ||
.build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
calling again a builder to add RLP, I would do it directly in each transaction decoder just after the payload, as they're very similar fields :
Line 52 in 2909ea4
.payload(input.readBytes()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the transaction level, as the payload is already referencing the same underlying byte array, the only overhead is the reference of the rlp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern in applying this to the transaction encoder/decoder classes is that they seem to only populate a subset of the Transaction fields, so if we try to instead supply in the encoder the full original RLP, it may be significantly larger than it is currently.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will be no difference, as rawRlp and payload are just references to the same byte array with an offset and a length. It doesn't change the size of the transaction object. With copiedFrom, we lose the first reference of the transaction as we're creating another one.
The question I didn't investigate is wether we have the (whole) rawRlp of the transaction at this level.
Signed-off-by: Matilda-Clerke <[email protected]>
Signed-off-by: Matilda Clerke <[email protected]>
Signed-off-by: Matilda Clerke <[email protected]>
It seems these latest changes aren't quite working right. In particular, a block encoded and written to blockchainStorage is causing RLP errors when being read and decoded. I'm parking this briefly for now to progress some other issues. |
and handle empty lists in AbstractRLPInput Signed-off-by: Matilda Clerke <[email protected]>
Signed-off-by: Matilda Clerke <[email protected]>
…7582-avoid-unnecessary-rlp-encoding-during-sync
Signed-off-by: Matilda Clerke <[email protected]>
Signed-off-by: Matilda Clerke <[email protected]>
…7582-avoid-unnecessary-rlp-encoding-during-sync
…r classes Signed-off-by: Matilda Clerke <[email protected]>
Signed-off-by: Matilda Clerke <[email protected]>
Signed-off-by: Matilda-Clerke <[email protected]>
…an exception in test Signed-off-by: Matilda Clerke <[email protected]>
Signed-off-by: Matilda Clerke <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just few suggestions.
On the performance side, do you have an estimation on how much this will speed up the initial sync?
ethereum/core/src/main/java/org/hyperledger/besu/ethereum/core/BlockHeader.java
Outdated
Show resolved
Hide resolved
ethereum/core/src/main/java/org/hyperledger/besu/ethereum/core/Transaction.java
Outdated
Show resolved
Hide resolved
ethereum/core/src/main/java/org/hyperledger/besu/ethereum/core/Transaction.java
Outdated
Show resolved
Hide resolved
ethereum/core/src/main/java/org/hyperledger/besu/ethereum/core/Transaction.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/hyperledger/besu/ethereum/core/encoding/AccessListTransactionEncoder.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Matilda Clerke <[email protected]>
Signed-off-by: Matilda Clerke <[email protected]>
Signed-off-by: Matilda Clerke <[email protected]>
@fab-10 I calculated it to shave about 10 seconds off of the sync for block headers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question about all the LOG.trace calls and I found one place where we are potentially doing the wrong thing. Please have a look.
@@ -37,21 +37,23 @@ private EIP1559TransactionDecoder() { | |||
} | |||
|
|||
public static Transaction decode(final RLPInput input) { | |||
input.enterList(); | |||
final BigInteger chainId = input.readBigIntegerScalar(); | |||
RLPInput transactionRlp = input.readAsRlp(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed. Everything could be done on input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that's correct. Potentially, the RLPInput will contain more than just the single transactionRlp that we're extracting here, and including any extras in the rawRlp will cause significant problems later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see that now :-)
...ore/src/main/java/org/hyperledger/besu/ethereum/core/encoding/EIP1559TransactionEncoder.java
Outdated
Show resolved
Hide resolved
...re/src/main/java/org/hyperledger/besu/ethereum/core/encoding/FrontierTransactionDecoder.java
Show resolved
Hide resolved
ethereum/rlp/src/main/java/org/hyperledger/besu/ethereum/rlp/AbstractRLPInput.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Matilda Clerke <[email protected]>
Signed-off-by: Matilda Clerke <[email protected]>
Signed-off-by: Matilda Clerke <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, it would have been nice to have a better impact on reducing syncing time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…wal (hyperledger#7988) * 7582: Add naive RLP caching for BlockHeader, Transaction, and Withdrawal Signed-off-by: Matilda Clerke <[email protected]> * 7582: spotless Signed-off-by: Matilda Clerke <[email protected]> * 7582: Fix broken test after merge Signed-off-by: Matilda Clerke <[email protected]> * 7582: Back out withdrawal changes Signed-off-by: Matilda Clerke <[email protected]> * 7582: Fix up compile error after merge Signed-off-by: Matilda Clerke <[email protected]> * 7582: Apply naive RLP caching in transaction encoder/decoder classes Signed-off-by: Matilda Clerke <[email protected]> * 8053: Add RLPDecodingHelpers.Kind.EM and handle empty lists in AbstractRLPInput Signed-off-by: Matilda Clerke <[email protected]> * 7582: Add trace logging to AbstractRLPInput Signed-off-by: Matilda Clerke <[email protected]> * Revert previous change, implement new fix Signed-off-by: Matilda Clerke <[email protected]> * Revert previous change, implement new fix Signed-off-by: Matilda Clerke <[email protected]> * 7582: Finish applying naive RLP caching to transaction encoder/decoder classes Signed-off-by: Matilda Clerke <[email protected]> * Remove unused LOG Signed-off-by: Matilda Clerke <[email protected]> * 7582: Make explicit the expectation that RLP.validate does not throw an exception in test Signed-off-by: Matilda Clerke <[email protected]> * 7582: SPOTLESS Signed-off-by: Matilda Clerke <[email protected]> * 7582: Remove redundant curly braces Signed-off-by: Matilda Clerke <[email protected]> * 7582: Remove Optional from rawRlp in Transaction.Builder Signed-off-by: Matilda Clerke <[email protected]> * 7582: Make new BlockHeader constructor private Signed-off-by: Matilda Clerke <[email protected]> * Refactor tranaction encoder classes Signed-off-by: Matilda Clerke <[email protected]> * Remove logging changes from RLP module Signed-off-by: Matilda Clerke <[email protected]> * spotless Signed-off-by: Matilda Clerke <[email protected]> --------- Signed-off-by: Matilda Clerke <[email protected]> Signed-off-by: Matilda-Clerke <[email protected]> Co-authored-by: Stefan Pingel <[email protected]> Signed-off-by: Bhanu Pulluri <[email protected]>
PR description
Add naive RLP caching for BlockHeader, Transaction, and Withdrawal. This PR currently ignores Block, BlockBody, and TransactionReceipt due to complexity. Also included are some trace logs I used to help diagnose an issue found while working on this ticket.
Issue
#7582