-
Notifications
You must be signed in to change notification settings - Fork 535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PropertyTree DDS: Leverage Binary Uploads #10684
Comments
@DLehenbauer @ruiterr @dstanesc Separate issue, follow-up of #10494 concerning base64 encoding of Property DDS summaries. |
@milanro - I'm looking into this. Thank you for starting the conversation. |
Hi @milanro - I just wanted to give you an update on what I've found: AFR uses the WholeDocumentSummary service, which uploads the entire summary as a monolithic blob: The blob is created by passing the tree of JS objects to Axios, which JSON stringifies and sends the data: Because Axios is using JSON to encode the data, we have to convert the binary buffers in the summary tree to something "jsonable". Currently we use base64 as a safe option. (I'm still chatting w/people on my side on ways around this, but let me know if you have any ideas.) The other challenge is that we currently do not plumb through the "binary" encoding type, but this one may not be so bad. I learned from @znewton that blobs result in HTTP POST calls to historian, which will be dispatched to one of these implementations on AFR: It looks like both implementations create a node Buffer from the received POST, so in theory the storage backend is already working with binary blobs. Therefore, we can theoretically plumb support for the "binary" encoding from the client, through Axios, to the service. The only catch is that we would only be able to use it from the 'uploadBlob()' API (i.e., not summaries) until we've addressed the first challenge. If you're up for it, you could start exposing the "binary" encoding option while I continue looking into the first issue. (PS - You also should go ahead and open a PR to remove the 'base64' encoding that Property DDS does manually. It doesn't fix anything since we'll still base64 encode for you, but it's a good change and a nice first commit. :-) |
I already have the implementation for #10685 where I can easily introduce (and already did but disabled for now) the base64 encoding removal at PropertyDDS in the extended class and still support stored original Property DDS legacy data. I already sent the Pull Request #10688 but it looks like it needs a review as I contribute first time. Is there a chance that someone from you would look at it. |
Being a first-time contributor, you need a maintainer to approve running the CI workflow (which I just did). All PRs go through a review/discussion process (mine too). Some things @vladsud and I have been trying to figure out are:
And for each of 1 and 2, which algorithm yields the best balance of speed, size reduction, and impact to bundle size. ...but that's hard to determine without some experiments using real-world data. Any chance you could use the hooks you've added to Property Tree DDS to help us with such an analysis? (In parallel, I would go ahead and do a small, separate PR to remove the unnecessary base64 encoding from Property Tree DDS, as that's an easy PR to approve right away and would fix your temporary "1st time contributor" status.) |
So if I understand well, it will still take some time until this gets to the fluid repository.
I will put the logs and do some benchmarks. Of course I can first push them when this pull request is in so I will do benchmarks locally at first. Is there some policy, how to perform logging (I saw some telemetry loggers etc, I will have to check how they are available in Property DDS)
The point is that this can be done easily at the code which I am trying to execute the current pull request for. It actually needs only commenting in 2 methods which are commented out. I would prefer to put it after this pull request, otherwise we get the conflict. But if you think that the approval will take really long or can even be refused, I will do it. What I wanted however is dedicated DDS for all these changes. This way, the old Property DDS summaries / ops stored in fluid will still work (and at least we already have some). The inheritance looked to me better choice than some flagging. What is your opinion (maybe I should rename the extended class as it will now support more than only compressing)? |
Just to build more confidence on the compression path towards large data I pulled some real data from the materials team (on HxGN side, material mgmt. is one of the initial drivers for the large data support). The high level conclusion is that material management has good size reduction potential (using compression) in both scenarios: summaries & operations For a better understanding, the original design planned operations having individual material granularity and summaries holding complete material libraries. A library can have from hundreds to thousands of them. material json
material json base64 encoded
material library, 52 materials, json encoded
I would like to highlight few important disclaimers:
I also believe we should have separate discussions for summary and operation compression as data sizes differ by one to more orders of magnitude. Likely different configurations and even instrumentation. For instance, as operations typically tend to have smaller sizes would have more sense to apply compression to a larger batch of operations (not sure at this stage whether fluid batches operations before transmission) |
This issue has been automatically marked as stale because it has had no activity for 180 days. It will be closed if no further activity occurs within 8 days of this comment. Thank you for your contributions to Fluid Framework! |
This issue is the follow-up of the epic #10494 focusing on the paragraph PropertyTree DDS: Leverage Binary Uploads.
The epic states :
PropertyTree currently leverages MsgPackR to serialize summary chunks instead of a the less efficient JSON encoding. However, due to a past limitation of the Fluid Runtime API, the resulting binary is then base64 encoded (~33% penalty).
We've since introduced the SummaryTreeBuilder, which allows Uint8Arrays to be attached to the summary without first encoding as base64. Using SummaryTreeBuilder to avoid the base64 encoding is both a size and perf win.
The resullt of investigation :
It looks like Property DDS already uses SummaryTreeBuilder but still encodes the blobs with base64. When this is disabled and the blobs are in binary form, it looks that the routerlicious Upload Manager converts it to base64 automatically. Is there any trick to force the binary blobs in the SummaryTree (or is there other SummaryTreeBuilder available than that one used by Property DDS)?
The WholeSummaryUploadManager is used for uploading to routerlicious, the implementation is located in
/server/routerlicious/packages/services-client/src/wholeSummaryUploadManager.ts
The following method is called in order to upload
WholeSummaryUploadManager.writeSummaryTreeCore(...)
which generates the transfer form of the tree at
server/routerlicious/packages/services-client/src/storageUtils.ts
function
convertSummaryTreeToWholeSummaryTree(...)
This method encodes the binary blobs by base64
if (typeof summaryObject.content === "string") {
...
content: summaryObject.content,
...
} else {
...
content: Uint8ArrayToString(summaryObject.content, "base64"),
...
The text was updated successfully, but these errors were encountered: