Container issues at connection #20274

Closed
erbox02000 opened this issue Mar 22, 2024 · 10 comments
Labels: bug (Something isn't working), community-contribution


@erbox02000

Describe the bug

Following the closure of the previous bug (Issue at container connection : Provided user was not an "AzureUser"), we upgraded Fluid Framework to release v2.0.0-rc.2.0.0.

We are now facing several different issues:

[screenshot]

When we connect successfully and the container already holds data, we often get this error:

[screenshot]

Finally, when multiple users try to connect to the same container, we still get the previous error, though with different wording:

[screenshot]

We still connect to Fluid Relay the same way; see the previous bug (Issue at container connection : Provided user was not an "AzureUser") for details.

@pk-pranshu
Contributor

Thanks for reporting the issue. We have started the investigation and are treating this as high priority.

@CraigMacomber
Contributor

For the "AzureUser" errors, see https://github.com/microsoft/FluidFramework/releases/tag/client_v2.0.0-rc.2.0.1; I believe that patch should fix them.

Error 0x739 is produced when decoding compressed SharedTree data fails. Either your data is corrupted somehow, or there is a bug in the encoder or the decoder. This is most likely a bug in SharedTree: thanks for the bug report!
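For readers unfamiliar with codes like 0x739: it identifies a failed internal assertion by a short hex code rather than by a message string. The sketch below is a minimal, hypothetical illustration of that pattern only; `CodedAssertionError`, `assertCoded`, and `expectArray` are invented names, not Fluid APIs, and this is not how Fluid implements its asserts.

```typescript
// Hypothetical sketch: assertions identified by a numeric code instead of a
// message string. NOT Fluid's implementation; it only illustrates how a code
// such as 0x739 ends up as the error message.
class CodedAssertionError extends Error {
  readonly code: number;

  constructor(code: number) {
    // Render the code in hex, e.g. 0x739 -> "0x739".
    super(`0x${code.toString(16)}`);
    this.name = "CodedAssertionError";
    this.code = code;
  }
}

function assertCoded(condition: boolean, code: number): asserts condition {
  if (!condition) {
    throw new CodedAssertionError(code);
  }
}

// Hypothetical decoder-style check that fails with code 0x739 when the
// value it expects to be an array is something else.
function expectArray(value: unknown): unknown[] {
  assertCoded(Array.isArray(value), 0x739);
  return value;
}
```

Keeping only a code in the shipped error keeps bundles small; the trade-off is that a code alone says little, which is why the extra context requested in this thread (input data and call stack) matters.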

We have an optional validator that can be opted into to help debug cases like this (it checks that data is well-formed both when it is written and when it is parsed), but unfortunately there is currently no easy way to enable it. I have authored #20332 to provide a way to enable it in your app.

Combine that with

```typescript
import {
	configuredSharedTree,
	typeboxValidator,
	// eslint-disable-next-line import/no-internal-modules
} from "@fluidframework/tree/internal";
const SharedTree = configuredSharedTree({
	jsonValidator: typeboxValidator,
});
```

and it should ideally detect whether malformed data was generated, or whether the data it's trying to decode isn't in the right format.

Since that isn't published yet, and I'd still like to find the root cause, could you provide the data that causes the issue?

To do so in the browser, enable breaking on caught exceptions. When assert 0x739 is hit, inspect the failing location with the debugger (one frame up the call stack from the assert throwing the 0x739 error); you should find a stack frame for NestedArrayDecoder.decode. In that frame, evaluate JSON.stringify(stream) and provide the result, along with the full call stack. This will include document content, so only share it here if it's fine for that to be posted publicly.
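When stepping through the debugger is impractical (for example, when the failure is intermittent), the same two pieces of information, the serialized input and the call stack, can be recorded at a decode boundary you control. A minimal sketch, assuming a decode step in your own code or a local debugging patch; SharedTree's internal `NestedArrayDecoder.decode` cannot be wrapped this way from an application, and `withDecodeCapture`, `failureReports`, and `decodeNumber` are hypothetical names.

```typescript
// Hypothetical record of one failed decode.
interface DecodeFailureReport {
  input: string; // JSON.stringify of the value passed to the decoder
  message: string; // the thrown error's message (e.g. an assert code)
  stack: string | undefined; // full call stack, if the runtime provides it
}

const failureReports: DecodeFailureReport[] = [];

// Wraps a decode function so that, if it throws, the input and the stack are
// recorded before the original error is rethrown unchanged.
function withDecodeCapture<TIn, TOut>(
  decode: (input: TIn) => TOut,
): (input: TIn) => TOut {
  return (input: TIn): TOut => {
    try {
      return decode(input);
    } catch (error) {
      const err = error instanceof Error ? error : new Error(String(error));
      failureReports.push({
        input: JSON.stringify(input),
        message: err.message,
        stack: err.stack,
      });
      throw error;
    }
  };
}

// Toy decoder used only to demonstrate the wrapper.
const decodeNumber = withDecodeCapture((text: string): number => {
  const value = Number(text);
  if (Number.isNaN(value)) {
    throw new Error("not a number");
  }
  return value;
});
```

Rethrowing keeps the application's error handling unchanged. Like the debugger approach, the captured input can contain document content, so review it before sharing it publicly.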

Alternatively, if your application or its source is publicly available somewhere I can run it, and you have a way for me to reproduce the exception, I could extract that information myself.

@CraigMacomber
Contributor

I have created an internal bug to track this, https://dev.azure.com/fluidframework/internal/_workitems/edit/7582, to help ensure it gets fixed, but I'll try to keep all actual information about its status here, where impacted users can see it.

CraigMacomber added a commit that referenced this issue Mar 26, 2024
## Description

Expose an `@internal` API, `configuredSharedTree`, which can be used to
opt into various internal/debug SharedTree settings.

This should help root-cause bugs like #20274.

@CraigMacomber
Contributor

I ran a pre-release build, 2.0.0-dev-rc.3.0.0.250606 (e.g. https://www.npmjs.com/package/fluid-framework/v/2.0.0-dev-rc.3.0.0.250606), which includes #20332.

Once you are on that build (or a newer one), in addition to opting into extra validation to help track down the source of this bug, you can disable data compression with `treeEncodeType: TreeCompressionStrategy.Uncompressed`, which might help you avoid the bug until it's fixed.

@amomo290

Hello,

I tried to use the internal API for debugging, but it seems that it is not accessible in the pre-release build 2.0.0-dev-rc.3.0.0.250606.

[screenshot]

Maybe I missed something.

I then caught error 0x739; here is the content of the stream that triggers it:

{ "data": [ 5, "5703557480152899", 0, "Hello", "[email protected]", 1711531476552, 1711531476552, 0, "/project/A0000023BUTIDV01", 0, 0, 0, "", false, 0, "", "" ], "offset": 16 }

I also tried the fix for the "AzureUser" errors in version 2.0.0-rc.2.0.1. I don't get that error anymore, but it has been replaced by another issue.
When we call audience.getMembers(), we get a member with no data:

[screenshot]

It seems to occur instead of the "AzureUser" error.
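Until the missing member data is fixed at its source, a client can at least avoid tripping over an incomplete entry when rendering the audience. A minimal defensive sketch, assuming getMembers() returns a map of members keyed by user id and that a usable member carries non-empty `id` and `name` fields; `MemberLike`, `partitionMembers`, and the mock data are hypothetical, and the real Fluid member types carry more fields. This guard hides the symptom; it does not restore the missing user information.

```typescript
// Assumed, simplified member shape for illustration only.
interface MemberLike {
  id?: string;
  name?: string;
}

interface PartitionedMembers<M> {
  valid: M[]; // members with the fields the UI needs
  incompleteKeys: string[]; // map keys of members with missing data
}

function hasText(value: unknown): value is string {
  return typeof value === "string" && value.length > 0;
}

// Splits members into usable entries and keys of incomplete entries,
// so the latter can be logged instead of breaking the UI.
function partitionMembers<M extends MemberLike>(
  members: ReadonlyMap<string, M>,
): PartitionedMembers<M> {
  const valid: M[] = [];
  const incompleteKeys: string[] = [];
  members.forEach((member, key) => {
    if (hasText(member.id) && hasText(member.name)) {
      valid.push(member);
    } else {
      incompleteKeys.push(key);
    }
  });
  return { valid, incompleteKeys };
}

// Example with mock data standing in for audience.getMembers().
const mockMembers = new Map<string, MemberLike>([
  ["user-1", { id: "user-1", name: "Alice" }],
  ["user-2", {}],
]);
const { valid, incompleteKeys } = partitionMembers(mockMembers);
```

Returning the incomplete keys rather than silently dropping them leaves room to log them, which is useful evidence when reporting a service-side problem like this one.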

@CraigMacomber
Contributor

Hello,

> I tried to use the internal api for debugging but it seems that this is not accessible in the pre-release build 2.0.0-dev-rc.3.0.0.250606.

I made a mistake: I did not realize that the separate "internal" entry point is only enabled within our repo, not in the published packages.
Since the internal imports are not split out into /internal in the published package, they are actually available at the top level, so this should work:

```typescript
import {
	TreeCompressionStrategy,
	configuredSharedTree,
	typeboxValidator,
} from "@fluidframework/tree";
const SharedTree = configuredSharedTree({
	// Extra serialized data validation
	jsonValidator: typeboxValidator,
	// Disable tree compression
	// treeEncodeType: TreeCompressionStrategy.Uncompressed,
});
```

Thanks for the failing data example. I'll run that through the decoder under a debugger and see what I can figure out.

@CraigMacomber
Contributor

@amomo290 Looking at the data you provided, I've realized I need a bit more information to get to the bottom of this:

  1. Do you know which version of the Fluid packages was originally used to encode that data? If it's not the version you are using to decode it, and it's old enough, the data might predate our last format change. Tree's data format isn't fully final yet (we don't expect any more breaking changes, but there might be one before the next RC release), and giving good errors on incompatible formats is still a work in progress.

  2. The data you posted looks reasonable (no data corruption or anything obviously bad), so I'll need a bit more context to know why it wouldn't parse. Higher up the call stack there should be a "decode" function taking in a "chunk" (the one defined at the top of src/feature-libraries/chunked-forest/codec/chunkDecoding.ts). In that frame, JSON.stringify(chunk) should capture the full context so I can actually try to decode it. Last time I forgot that important header information needed for decompression wasn't included in the stream I had you provide.

@amomo290

amomo290 commented Apr 2, 2024

The version was 2.0.0-rc.1.0.3. I tried with a newly created container and cannot reproduce error 0x739.

After deleting all references to old containers and switching to v2.0.0-rc.2.0.1, I no longer get any '0x' errors, but I still have this issue:

> I also tried the fix for the "AzureUser" errors in version 2.0.0-rc.2.0.1. I don't get that error anymore, but it has been replaced by another issue.
> When we call audience.getMembers(), we get a member with no data:
>
> [screenshot]
>
> It seems to occur instead of the "AzureUser" error.

@nmsimons
Contributor

nmsimons commented Apr 2, 2024

@amomo290 The patch you are using includes a mitigation for the AzureUser error you encountered, but it doesn't fix the underlying issue, which is in fact a problem in the service layer that results in missing user information. This issue only affects the v2 Fluid Framework packages, and we are working on a fix that should deploy shortly (by the end of this week, April 5, 2024).

@nmsimons
Contributor

I believe this issue is resolved.


5 participants