Requests against a paused container may result in an infinite wait if the requested component exists only in future ops, not in the snapshot #3085

vladsud · 2020-08-06T21:23:15Z

Quick recap:
In the scenario from the linked thread, the client was attempting to request a component that was not yet attached (in terms of sequence numbers) while the container was paused. This results in a deadlock: the attach op for that component came shortly after the summary, and because the container is paused, that op is never processed, so the request keeps waiting for it.

The client knew how to request the component before it was attached because the URL was copy-pasted, i.e. the second client learned the request URL through external communication.

arinwt · 2020-08-06T21:36:27Z

Should there be a synchronous way to check for the existence of a component via request semantics? I'm not sure that allowing pause is fundamentally bad; it just requires the caller to understand potential deadlock scenarios and be extra careful.

If they could check synchronously, it would enable the normal flow:

Check synchronously whether it exists (given the current sequence number)
If yes, go through the normal flow, i.e. await with a timeout as a perf optimization
If no, resume first to prevent a guaranteed timeout
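The three-step flow above could look roughly like the following TypeScript sketch. `PausableContainer`, `isAttached`, `waitForComponent`, and `withTimeout` are hypothetical names used for illustration, not existing Fluid Framework APIs; an `undefined` result stands for a request that timed out.

```typescript
// Hypothetical container shape for illustration only (not a Fluid Framework API).
interface PausableContainer {
  readonly paused: boolean;
  // Synchronous check against the state at the current sequence number.
  isAttached(id: string): boolean;
  // Resolves once the component is attached; never resolves if its attach op is never processed.
  waitForComponent(id: string): Promise<string>;
  // Starts processing ops that arrived after the snapshot.
  resume(): void;
}

// Resolves with the promise's value, or with undefined if it does not settle within `ms`.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T | undefined> {
  return new Promise<T | undefined>((resolve, reject) => {
    const timer = setTimeout(() => resolve(undefined), ms);
    p.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

async function requestComponent(
  container: PausableContainer,
  id: string,
  timeoutMs: number,
): Promise<string | undefined> {
  // Step 1: synchronous existence check at the current sequence number.
  if (!container.isAttached(id) && container.paused) {
    // Step 3: the attach op can only arrive via ops, so resume first to avoid a guaranteed timeout.
    container.resume();
  }
  // Step 2 (also after resuming): await with a timeout rather than waiting forever.
  return withTimeout(container.waitForComponent(id), timeoutMs);
}
```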

If we really wanted to go further, we could provide an API to listen for specific components being attached, etc. Not sure that is the right path, though.

vladsud · 2020-08-06T21:42:17Z

@arinwt, could you just not pass request.headers.wait === true? Is that what you are looking for?
Fundamentally, the missing part can be anything; we should not focus only on data stores.

arinwt · 2020-08-06T21:56:18Z

Oh, I'm thinking about it wrong. It feels like the flow needs to be reworked a little when the client requests a specific component through the loader, depending on whether they already have the container loaded or not. It seems like a two-phase thing.

The end code probably shouldn't look like this, but this is my first take at maximum functionality coverage with single-request semantics (we probably don't even need callback 1).

When directly requesting a component:

callback 1: (containerAlreadyLoaded: boolean) => boolean // return should wait for container to load or not?
callback 2: (componentAlreadyAttached: boolean) => boolean // return should wait for component to load or not?

^ above is a pretty complicated API... but I'm just putting out thoughts.
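One way to read the two callbacks is as a decision step run before the request proceeds. The sketch below is hypothetical (`nextStep` and the `Outcome` values are made up for illustration): each callback receives the current state, and its answer only matters when the corresponding piece is still missing.

```typescript
// Hypothetical callback types mirroring callbacks 1 and 2 above.
type ContainerCallback = (containerAlreadyLoaded: boolean) => boolean; // true = wait for container load
type ComponentCallback = (componentAlreadyAttached: boolean) => boolean; // true = wait for component attach

type Outcome = "returnComponent" | "waitForContainer" | "waitForComponent" | "notAvailable";

function nextStep(
  containerLoaded: boolean,
  componentAttached: boolean,
  onContainer: ContainerCallback,
  onComponent: ComponentCallback,
): Outcome {
  // Phase 1: container. The caller decides whether to wait for it to load.
  const waitForContainer = onContainer(containerLoaded);
  if (!containerLoaded) {
    return waitForContainer ? "waitForContainer" : "notAvailable";
  }
  // Phase 2: component. The caller decides whether to wait for it to be attached.
  const waitForComponent = onComponent(componentAttached);
  if (componentAttached) {
    return "returnComponent";
  }
  return waitForComponent ? "waitForComponent" : "notAvailable";
}
```

If callback 1 turns out to be unnecessary, phase 1 could simply default to waiting for the container.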

Alternatively, we could try to decide for them and take more control over the flow. In this case, pause is more of a preference than a hard rule (we could also add a third option: pause: preferPause follows this flow, definitelyPause can reject/return without a timeout, and noPause is already covered).
When requesting a component with pause set to true:

load container as it currently works (paused)
synchronously check if requested component is attached already
if attached, return the loaded component asynchronously; if not attached, resume and watch for the requested component until some timeout (OR until the current sequence number exceeds checkpointSequenceNumber?).
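The preferPause / definitelyPause / noPause split and the steps above can be summarized as a small decision function. This is a sketch under assumptions: the `PauseMode` and `Action` names are hypothetical, and the optional checkpoint stop condition reflects the open question in the last step rather than a settled design.

```typescript
// Hypothetical modes from the proposal above.
type PauseMode = "preferPause" | "definitelyPause" | "noPause";

type Action =
  | { kind: "return" } // already attached at the current sequence number
  | { kind: "resumeAndWatch"; timeoutMs: number } // preferPause: resume, then watch
  | { kind: "watch"; timeoutMs: number } // noPause: container is already processing ops
  | { kind: "reject" }; // definitelyPause: fail fast instead of waiting

function decide(mode: PauseMode, attached: boolean, timeoutMs: number): Action {
  if (attached) return { kind: "return" };
  if (mode === "definitelyPause") return { kind: "reject" };
  if (mode === "preferPause") return { kind: "resumeAndWatch", timeoutMs };
  return { kind: "watch", timeoutMs };
}

// Stop watching once the timeout elapses or, optionally, once ops past the checkpoint were processed.
function shouldStopWatching(
  elapsedMs: number,
  timeoutMs: number,
  currentSeq: number,
  checkpointSeq?: number,
): boolean {
  if (elapsedMs >= timeoutMs) return true;
  return checkpointSeq !== undefined && currentSeq > checkpointSeq;
}
```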

curtisman · 2020-08-08T01:17:35Z

#2859 is related too.

vladsud · 2020-10-13T06:16:06Z

For hosts using the paused container loading flow (the main flow), I'd love to experiment with the following workflow:

Host boots the container and makes a request
If the request fails, the host waits for the container to connect and get up to date, then repeats the request.

This in my view is much better than what we have today, as it avoids infinite waits. PR #3830 creates a building block for that flow to be experimented with.
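A minimal host-side sketch of the boot / request / catch-up / retry workflow above. `SimpleResponse`, `request`, and `resumeAndWaitUntilCaughtUp` are hypothetical stand-ins for whatever the host and loader actually expose (this does not reproduce the building block from PR #3830); the point is that the first request does not wait for missing objects, so a failure comes back instead of an infinite wait.

```typescript
// Hypothetical response shape: status 200 on success, anything else (e.g. 404) is a failure.
interface SimpleResponse<T> {
  status: number;
  value?: T;
}

async function requestWithCatchUp<T>(
  request: () => Promise<SimpleResponse<T>>,
  resumeAndWaitUntilCaughtUp: () => Promise<void>,
): Promise<SimpleResponse<T>> {
  // Step 1: request against the freshly booted (paused) container, without waiting.
  const first = await request();
  if (first.status === 200) {
    return first;
  }
  // Step 2: connect, catch up on ops past the snapshot, then repeat the request once.
  await resumeAndWaitUntilCaughtUp();
  return request();
}
```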

For hosts that do not use the paused container loading flow, or for those that want step 2 above to resolve faster, we can improve the flow by waiting for (racing) either step 2 or the channel (data store) being attached. I personally feel we should not go that route: it is even more complexity (for what should be a rather rare event), and it is also not very clear how it would compose with other features, like data stores being on channels, i.e. nested data stores. I'd rather see a simple system, and improve it only as we get data showing that we need to improve it further.
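For reference, the racing variant mentioned (and argued against) above could be expressed with `Promise.race`: whichever happens first, the container catching up or the data store being attached, triggers the retry. Both input promises are hypothetical signals; this only illustrates the idea.

```typescript
type RetryTrigger = "caughtUp" | "attached";

// Resolves with whichever signal fires first; the host would then repeat the request.
function firstRetryTrigger(caughtUp: Promise<void>, attached: Promise<void>): Promise<RetryTrigger> {
  return Promise.race([
    caughtUp.then((): RetryTrigger => "caughtUp"),
    attached.then((): RetryTrigger => "attached"),
  ]);
}
```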

vladsud · 2021-02-26T23:50:04Z

Consolidating tracking of the work related to waits in #4508. Closing this issue.

vladsud added the bug label Aug 6, 2020

vladsud added this to the August 2020 milestone Aug 6, 2020

vladsud self-assigned this Aug 6, 2020

ghost added the triage label Aug 6, 2020

curtisman removed the triage label Aug 8, 2020

vladsud modified the milestones: August 2020, September 2020, October 2020 Aug 31, 2020

skylerjokiel mentioned this issue Oct 8, 2020

Remove waits from getRootDataStore / ContainerRuntime.request() APIs #3875

Closed

vladsud modified the milestones: October 2020, November 2020 Oct 13, 2020

curtisman added the api and area: runtime labels and removed the bug label Oct 26, 2020

vladsud modified the milestones: November 2020, December 2020 Nov 30, 2020

vladsud added the focus label Dec 15, 2020

vladsud modified the milestones: December 2020, January 2021 Jan 6, 2021

danielroney modified the milestones: January 2021, February 2021 Jan 8, 2021

vladsud modified the milestones: February 2021, April 2021 Feb 26, 2021

vladsud mentioned this issue Feb 26, 2021

Infinite waits while resolving data stores / requests #4508

Open

vladsud closed this as completed Feb 26, 2021

danielroney removed this from the April 2021 milestone Mar 1, 2021
