Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default implementation of Stream.BeginRead/BeginWrite #33447

Open
filipnavara opened this issue Mar 10, 2020 · 13 comments
Open

Default implementation of Stream.BeginRead/BeginWrite #33447

filipnavara opened this issue Mar 10, 2020 · 13 comments
Milestone

Comments

@filipnavara
Copy link
Member

Currently the default implementation of Stream.BeginRead and BeginWrite is tailored towards seekable streams. It intentionally blocks parallel requests due to the concern that multiple parallel requests would modify the position in the stream. However, this behavior is problematic for non-seekable duplex streams such as network streams, compression streams, or other wrapper streams with some sort of encoding. These duplex streams should allow at least one of parallel read and write each to facilitate the documented pattern where BeginRead is always active to read incoming data.

Many of the streams implementations (GzipStream, HttpBaseStream, QuicStream, PipeReaderStream, PipeWriterStream, ReadOnlyMemoryStream, BrotliStream, SslStream; the list goes on) use the same TaskToApm pattern to overcome this limitation.

The problem with the above approach is the following:

I originally discovered this as breakage in Mono (mono/mono#18865) in their implementation of SslStream but it keeps popping up in other contexts and I think this should be addressed in a coordinated fashion.

/cc @baulig @steveisok @rolfbjarne @JeremyKuhne @carlossanlop

@filipnavara
Copy link
Member Author

filipnavara commented Mar 10, 2020

I realized I may not have been straightforward in what I am asking for. There are several possible courses of action that I can see:

  • Modify the default Stream.BeginRead/EndRead implementation to be suitable for duplex streams and remove the TaskToApm implementations. This may turn out to be compatibility problem but I believe there is a viable way to approach this. It would have the benefit of allowing the HasOverriddenBeginEnd[Read/Write] optimization working transparently.
  • Document the recommended way to implement the duplex streams, even if it means relying on TaskToApm. This would probably need to address the implementations that deviate slightly from the pattern (SslStream) and how to implement it in 3rd-party code where TaskToApm is not available.

@scalablecory
Copy link
Contributor

The concern seems valid from a quick peek at the code. It appears to function like this in Framework as well.

This may turn out to be compatibility problem but I believe there is a viable way to approach this.

It would be a breaking change to the current behavior. I'd be concerned that anything using APM these days is probably very old code in "don't fix what isn't broken" mode and we'd risk destabilizing them. Where has Mono seen the most benefit from making this change?

Document the recommended way to implement the duplex streams

I like this one better.

@filipnavara
Copy link
Member Author

filipnavara commented Mar 10, 2020

The concern seems valid from a quick peek at the code. It appears to function like this in Framework as well.

Correct. The duplex stream implementations in .NET Framework also override BeginRead/BeginWrite just like .NET Core does today.

I'd be concerned that anything using APM these days is probably very old code in "don't fix what isn't broken" mode and we'd risk destabilizing them. Where has Mono seen the most benefit from making this change?

Breakage is exactly what happened to us with the linked Mono change. We have some old code from 3rd-party library that depends on BeginRead/BeginWrite working in parallel on network, SSL and compression streams. It is a pattern that was encouraged by the documentation and it's likely there is code floating around that depends on this behaviour working.

I cannot speak for @baulig what was the motivation for the change. It was a code cleanup. Neither him, nor me, realized during the review that the default implementation in Stream changes the behaviour.

@stephentoub
Copy link
Member

I think there are multiple things being confused here. Stream.BeginRead/Write have two relevant behaviors that seem to be conflated here:

  1. The base Stream.BeginRead/Write will synchronously block if there's another async operation currently in flight.
  2. The base Stream.BeginRead/Write will queue a call to the Read/Write method.

(1) is behavior inherited from .NET Framework 1.x. I think for .NET 5 it'd be reasonable to change it to wait asynchronously rather than synchronously; there are code paths that already do that, just not BeginRead/Write, so the product change would likely end up being flipping a false to true in each method (though at that point we could probably delete some dead code as well). However, I'd be worried about going beyond that and removing the serialization entirely. It's not unlikely that code today is relying on that internal semaphore (knowingly or unknowingly) to ensure that operations don't overlap. That's true even if they're using ReadAsync/WriteAsync, for which the base implementations inherit the same serialization behavior (albeit asynchronously waiting rather than blocking).

Changing (2) would almost certainly break a ton of stuff, lead to stack overflows, infinite loops, etc. Any time a type adds virtual methods when it already has some, those new methods generally need to have base implementations provided in terms of the functionality already there. That establishes an ordering which derived types come to rely on. If we change that order here, and, say, a type was overriding ReadAsync/WriteAsync to just add some logging or something and then delegating to the base method, changing the base Begin/EndXx methods to delegate to the XxAsync methods would likely lead to an infinite loop / stackoverflow.

Modify the default Stream.BeginRead/EndRead implementation to be suitable for duplex streams and remove the TaskToApm implementations.

This is part of the conflation I'm talking about. Even if the base methods were changed to not serialize at all, we'd still want the derived overrides that utilize TaskToApm, because we want the derived Begin/End methods to be implemented in terms of the XxAsync methods.

Document the recommended way to implement the duplex streams

Override Read/WriteAsync, and stop using the legacy Begin/End pattern. If you're implementing a Stream in a shared library where you think someone might still be calling your Begin/End methods, then if you really care about perf for some reason when using Begin/End, or if you do have one of these duplex streams and need them to be able to run concurrently, reimplement Begin/End in terms of the XxAsync methods. Folks that need that can copy the TaskToApm internal helper into their own code (and it can be simplified even further if perf doesn't matter as much); I'd like not to expose yet more public types purely to service an obsolete model.

@filipnavara
Copy link
Member Author

I'd be worried about going beyond that and removing the serialization entirely

I'd be worried about that too. The question is whether there is a reasonably safe path to do a separate serialization for reads and writes for non-seekable streams. That would likely cover bulk of the cases where the current behavior is problematic.

If you're implementing a Stream in a shared library where you think someone might still be calling your Begin/End methods, then if you really care about perf for some reason when using Begin/End, or if you do have one of these duplex streams and need them to be able to run concurrently, reimplement Begin/End in terms of the XxAsync methods. Folks that need that can copy the TaskToApm internal helper into their own code (and it can be simplified even further if perf doesn't matter as much); I'd like not to expose yet more public types purely to service an obsolete model.

Shared libraries are my primary concern. I am not overly happy with code being copied over to different places but it's fine as long as it's an official guidance. I just need something to point people to. It is non-obvious that the default implementation can very easily lead to deadlocks. It hit us once due to a change in Mono that looked like a trivial code cleanup. Then the same thing almost happened again when Xamarin was being ported to run on .NET 5.

@stephentoub
Copy link
Member

The question is whether there is a reasonably safe path to do a separate serialization for reads and writes for non-seekable streams.

Just because a stream is non-seekable doesn't mean it's duplex. We also currently pay for a field on every stream to store the SemaphoreSlim, which is one field more than I'd like, plus actually creating that instance when it's used. I'd very much like to not add yet another one.

I just need something to point people to.

Sounds like something good to be added to documentation, and you can point people to that.

It is non-obvious that the default implementation can very easily lead to deadlocks.

Sure, but that non-obviousness wouldn't be addressed by exposing TaskToApm.

@filipnavara
Copy link
Member Author

filipnavara commented Mar 13, 2020

Just because a stream is non-seekable doesn't mean it's duplex. We also currently pay for a field on every stream to store the SemaphoreSlim, which is one field more than I'd like, plus actually creating that instance when it's used. I'd very much like to not add yet another one.

You are right. There are probably no proper criteria for detecting that a particular stream is a duplex stream. Seems like there is no easy implementation fix without changing the API shape.

Sure, but that non-obviousness wouldn't be addressed by exposing TaskToApm.

The issue is to document the non-obviousness and also offer guidance on how to correctly address it for library authors. The current preferred way to address it seems to be to use TaskToApm (judging by existing implementations in this repository). I am not saying that it has to be an exposed API, it could be part of the documentation even if I personally find that less than ideal.

@baulig
Copy link
Contributor

baulig commented Mar 14, 2020

There is an issue with using TaskToApm that - when used incorrectly - can lead to leaking unobserved tasks. This has caused major issues in Mono in the past, which were very hard to track down and eventually fixed, and it was the primary reason for that cleanup that removed these overloads.

And as I already pointed out in that other issue, the only guarantee that we've been making in Mono's SslStream implementation is that of it being a Stream. I really don't like the idea of having extra I/O logic in SslStream as in the past that has only lead to hard to detect issues.

@stephentoub
Copy link
Member

There is an issue with using TaskToApm that - when used incorrectly - can lead to leaking unobserved tasks.

What issue?

I really don't like the idea of having extra I/O logic in SslStream as in the past that has only lead to hard to detect issues.

I'm not following. What are you arguing for or against?

@baulig
Copy link
Contributor

baulig commented Mar 14, 2020

The TaskToApm being and end methods must always be called in pairs - which means you cannot move the begin call into some "implementation" class that gets disposed.

I'm arguing against making changes to Mono's SslStream unless absolutely necessary and very well tested. Ideally, I'd like to keep as much of the low-level I/O in the underlying stream instead of having custom overloads.

@stephentoub
Copy link
Member

The TaskToApm being and end methods must always be called in pairs - which means you cannot move the begin call into some "implementation" class that gets disposed.

This is how the APM pattern works in general: TaskToApm is just reflecting the realities of the pattern.

@carlossanlop carlossanlop removed the untriaged New issue has not been triaged by the area owner label Mar 16, 2020
filipnavara added a commit to filipnavara/mono that referenced this issue Apr 4, 2020
… and underlying implementation. Unlike the default implementation in Stream it allows parallel read along with parallel write using the standard TaskToApm pattern (dotnet/runtime#33447). This reverts part of mono#17393 which could lead to deadlocks as reported in mono#18865.
akoeplinger pushed a commit to mono/mono that referenced this issue Apr 8, 2020
…#19442)

* Add BeginRead/BeginWrite/EndRead/EndWrite overloads back to SslStream and underlying implementation. Unlike the default implementation in Stream it allows parallel read along with parallel write using the standard TaskToApm pattern (dotnet/runtime#33447). This reverts part of #17393 which could lead to deadlocks as reported in #18865.

* Move the TaskToApm pattern from MobileAuthenticatedStream to SslStream

* Bump API snapshot submodule

Co-authored-by: monojenkins <[email protected]>
monojenkins pushed a commit to monojenkins/mono that referenced this issue Apr 8, 2020
… and underlying implementation. Unlike the default implementation in Stream it allows parallel read along with parallel write using the standard TaskToApm pattern (dotnet/runtime#33447). This reverts part of mono#17393 which could lead to deadlocks as reported in mono#18865.
akoeplinger pushed a commit to mono/mono that referenced this issue Apr 8, 2020
… SslStream (#19476)

* Add BeginRead/BeginWrite/EndRead/EndWrite overloads back to SslStream and underlying implementation. Unlike the default implementation in Stream it allows parallel read along with parallel write using the standard TaskToApm pattern (dotnet/runtime#33447). This reverts part of #17393 which could lead to deadlocks as reported in #18865.

* Move the TaskToApm pattern from MobileAuthenticatedStream to SslStream

* Bump API snapshot submodule

Co-authored-by: Filip Navara <[email protected]>
@carlossanlop carlossanlop added this to the Future milestone Jun 29, 2020
@thenik
Copy link

thenik commented Nov 25, 2022

Hello,
I found a different behavior for HttpBaseStream BeginRead on ,NET Core vs Framework
Find details here microsoft/referencesource#177 maybe you know how to adjust it.

@stephentoub
Copy link
Member

Hello, I found a different behavior for HttpBaseStream BeginRead on ,NET Core vs Framework Find details here microsoft/referencesource#177 maybe you know how to adjust it.

That is unrelated to this issue.

@jozkee jozkee removed their assignment Jun 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants